IPRES – STATISTICS
CHAPTER 1
Introduction – Are statistics relevant to real life?
Did you know you already use statistics?
Learning about statistics helps you look for reliable patterns and associations in both the short and the long term.
• It also teaches caution in expecting these patterns to hold true in all situations
There are often important limitations to data which need to be considered, including whether they are
biased, unrepresentative or totally meaningless.
Statistics teaches us to think critically about our techniques, our samples and the claims we make.
It is basically about understanding and knowing how to use data.
How are statistics used?
Data provide information which governments and organizations use to make policy decisions (and to
evaluate the effectiveness of existing policies).
This century has been called the century of ‘big data’.
‘Big data’ refers to every piece of knowledge that has been or will be digitized and stored on a computer hard drive, in a database, or in the ‘cloud’.
There are two main kinds of statistics:
1. Descriptive statistics: a set of methods used to describe data and their characteristics
2. Inferential statistics: using what we know to make inferences (estimates or predictions) about what we don’t know
Statistics is all about weighing up the chances of something happening or being true.
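The difference between the two kinds of statistics can be sketched in a few lines of Python. The ten ‘hours worked’ values below are invented for illustration. The descriptive part simply summarizes them; the inferential part uses them to estimate an unknown population mean with an approximate 95% confidence interval. The critical value 2.262 is the standard two-sided Student t value for 9 degrees of freedom, so it only fits a sample of exactly ten cases.

```python
import math
import statistics

# Hypothetical sample: weekly hours of paid work reported by 10 respondents.
hours = [12, 35, 40, 38, 20, 45, 30, 40, 25, 35]

# Descriptive statistics: summarize the data we actually collected.
mean = statistics.mean(hours)
median = statistics.median(hours)
sd = statistics.stdev(hours)  # sample standard deviation (n - 1 denominator)
print(f"mean={mean}, median={median}, sd={sd:.2f}")

# Inferential statistics: use the sample to estimate the unknown population mean.
# 2.262 is the two-sided 95% Student t critical value for n - 1 = 9 degrees
# of freedom; it only applies because this sample has exactly 10 cases.
T_CRIT_9DF = 2.262
se = sd / math.sqrt(len(hours))  # standard error of the mean
lower, upper = mean - T_CRIT_9DF * se, mean + T_CRIT_9DF * se
print(f"95% CI for the population mean: {lower:.1f} to {upper:.1f}")
```

Changing the sample values changes the interval, which is a reminder that inferences are estimates about what we don’t know rather than certainties.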
The emergence of statistics
The use of statistics goes back to at least the earliest city states (Babylonians and Egyptians).
The word ‘statistics’ is derived from the Latin term for ‘state’ or ‘government’.
• In the emerging capitalist societies, the role of statisticians was to collect information about people
• It was thought that we needed to ‘map’ the human population to make the best use of people in industry, as well as to provide the services they needed
The increase in the number of datasets available, both national and international, longitudinal and cross-sectional, has been accompanied by advances in technology, in particular the widespread adoption of computers from the 1980s and 1990s onwards.
Do we really need to know about statistics?
If data are to be useful, they have to be processed and analyzed. This requires statistical skills, which
will become ever more important in the social sciences and beyond.
There are two main reasons why learning about statistics is useful:
1. You are constantly exposed to statistics every day of your life
2. You need to be able to understand and interpret statistics at university or in the workplace
‘There are three kinds of lies: lies, damned lies and statistics’
The expansion of available data has also led to increasing debate about how figures are constructed
• While numbers are often thought of as hard facts, they are actually the result of different decisions
about how something should be categorized or counted
• There is much cynicism about statistics and how they are used
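To make the point about categorization concrete, here is a minimal sketch using invented weekly household incomes and a hypothetical helper, `share_below`. The data never change; only the cut-off used to define ‘low income’ does, and the headline figure doubles.

```python
# Hypothetical weekly household incomes for 8 households (invented data).
incomes = [180, 220, 250, 260, 310, 400, 520, 610]

def share_below(values, threshold):
    """Percentage of values strictly below a chosen cut-off."""
    below = sum(1 for v in values if v < threshold)
    return 100 * below / len(values)

# Two plausible definitions of "low income" give different headline figures.
for threshold in (250, 300):
    print(f"Low income = under {threshold}: {share_below(incomes, threshold):.1f}%")
```

With a cut-off of 250 the rate is 25%; with 300 it is 50%. Neither number is ‘wrong’, but a reader needs to know which definition was used.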
It is crucial to be aware that statistical data can be misinterpreted, distorted or selected to serve
particular ends.
This is not the inherent fault of statistics per se; rather, it is the fault of analysis that does not carefully examine the logic of an argument and how the data support it.
• It is also common for inaccurate conclusions to be drawn from findings without checking whether the data actually mean what is claimed, or whether there could be other possible explanations
CHAPTER 2
Data and table manners
Data
• Variables: the topics that you are interested in finding out about
• Each variable has attributes (the values or categories it can take, e.g. ‘female’ and ‘male’ for sex)
• Each person questioned is called a case
• The responses that you get are known as observations (even if you don’t physically ‘observe’
them yourself)
Data are simply a collection of observations.
Whatever a set of data (a dataset) may refer to, the cases are the individuals in the sample, while
variables are the characteristics which make the cases different from each other.
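One simple way to picture a dataset, assuming a small hypothetical survey, is a list of records: each record is a case, each field name is a variable, and each value is an observation.

```python
# A hypothetical dataset: each dict is one case (a respondent),
# each key is a variable, and each value is an observation.
dataset = [
    {"sex": "female", "age": 34, "religion": "Christian"},
    {"sex": "male", "age": 27, "religion": "Muslim"},
    {"sex": "female", "age": 51, "religion": "other"},
]

variables = list(dataset[0].keys())
n_cases = len(dataset)
n_observations = n_cases * len(variables)  # one observation per case per variable

print(f"{n_cases} cases, {len(variables)} variables: {variables}")
print(f"{n_observations} observations in total")

# All observations of a single variable, taken across every case:
ages = [case["age"] for case in dataset]
print(ages)
```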
There are an infinite number of possible variables that we might be interested in and, in fact, variables
themselves can be divided into several types:
• Continuous variables: such variables are measured in numbers, and an observation may take any
value on a continuous scale
o ‘Weights of newborn babies’; ‘Distance travelled to work by those in full-time employment’; ‘Percentage of children living in lone-parent families’
• Discrete or categorical variables: such variables are not measured on a continuous numerical scale; their values are categories that have no numeric value
o ‘Sex: female/male’; ‘Religion: Buddhist/Christian/Muslim/other’; ‘Degree subject studied:
Politics/Sociology/Social work’
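The practical difference between the two types shows up in how they are summarized. In this sketch (invented data), the continuous variable can be averaged because its numbers have meaning, while the categorical variable can only be counted category by category.

```python
from collections import Counter
import statistics

# Hypothetical observations of a continuous variable (birth weight, kg) ...
birth_weights_kg = [3.2, 2.9, 3.6, 3.4, 4.1]
# ... and of a categorical variable (degree subject studied).
subjects = ["Politics", "Sociology", "Sociology", "Social work", "Sociology"]

# Continuous data: an average is meaningful.
mean_weight = statistics.mean(birth_weights_kg)
print(f"Mean birth weight: {mean_weight:.2f} kg")

# Categorical data: no average exists, so we count each category instead.
subject_counts = Counter(subjects)
modal_subject = subject_counts.most_common(1)[0][0]
print(dict(subject_counts), "most common:", modal_subject)
```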
Whenever you collect or use some data, you should write a clear codebook to describe your variables.
A codebook gives information about each variable, such as name and type, along with the units of
measurement or categories.
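A codebook can also be kept in machine-readable form. The layout below is a hypothetical one, not a standard format, and `check_case` is an illustrative helper that flags observations that do not match what the codebook says a variable should contain.

```python
# A minimal, hypothetical codebook: one entry per variable, recording its
# type and either its units of measurement or its permitted categories.
CODEBOOK = {
    "age": {"type": "continuous", "units": "years"},
    "sex": {"type": "categorical", "categories": ["female", "male"]},
    "religion": {
        "type": "categorical",
        "categories": ["Buddhist", "Christian", "Muslim", "other"],
    },
}

def check_case(case):
    """Return a list of problems found when comparing one case to the codebook."""
    problems = []
    for name, spec in CODEBOOK.items():
        if name not in case:
            problems.append(f"missing variable: {name}")
        elif spec["type"] == "categorical" and case[name] not in spec["categories"]:
            problems.append(f"{name}: unexpected category {case[name]!r}")
        elif spec["type"] == "continuous" and not isinstance(case[name], (int, float)):
            problems.append(f"{name}: expected a number in {spec['units']}")
    return problems

print(check_case({"age": 42, "sex": "female", "religion": "Muslim"}))
print(check_case({"age": "forty", "sex": "F"}))
```

The first case matches the codebook; the second is flagged three times (a non-numeric age, an undocumented category for sex, and a missing religion).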
It is also possible to split categorical variables further into different levels of measurement:
• Nominal variables: variables that have two or more categories, but which do not have an intrinsic
order or inherent numerical quality in themselves
• Dichotomous variables: nominal variables which have only two categories or levels
• Ordinal variables: variables that have two or more categories, like nominal variables, but the categories can also be ordered or ranked moving from greater to smaller values (or vice versa)
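The sketch below, with invented responses, shows why the distinction matters. Ordinal categories can be ranked, so a middle (median) category exists; nominal categories have no order, so the most common category (the mode) is the usual summary.

```python
import statistics

# Hypothetical ordinal variable: satisfaction, with categories listed
# in their natural order from lowest to highest.
SCALE = ["very dissatisfied", "dissatisfied", "neutral", "satisfied", "very satisfied"]
responses = ["satisfied", "neutral", "very satisfied", "satisfied", "dissatisfied"]

# Because ordinal categories can be ranked, we can find the middle category.
# median_low always returns an observed rank rather than averaging two ranks.
ranks = [SCALE.index(r) for r in responses]
median_category = SCALE[statistics.median_low(ranks)]
print("Median satisfaction:", median_category)

# A nominal variable has no intrinsic order, so ranking its categories
# would be meaningless; the mode is the natural summary.
religions = ["Christian", "Muslim", "Christian", "Buddhist"]
modal_religion = statistics.mode(religions)
print("Most common religion:", modal_religion)
```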
Continuous variables can be further categorized as either interval or ratio variables (although both are rarely used in social statistics)
• Interval variables can be measured along a continuum and have a numerical value; the distance between adjacent values is the same along the whole scale, but the zero point is arbitrary
o Temperature on the Celsius scale
• Ratio variables have all the properties of an interval variable, but also have a clear definition of
zero
o Age, height, weight, distance and temperature on the Kelvin scale
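The interval/ratio distinction can be checked with a little arithmetic. Assuming the usual conversion K = °C + 273.15, differences survive the change of scale but ratios do not, which is why only a ratio variable supports statements like ‘twice as much’.

```python
def celsius_to_kelvin(c):
    """Convert Celsius (interval scale, arbitrary zero) to Kelvin (ratio scale, true zero)."""
    return c + 273.15

warm_c, cool_c = 20.0, 10.0

# Differences are meaningful on both scales: a 10-degree gap is a 10-kelvin gap.
diff_c = warm_c - cool_c
diff_k = celsius_to_kelvin(warm_c) - celsius_to_kelvin(cool_c)
print(f"Difference: {diff_c:.1f} C = {diff_k:.1f} K")

# Ratios are only meaningful on a ratio scale: 20 C is NOT "twice as hot"
# as 10 C, because 0 C is not an absence of temperature.
naive_ratio = warm_c / cool_c
kelvin_ratio = celsius_to_kelvin(warm_c) / celsius_to_kelvin(cool_c)
print(f"Celsius ratio {naive_ratio:.3f} vs Kelvin ratio {kelvin_ratio:.3f}")

# Age is a ratio variable with a true zero, so "twice as old" is meaningful.
age_ratio = 40 / 20
print(f"A 40-year-old is {age_ratio:.0f} times as old as a 20-year-old")
```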
Tables
Tables are one of the best ways of presenting a set of data.
The aim of drawing a table is to transform a set of numbers into a format which is easy to understand. The main parts of a table are:
• Stub head: heading that identifies the entries in the leftmost column
• Column spanner: heading that identifies the entries in two or more columns in the body of the
table
• Column heads: headings that identify the entries in just one column in the body of the table
• Stub or stub column: leftmost column of the table; usually lists the major independent or predictor
variables
• Cell: point of intersection between a row and a column
• Table note: a note placed below the table, which can eliminate repetition from the body of the table
• Source: identifies where the table originates from
Table rules
There are a number of key points to remember when drawing tables:
• Always give the table a clear title
• Make sure that all columns and rows are named properly
• Remember to state the units of measurement used
• If further explanation is needed to clarify certain points, put notes below the table
• Include the source of the data
• Take care with layout and presentation
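Putting the parts and rules together, the sketch below lays out a plain-text table from invented counts. `render_table` is a hypothetical helper written for this example: it produces a title that states the units, a stub head and stub, a column spanner over the column heads, the cells, a note and a source.

```python
# Hypothetical counts of students by degree subject (rows) and sex (columns).
COUNTS = {
    "Politics": {"Female": 42, "Male": 38},
    "Sociology": {"Female": 55, "Male": 25},
    "Social work": {"Female": 61, "Male": 19},
}

def render_table(title, stub_head, spanner, col_heads, rows, note, source):
    """Lay out a plain-text table with a title, stub, spanner, note and source."""
    stub_w = max([len(stub_head)] + [len(stub) for stub in rows])
    col_w = max(len(h) for h in col_heads) + 4
    lines = [title, ""]
    # Column spanner: centred over the column heads it groups.
    lines.append(" " * stub_w + spanner.center(col_w * len(col_heads)))
    # Stub head followed by the column heads.
    lines.append(stub_head.ljust(stub_w) + "".join(h.rjust(col_w) for h in col_heads))
    # One row per stub entry; each cell is a row/column intersection.
    for stub, cells in rows.items():
        lines.append(stub.ljust(stub_w) + "".join(str(cells[h]).rjust(col_w) for h in col_heads))
    lines += ["", f"Note: {note}", f"Source: {source}"]
    return "\n".join(lines)

table = render_table(
    title="Table 1. Students by degree subject and sex (number of students)",
    stub_head="Degree subject",
    spanner="Sex",
    col_heads=["Female", "Male"],
    rows=COUNTS,
    note="Counts are invented for illustration.",
    source="Hypothetical example data.",
)
print(table)
```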