2017
GE5 – Conducting Research
SUMMARY BOOK & SEMINARS
DYLAN BOET
Table of Contents
Variables, Goal of statistics
  Chapter 1 – Why Statistics?
  Chapter 2 – Variability and measurement
  Chapter 3 – Tables and Diagrams
  Seminar 1
Measures of Central Tendency and Measures of Dispersion
  Chapter 4 – Describing variables numerically
  Seminar 2
Distributions and Z-scores
  Chapter 5 – Shapes of distribution of scores
  Seminar 3
  Chapter 6 – Standard deviation and z-scores
Correlation and chi-square
  Chapter 7 – Relationships between two or more variables
  Chapter 8 – Correlation coefficients
  Seminar 4
Regression and Significance testing
  Chapter 9 – Regression
  Chapter 10 – Samples and population
  Seminar 5
  Chapter 11 – Statistical significance for the correlation coefficient
Standard error and t-test
  Chapter 12 – Standard error
  Chapter 13 – The t-test (Comparing two samples of correlated/related scores)
  Chapter 14 – The t-test (Comparing two samples of unrelated/uncorrelated scores)
  Seminar 6
Remaining chapters about significance testing
  Chapter 15 – Chi-square
  Chapter 16 – Probability
  Chapter 17 – Reporting significance levels succinctly
  Seminar 7
Formulas for the exam
  Mean
  Variance
  Standard Deviation
  Z-score
  Pearson r
NOTE: Don’t forget to watch the videos on Moodle as well; they’ll help you a lot. Additionally,
I’ve included a lot of visuals in this document to make it clearer. Good luck!
Variables, Goal of statistics
Chapter 1 – Why Statistics?
Some things that statistics can do for the researcher:
- Helps develop ways of predicting what is most likely, using a set of predictors.
- Helps deal with circumstances in which random sampling is impossible. In these circumstances the groups may differ systematically in important ways; these differences may be controlled for.
- Helps build ‘models’ which account for relationships between variables.
- Tests for statistical significance, that is, whether the trend in the data would be very unlikely to be the result of chance due to sampling.
- Estimates a sample size that is neither too big nor too small: one that does not waste resources by being too big but will still detect a trend if there is one.
- Helps you draw tables and diagrams which illustrate your data in the most informative way.
- Provides ways of assessing what the validity of a test would be if the measure were perfectly reliable.
Chapter 2 – Variability and measurement
Statistical techniques perform three main functions:
1. They provide ways of summarising the information that we collect from a multitude
of sources. Statistics is partly about tabulating your research information or data as
clearly and effectively as possible. As such, it merely describes the information
collected. This is achieved using tables and diagrams to summarise data, and simple
formulae which turn fairly complex data into simple indexes that describe
numerically the main features of the data. This branch of statistics is called
descriptive statistics for very obvious reasons – it describes the information you
collect as accurately and succinctly as possible.
2. Another branch of statistics is far less familiar to most of us: inferential statistics.
This branch of statistics is really about economy of effort in research. There was a
time when in order to find out about people, for example, everyone in the country
would be contacted in order to collect information. This is done today when the
government conducts a census of everyone in order to find out about the population
of the country at a particular time. This is an enormous and time-consuming
operation that cannot be conducted very often. But most of us are familiar with using
relatively small samples in order to approximate the information one would get
by studying everybody. This is common in public-opinion surveying where the
answers of a sample of 1000 or so people may be used, say, to predict the outcome
of a national election. Even though samples can sometimes be misleading,
it is the principle of sampling that is important. Inferential statistics is
about the confidence with which we can generalise from a sample to the entire
population.
3. The amount of data that a researcher can collect is potentially massive. Some
statistical techniques enable the researcher to clarify trends in vast quantities of data
using a number of powerful methods. Data simplification, data exploration and data
reduction are among the names given to the process. Whatever the name, the
objective is the same – to make sense of large amounts of data that otherwise would
be much too confusing.
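As a rough illustration of the idea behind inferential statistics in point 2, the Python sketch below compares a ‘census’ of an entire population with a random sample of 1000 drawn from it, as in the election-poll example. The population, its size (100,000) and the 42% support figure are all hypothetical, chosen only for the demonstration.

```python
import random

random.seed(2017)  # fixed seed so the illustration gives the same sample each run

# Hypothetical population: 100,000 voters, of whom 42,000 (42%) support party A.
# 1 = supports party A, 0 = does not.
POPULATION_SIZE = 100_000
SUPPORTERS = 42_000
population = [1] * SUPPORTERS + [0] * (POPULATION_SIZE - SUPPORTERS)

# A census looks at everyone in the population.
census_estimate = sum(population) / len(population)

# A survey looks only at a random sample of 1000 people.
sample = random.sample(population, 1000)
sample_estimate = sum(sample) / len(sample)

print(f"Census of all {len(population)} voters: {census_estimate:.3f}")
print(f"Random sample of {len(sample)} voters: {sample_estimate:.3f}")
```

With different seeds the sample estimate moves around 0.42 from sample to sample without usually landing exactly on it; how confidently we can generalise from such a sample to the whole population is what inferential statistics is about.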
The concept of a variable is basic but vitally important in statistics. It is also as easy as pie. A
variable is anything that varies and can be measured. These measurements need not
correspond very well with everyday notions of measurements such as weight, distance and
temperature. So the gender of the person is a variable since it can be measured as either
male or female – and gender varies among people.
Another type of measurement in statistics is more directly akin to everyday concepts of
measurement in which numerical values are provided. These numerical values are assigned
to variables such as weight, length, distance, temperature and the like - for example, 10
kilometres or 30 degrees. These numerical values are called scores. In psychological research
many variables are measured and quantified in much the same way.
Traditionally, statistics textbooks for psychologists emphasize different types of
measurement – usually using the phrase scales of measurement. However, for virtually all
practical purposes there are only two different types of measurement in statistics:
1. Score/numerical measurement: This is the assignment of a numerical value to a
measurement. This includes most physical and psychological measures. In
psychological jargon, these numerical measurements are called scores. For example,
we could record the IQ scores of five people in a table; each numerical value would then
indicate the named individual’s score on the variable IQ. The point is simple: the
numbers contain the information that someone with an IQ of 150 has a higher
intelligence than someone with an IQ of 80. In other words, the numbers
quantify the variable.
2. Nominal/category measurement: This is deciding which category of the variable a
particular case belongs to. It is also appropriate to refer to it as a qualitative measure. So,
if we were measuring a person’s job or occupation, we would have to decide
whether or not he or she was a lorry driver, a professor of sociology, a debt collector
and so forth. This is called nominal measurement since usually the categories are
described in words and, especially, given names. Thus the category ‘lorry driver’ is a
name or verbal description of what sort of case should be placed in that category.
Many psychologists speak of four different scales of measurement. Conceptually they are
distinct. Nevertheless, for most practical situations in psychologists’ use of statistics the
nominal category versus numerical scores distinction discussed above is sufficient. The four
‘theoretical’ scales of measurement are as follows. The scales numbered 2, 3 and 4 are
different types of numerical scores:
1. Nominal categorisation: This is the placing of cases into named categories – nominal
clearly refers to names. It is exactly the same as our nominal measurement or
categorisation process.
2. Ordinal (or rank) measurement: The assumption here is that the values of the
numerical scores tell us little more than which is the smallest, the next smallest
and so forth up to the largest. In other words, we can place the scores in order
(hence ordinal) from the smallest to the largest. It is sometimes called rank
measurement since we can assign ranks to the first, second, third, fourth, fifth, etc. in
order from the smallest to the largest numerical value. These ranks have the
numerical value 1, 2, 3, 4, 5, etc. However, few psychologists collect data directly as
ranks.
3. Interval or equal-interval measurement: The basic idea here is that in some cases
the intervals between numbers on a numerical scale are equal in size. Thus, if we
measure distance on a scale of centimetres then the distance between 0 and 1
centimetres on our scale is exactly the same as the difference between 4 and 5
centimetres or between 11 and 12 centimetres on that scale. This is obvious for
some standard physical measurements such as temperature.
4. Ratio measurement: This is exactly the same as interval scale measurement with one
important proviso. A ratio scale of measurement has an absolute zero point that is
measured as 0. Most physical measurements such as distance and weight have zero
points that are absolute. Thus zero on a tape measure is the smallest distance one
can have – there is no distance between two coincident points. With this sort of scale
of measurement, it is possible to work out ratios between measures. So, for example,
a town that is 20 kilometres away is twice as far away as a town that is only 10
kilometres away. A building that is 15 metres high is half the height of a building that
is 30 metres high. (Not all physical measures have a zero that is absolute zero – this
applies particularly to several measures of temperature. Temperatures measured in
degrees Celsius or Fahrenheit have points that are labelled as zero. However, these
zero points do not correspond to the lowest possible temperature you can have. It is
then meaningless to say, for example, that it is twice as hot at 20 degrees Celsius as it
is at 10 degrees Celsius.)
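The temperature example above can be checked numerically. The Python sketch below (the function name `celsius_to_kelvin` is illustrative) converts 20 °C and 10 °C to kelvin, a scale whose zero is absolute zero, and compares the ratios. The Celsius ratio of 2 only reflects where the scale’s zero happens to be placed; on the kelvin scale the same two temperatures stand in a ratio of only about 1.035. Differences, on the other hand, come out the same on both scales, which is what an interval scale preserves.

```python
def celsius_to_kelvin(celsius: float) -> float:
    """Convert degrees Celsius to kelvin, where 0 K is absolute zero."""
    return celsius + 273.15


warm, cool = 20.0, 10.0

# Interval scale (Celsius): the zero point is arbitrary, so this ratio is misleading.
celsius_ratio = warm / cool

# Ratio scale (kelvin): the zero point is absolute, so this ratio is meaningful.
kelvin_ratio = celsius_to_kelvin(warm) / celsius_to_kelvin(cool)

# Differences are meaningful on both scales and agree: 10 degrees either way.
celsius_difference = warm - cool
kelvin_difference = celsius_to_kelvin(warm) - celsius_to_kelvin(cool)

print(f"Celsius ratio: {celsius_ratio:.3f}")  # 2.000
print(f"Kelvin ratio:  {kelvin_ratio:.3f}")   # 1.035
print(f"Differences:   {celsius_difference:.1f} vs {kelvin_difference:.1f}")  # 10.0 vs 10.0
```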
Chapter 3 – Tables and Diagrams
If we asked 100 people their age, gender, marital status, their number of children and their
occupation this would yield 500 separate pieces of information. Such unprocessed
information is called raw data. Statistical analysis has to be more than describing raw
ingredients. It requires the data to be structured in ways that effectively communicate the
major trends or characteristics of your data.
One of the main characteristics of tables and diagrams for nominal (category) data is that
they have to show the frequencies of cases in each category used. While there may be as
many categories as you wish, it is not the function of statistical analysis to communicate all
the data’s detail; the task is to identify the major trends or features.
Rules of thumb for tables and diagrams for nominal (category) data:
- Keep your number of categories low, especially when you have only small numbers of participants in your research.
- Try to make your ‘combined’ categories meaningful and sensible in the light of the purposes of your research. It would be nonsense, for example, to categorise jobs by the letter of the alphabet with which they start – nurses, nuns, nursery teachers and national footballers. All of these jobs begin with the same letter, but it is very difficult to see any other common thread which would allow them to be combined meaningfully.
In terms of drawing tables, all we do is to list the categories we have chosen and give the
frequency of cases that fall into each of the categories. The frequencies are presented in two
ways in this table – simple frequencies and percentage frequencies. A percentage frequency
is the frequency expressed as a percentage of the total of the frequencies (or total number
of cases, usually).
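Building a frequency table of this kind is simple arithmetic: count the cases in each category, then divide each count by the total and multiply by 100 to get the percentage frequency. A minimal Python sketch follows; the occupation data are invented purely for illustration, and `frequency_table` is a hypothetical helper, not part of any library.

```python
from collections import Counter


def frequency_table(cases):
    """Return (category, frequency, percentage frequency) rows, largest category first."""
    counts = Counter(cases)
    total = sum(counts.values())
    return [
        (category, freq, 100 * freq / total)
        for category, freq in counts.most_common()
    ]


# Hypothetical occupation data for ten participants (illustrative only).
occupations = [
    "nurse", "teacher", "nurse", "lorry driver", "teacher",
    "nurse", "debt collector", "teacher", "nurse", "lorry driver",
]

print(f"{'Category':<16}{'Frequency':>10}{'Percentage':>12}")
for category, freq, pct in frequency_table(occupations):
    print(f"{category:<16}{freq:>10}{pct:>11.1f}%")
```

Here ‘nurse’ has a frequency of 4 out of 10 cases, so its percentage frequency is 40.0%; the percentage column always sums to 100.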
Sometimes it is preferable to turn frequency tables into diagrams. Good diagrams are quickly
understood and add variety to the presentation. The pie diagram, one of the main types of
diagram, is a very familiar form of presentation – it simply expresses each category as a
slice of a pie which represents
all cases. Notice that the number of slices is small – a multitude of slices can be confusing.
Each slice is clearly marked with its category name, and the percentage frequency in each
category also appears.
This pie chart shows a bad example of a pie diagram, for purposes of comparison. There
are several problems with this pie diagram:
- There are too many small slices identified by different shading patterns.
- It is not easy to see what each slice concerns, and the relative sizes of the slices are difficult to judge.
In other words, too many categories have resulted in a diagram which is far from easy to
read – a cardinal sin in any statistical diagram.
A simple frequency table might be more effective in this case. Another familiar form of
statistical diagram for nominal (category) data is the bar chart. Again these charts are very
common in the media. Basically they are diagrams in which bars represent the size of each
category. The relative lengths (or heights) of the bars quickly reveal the main trends in the
data. With a bar chart, there is very little to remember other than that the bars have a space
separating them. The spaces indicate that the categories are not in a numerical order; they
are frequencies of categories, not scores.
It is hard to go wrong with a bar chart so long as you remember the following:
- The heights of the bars represent frequencies (number of cases) in a category.
- Each bar should be clearly labelled as to the category it represents.
- Too many bars make bar charts hard to follow.
- Avoid having many empty or near-empty categories which represent very few cases. Generally, the information about substantial categories is the most important. (Small categories can be combined together as an ‘other’ category.)
- Nevertheless, if important categories have very few entries then this needs recording. So, for example, a researcher who is particularly interested in opportunities for women surveys people in top management and finds very few women employed in such jobs. It is important to draw attention to this in the bar chart of males and females in top management. Once again, there are no hard-and-fast rules to guide you – common sense will take you a long way.
- Make sure that the vertical axis (the heights of the bars) is clearly marked as being frequencies.
- The bars should be of equal width.
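The advice about merging small categories into an ‘other’ category, while still keeping an important small category visible, can be expressed as a simple rule. In this Python sketch the cut-off (`min_count`) and the `keep` set are assumptions the researcher has to choose, since there are no hard-and-fast rules; the helper name and the top-management frequencies are hypothetical.

```python
from collections import Counter


def combine_small_categories(counts, min_count, keep=()):
    """Merge categories with fewer than `min_count` cases into 'other'.

    Categories listed in `keep` are never merged, even if small, because a
    small count can itself be the important finding.
    """
    combined = Counter()
    for category, freq in counts.items():
        if freq < min_count and category not in keep:
            combined["other"] += freq
        else:
            combined[category] += freq
    return combined


# Hypothetical frequencies for people in top management (illustrative only).
counts = Counter({"male": 46, "female": 3, "prefer not to say": 1, "not recorded": 2})

# 'female' is small but is the point of the study, so it keeps its own bar.
merged = combine_small_categories(counts, min_count=5, keep={"female"})
for category, freq in merged.most_common():
    print(f"{category:<10}{freq:>4}")
```

Without the `keep` argument the three women would disappear into ‘other’, which is exactly the loss of information the advice above warns against.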
In newspapers and on television you are likely to come across a variant of the bar chart
called the pictogram. In this case, the bars of the bar chart are replaced by drawings of
varying size of something eye-catching to do with your categories. Thus, pictures of men or
women of varying heights, for example, replace the bars.
For numerical score data, a histogram might be the best form of statistical diagram. At first
sight, histograms look very much like bar charts but without gaps between the bars. This is
because the histogram does not represent distinct unrelated categories but different points
on a numerical measurement scale.
Sometimes you will have to use bands of scores rather than individual score values. If we
asked 100 people their ages we could categorise their replies into bands such as 0-9 years,
10-19 years, 20-29 years, 30-39 years, 40-49 years and a final category of those 50 years and
over.
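Grouping ages into the bands listed above takes one integer division per person. The Python sketch below, with invented ages and a hypothetical `age_band` helper, assigns each age to a 10-year band, puts everyone aged 50 or over into a single final band, and prints a rough text histogram. The bands are printed in numerical order rather than sorted by size, because they are points on a numerical scale, not unrelated categories.

```python
from collections import Counter


def age_band(age: int) -> str:
    """Place an age in a 10-year band; everyone aged 50 or over shares one band."""
    if age < 0:
        raise ValueError("age cannot be negative")
    if age >= 50:
        return "50+"
    lower = (age // 10) * 10  # e.g. 23 -> 20, giving the band "20-29"
    return f"{lower}-{lower + 9}"


BAND_ORDER = ["0-9", "10-19", "20-29", "30-39", "40-49", "50+"]

# Hypothetical ages (illustrative only).
ages = [4, 17, 23, 25, 29, 31, 38, 44, 52, 67, 71, 19]

band_counts = Counter(age_band(a) for a in ages)
for band in BAND_ORDER:
    # One '#' per person, bands in numerical order as on a histogram's horizontal axis.
    print(f"{band:>6} | {'#' * band_counts[band]}")
```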
There are a couple of mistakes that you can make in drawing up tables and diagrams:
- Do not forget to head the table or diagram with a succinct description of what it concerns. You will notice that we have done our best throughout this chapter to supply each table and diagram with a clear title.
- Label everything on the table or diagram as clearly as possible. This means that you have to mark your bar charts and histograms in a way that tells the reader what each bar means. Then you must indicate what the height of the bar refers to – probably either frequency or percentage frequency.
Seminar 1
Variables:
Nominal / category
Examples: cigarette brand (Camel, Marlboro, other), religious affiliation (Roman Catholic,
Dutch Reformed, Calvinist, other Christian, Muslim, other religion, none).
Score / numerical
Examples: temperature of the smoke in °C, golf scores (holes above or below par), year, time
to complete a task, age, height.
Dichotomous.
Examples: being a smoker (yes/no), gender (male/female).
Other nominal.
Examples: cigarette brand (Camel, Marlboro, other), religious affiliation (Roman Catholic,
Dutch Reformed, Calvinist, other Christian, Muslim, other religion, none).
Ordinal.
Examples: frequency of smoking (never, incidentally, daily), rank in competition, clothing
sizes (S, M, L, XL), attitude toward organic vegetables (positive, neutral, negative).
Interval.
Examples: temperature of the smoke in °C, golf scores (holes above or below par), year.
Ratio.
Examples: time to complete a task, age, height.
Discrete versus continuous variables:
Discrete:
The variable can take only certain fixed values, e.g. rankings 1st, 2nd, etc.
Continuous:
A continuous variable is a variable that can take any value within its range. Example:
weight. Something may weigh 5 kilograms, but measured in grams it might turn out to be
5025 grams, and so on.
Variables and data:
variable:
characteristic or condition that changes or has different values for different
individuals
value:
possible outcome
code:
a number for a score
data:
measurements or observations of a variable
data set:
a collection of measurements or observations.
score:
a single measurement or observation