STATISTICS HANDBOOK FINAL EXAM
Pre-master Communication Information Sciences
Final exam Statistics
Contents
Reminders/Tips & Tricks
Which test to choose?
  One-sample t-test
  Independent-samples t-test
  Dependent (Paired) samples t-test
  One-way ANOVA
  Two-way (Factorial) ANOVA
  Chi-square (Pearson’s)
  Correlation (Pearson’s)
  (Linear) Regression
Step 1. Get the data ready
  1. Check whether the dataset contains invalid input and/or scores
  2. Check whether items for scales need recoding (reverse scores)
  3. Check the reliability of the scales
  4. Compute the mean values of measurements
  5. Split the data
  6. Cut off points
Step 2. Explore the data
Step 3. Check the assumptions
  1. Assumption of Normality
  2. Assumption of Homogeneity of variance
  3. DV measured on an interval or ratio level
  4. Independent data
Step 4. Perform statistical test
  1. T-tests
    Independent-samples t-test
    Dependent (Paired) samples t-test
    One-sample t-test
  2. ANOVA
    One-Way ANOVA
    Follow-up tests One-Way ANOVA
    Two-Way (Factorial) ANOVA
  3. Correlation
  4. Regression
  5. Chi-Square
Reporting formats APA
  Writing
  Decimals
  Mean and Standard deviation
  Confidence intervals
  Reliability
  Assumption of Normality
  Assumption of Homogeneity
Statistics Handbook Final Exam / M.C. van der Horst
Reminders/Tips & Tricks
Steps of data analysis:
1. Get your data ready
a. Check reliability of scales
b. Compute mean values or other necessary variables
2. Explore your data (with graphs)
a. Look at the distribution and the means (explore function)
b. Check for outliers
3. Check whether assumptions of the test are met
4. Perform main analyses
5. Calculate the effect size
6. Report
Independent and dependent variable
Independent variable = the variable that is changed in a scientific experiment to test the effects on
the dependent variable.
The cause you are testing, the manipulated variable (“if-part”).
Dependent variable = the variable being tested and measured in a scientific experiment.
The effect you are measuring, the responding variable (“then-part”).
Measurement levels
Values represent categories with no intrinsic ranking (binary and nominal)
For example: 1 = men and 2 = women
Values represent categories with some intrinsic ranking (ordinal, Likert scale)
For example: 1 = highly satisfied, 2 = satisfied, 3 = neutral, 4 = dissatisfied, 5 =
highly dissatisfied.
Values represent ordered categories with a meaningful metric, so that distance
comparisons between values are appropriate (interval and ratio).
For example: years of age.
Categorical variables → distinct categories
- Binary or dichotomous variable
= only two categories
- Nominal variable
= more than two categories
- Ordinal variable
= same as nominal, but the categories have a logical order
Continuous variable → distinct score
- Interval variable
= equal intervals on the variable represent equal differences in the property being measured
- Ratio variable
= same as an interval, but the ratio of scores on the scale must also make sense and there is
an absolute zero
Hypothesis
The null hypothesis (H0) is the one we try to reject. It states that there is no effect.
If we can reject H0, this means that H1 (the alternative hypothesis) is supported, but not
proven.
Significance = the p-value tells you how likely it is to find data at least as extreme as yours
if H0 is true
Below .05 = significant
Over .05 = not significant
p < .05 LOWER is significant
p > .05 HIGHER is not significant
Thus, the null hypothesis can be rejected if the p-value is lower than .05.
When the output shows p = .000, report p < .001
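The decision rule and the reporting rule above can be sketched in a few lines of Python. This is only an illustration: the function name report_p and the default alpha of .05 are assumptions made for this example, not part of SPSS or of the course material.

```python
def report_p(p, alpha=0.05):
    """Return an APA-style p string and whether H0 can be rejected.

    Hypothetical helper for illustration only.
    """
    if not 0.0 <= p <= 1.0:
        raise ValueError("p must lie between 0 and 1")
    if p < 0.001:
        # Very small p-values are reported as an inequality
        text = "p < .001"
    else:
        # Three decimals, no leading zero because p cannot exceed 1
        text = "p = " + f"{p:.3f}".lstrip("0")
    return text, p < alpha


for value in (0.0002, 0.032, 0.41):
    print(report_p(value))
```

The second element of the returned pair is True when p is below alpha, i.e. when H0 can be rejected.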
String and numeric variables
SPSS has two variable types: string and numeric. Numeric variables may contain only numbers. String
variables may contain letters, numbers and other characters. The distinction between numeric and
string variables is important because the variable type dictates what you can or cannot do with a
variable.
• You can do calculations with numeric variables but not with string variables
• You can use string functions such as taking substrings or concatenating with string variables
but not with numeric variables.
APA
Do NOT put your answers fully in italic or bold.
• Pay attention to italics:
p =, r =, M =, SD =, F, etc.
• Decimals
Always report everything with TWO decimals
Except:
o p = 3 decimals (.050)
o η² partial (partial eta squared) = 3 decimals (.240)
• 0.01 vs. .01
o If a value can be more than 1 → x.xx (SD length, F, etc.)
o If a value cannot be more than 1 → .xx (p, r, α, etc.)
o Thus: if a value can be greater than one, report the zero (0.78)
• Do not use variable names in the text, use normal language
o NOT: emoji_happy
o BUT: a review with a happy emoji
• Be parsimonious (sparing) in your answers: only answer what is asked. The
teachers/readers want to look at your answers as fast as possible.
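The leading-zero rule can be expressed as a small formatting helper. This is a minimal sketch: the name apa_number and its can_exceed_one parameter are made up for this example, and the default of two decimals follows the general rule stated above.

```python
def apa_number(value, can_exceed_one, decimals=2):
    """Format a statistic with or without a leading zero.

    Hypothetical helper: can_exceed_one is True for values such as
    M, SD, t or F and False for values such as p, r or alpha.
    """
    text = f"{value:.{decimals}f}"
    if can_exceed_one:
        return text
    # Drop only the zero in front of the decimal point
    if text.startswith("0."):
        return text[1:]
    if text.startswith("-0."):
        return "-" + text[2:]
    return text


print("M =", apa_number(0.78, can_exceed_one=True))
print("r =", apa_number(-0.3, can_exceed_one=False))
print("p =", apa_number(0.032, can_exceed_one=False, decimals=3))
```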
A complete answer contains:
• A rephrase of the hypothesis that you are going to test.
• The test that you use.
• If you calculate a variable from several items, describe what is measured, on which scale the
items were measured and give at least one example item.
• When you calculate a variable from several items (when a scale is involved), perform a reliability
analysis and report Cronbach's alpha and information about the question.
• Always give basic descriptive statistics for all relevant variables: means and SDs
• Assumptions: check them and report the outcome when they are violated
• Effect size (only when there is one!!)
o In case of hypotheses about interaction effects, include a graph to illustrate the nature
of the interaction.
o Report the direction and the magnitude of the effect (effect size)
• Conclusion whether the results support the hypothesis or not.
Graphs
• Only include a graph for the Factorial ANOVA interaction effect or when it is asked
• Look out for the scores on the X-axis and Y-axis
→ let them begin at 0
→ check that they have the correct labels
• Always include error bars
Graphs → Chart Builder → Display error bars
When the 95% CIs overlap, this indicates that the true scores of both groups fall within the
same range - i.e. may be similar.
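To see what the error bars represent, the sketch below computes an approximate 95% CI for two group means and checks whether the intervals overlap. It is an illustration under stated assumptions: it uses the normal critical value (about 1.96) instead of the t-based value that statistical software normally uses, so the intervals are slightly too narrow for small samples, and the scores and function names are made up.

```python
from math import sqrt
from statistics import NormalDist, mean, stdev


def ci95(scores):
    """Approximate 95% CI of the mean (normal approximation, an assumption)."""
    z = NormalDist().inv_cdf(0.975)          # roughly 1.96
    m = mean(scores)
    se = stdev(scores) / sqrt(len(scores))   # standard error of the mean
    return m - z * se, m + z * se


def intervals_overlap(a, b):
    """True when two (lower, upper) intervals share at least one value."""
    return a[0] <= b[1] and b[0] <= a[1]


group_a = [4, 5, 6, 5, 4, 6, 5, 5]   # made-up scores
group_b = [8, 9, 7, 8, 9, 8, 7, 8]   # made-up scores

print(ci95(group_a), ci95(group_b))
print("overlap:", intervals_overlap(ci95(group_a), ci95(group_b)))
```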
Effect sizes
Only report an effect size when there is an effect!!!
In other words, effect sizes are informative only when the outcome of your test allows you to reject
H0. After all, there is little sense in calculating the size of something when that something doesn’t
exist.
Skewness and kurtosis
Skew
• The symmetry of the distribution.
• Positive skew (scores bunched at low values with the tail pointing to high values; or tail-to-right).
• Negative skew (scores bunched at high values with the tail pointing to low values; or tail-to-left).
Kurtosis
• The ‘heaviness’ of the tails.
• Leptokurtic = high peak, ‘heavy’ tails.
• Platykurtic = low peak, ‘light’ tails.
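The sign conventions above can be checked with a small calculation. The sketch below uses the simple moment-based formulas (skewness = m3 / m2^1.5, excess kurtosis = m4 / m2^2 - 3). This is an assumption for illustration: statistical packages usually apply a small-sample correction, so their values will differ somewhat. All data are made up.

```python
from statistics import fmean


def central_moment(scores, k):
    """k-th central moment: the mean of (x - mean) ** k."""
    m = fmean(scores)
    return fmean([(x - m) ** k for x in scores])


def skewness(scores):
    """Positive = tail to the right, negative = tail to the left."""
    return central_moment(scores, 3) / central_moment(scores, 2) ** 1.5


def excess_kurtosis(scores):
    """Positive = leptokurtic, negative = platykurtic, 0 = normal-like."""
    return central_moment(scores, 4) / central_moment(scores, 2) ** 2 - 3


tail_to_right = [1, 1, 1, 2, 2, 3, 9]   # made-up scores bunched at low values
tail_to_left = [9, 9, 9, 8, 8, 7, 1]    # made-up scores bunched at high values

print(skewness(tail_to_right), skewness(tail_to_left))
print(excess_kurtosis([1, 2, 3, 4, 5]))
```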
Which test to choose?
→ on the exam: always use one test to test all the hypotheses in one question!!
o Is there a linear relationship between two variables?
→ Correlation r
Example: Is life expectancy related to the availability of television?
o Does one variable linearly predict the other variable?
→ Regression β
Example: Participants’ sense of power predicts the height of their first negotiation offer.
Specifically, the stronger participants’ sense of power, the lower their first negotiated offers will be.
o Does a group’s mean score on a variable deviate from a set value?
→ One-sample t-test t
Example: To check if people were correct more than 50% of the time.
o Does a group’s mean score on a variable deviate from another group’s mean score on that same
variable?
→ Independent samples t-test t
Example: To check if boys drink more alcohol than girls.
o Does a group’s mean score on one variable deviate from that same group’s mean score on another
variable?
→ Dependent samples t-test t
Example: To check if people’s scores on a language test improve if they follow a language course
between the pre-test and the post-test.
o To check if there is a relationship between two categorical variables.
→ Chi-square
Example: Boys are more likely than girls to have their own mobile phone.
o To investigate the interaction of 2 IVs on the DV.
→ Factorial ANOVA
Example: Females achieved higher outcomes than males. Power-primed females achieved
similar outcomes as males.
o To compare multiple (more than 2) means
→ One-way ANOVA
Example: Business students drink more than language and media students.
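The list above can be read as a decision tree. The sketch below encodes it as a lookup function. It is a simplification for practice purposes: the function name choose_test and all parameter names are hypothetical, and ordinal variables and mixed designs are not covered.

```python
def choose_test(dv, iv=None, n_ivs=1, n_groups=2, related=False, predict=False):
    """Suggest a test from the measurement levels of the variables.

    dv, iv: "categorical" or "continuous" (interval/ratio).
    iv=None means the mean is compared with a fixed value.
    Hypothetical helper, not an official decision rule.
    """
    if dv == "categorical":
        if iv == "categorical":
            return "Chi-square"
        raise ValueError("combination not covered in this overview")
    if iv is None:
        return "One-sample t-test"
    if iv == "continuous":
        return "Regression" if predict else "Correlation"
    # From here on: categorical IV(s) and a continuous DV
    if n_ivs >= 2:
        return "Factorial ANOVA"
    if n_groups > 2:
        return "One-way ANOVA"
    if related:
        return "Dependent samples t-test"
    return "Independent samples t-test"


# Boys versus girls on alcohol use: one binary IV, independent groups
print(choose_test("continuous", "categorical"))
# Three study programmes compared on drinking
print(choose_test("continuous", "categorical", n_groups=3))
```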
One-sample t-test
Compares the mean of a group against a fixed value
→ when you want to know if the mean of one sample differs significantly from some specified value
For example: testing whether a score is higher than the mid-point of the scale or if you want to
compare a sample mean to a population mean.
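As a worked illustration of what the one-sample t-test computes, the sketch below calculates t = (M - test value) / (SD / √n) with df = n - 1. The ratings are made up and the function name is hypothetical. The p-value is not computed here; it still has to come from the SPSS output or a t-table.

```python
from math import sqrt
from statistics import mean, stdev


def one_sample_t(scores, test_value):
    """Return (t, df) for a one-sample t-test against test_value."""
    n = len(scores)
    standard_error = stdev(scores) / sqrt(n)
    t = (mean(scores) - test_value) / standard_error
    return t, n - 1


ratings = [4, 5, 3, 5, 4, 5, 4, 5]   # made-up scores on a 1-5 scale
t, df = one_sample_t(ratings, test_value=3)   # 3 = mid-point of the scale
print(f"t({df}) = {t:.2f}")
```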
Independent-samples t-test
Compares the means of two independent groups/compares two means based on independent data
→ when scores are measured in two distinct groups
→ when scores are unrelated/independent → between-group design
Compares means of 2 subgroups on 1 outcome measure:
IV = binary
DV = interval/ratio
For example: data from different groups of people that do not overlap (gender, smoker/non-smoker,
etc.)
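A minimal sketch of the calculation behind the independent-samples t-test follows. It assumes equal variances in both groups (the pooled-variance version), uses made-up data, and returns only t and df. The p-value and the check of the homogeneity assumption are left to SPSS.

```python
from math import sqrt
from statistics import mean, variance


def independent_t(group_1, group_2):
    """Return (t, df) for a pooled-variance independent-samples t-test."""
    n1, n2 = len(group_1), len(group_2)
    df = n1 + n2 - 2
    # Weighted average of the two sample variances
    pooled_var = ((n1 - 1) * variance(group_1) + (n2 - 1) * variance(group_2)) / df
    standard_error = sqrt(pooled_var * (1 / n1 + 1 / n2))
    t = (mean(group_1) - mean(group_2)) / standard_error
    return t, df


smokers = [6, 7, 5, 8, 6]       # made-up scores
non_smokers = [4, 5, 3, 5, 4]   # made-up scores
t, df = independent_t(smokers, non_smokers)
print(f"t({df}) = {t:.2f}")
```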
Dependent (Paired) samples t-test
Compares two means based on related data.
→ when scores are measured within members of the same group of participants (when scores are
related).
→ Within the total group of observations (persons, cases, rows), we compare the score on variable Y1
with the score on variable Y2.
Compares related means, collected from the same observations (repeated measures or paired
observations)
IV = binary
DV = interval/ratio
For example: data from the same people measured at different times (before and after manipulation)
(within-subject design) and data from ‘matched’ samples (parent-child couples)
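The paired-samples t-test is a one-sample t-test on the difference scores, which the sketch below makes explicit. The pre-test and post-test scores are made up and the function name is hypothetical. Only t and df are computed.

```python
from math import sqrt
from statistics import mean, stdev


def paired_t(first, second):
    """Return (t, df) for a paired-samples t-test on second - first."""
    if len(first) != len(second):
        raise ValueError("each participant needs two scores")
    differences = [b - a for a, b in zip(first, second)]
    n = len(differences)
    t = mean(differences) / (stdev(differences) / sqrt(n))
    return t, n - 1


pre_test = [5, 6, 4, 7, 5, 6]    # made-up scores before the course
post_test = [6, 8, 5, 7, 7, 7]   # made-up scores after the course
t, df = paired_t(pre_test, post_test)
print(f"t({df}) = {t:.2f}")
```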
One-way ANOVA
Used when comparing two or more group means on a continuous dependent variable (infinite number
of possible values). Used when you have one independent variable that has two or more levels.
→ test the significance of the differences among several (> 2) independent group means.
Compares the means of more than 2 groups on one outcome measure Y
IV = categorical, more than 2 categories
DV = interval/ratio
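To illustrate what the one-way ANOVA compares, the sketch below computes the F ratio as the variance between the group means divided by the variance within the groups. The drinking scores are made up and the function name is hypothetical. The p-value and the follow-up tests are not covered here.

```python
from statistics import fmean


def one_way_anova(*groups):
    """Return (F, df_between, df_within) for independent groups."""
    all_scores = [x for group in groups for x in group]
    grand_mean = fmean(all_scores)
    # Variation of the group means around the grand mean
    ss_between = sum(len(g) * (fmean(g) - grand_mean) ** 2 for g in groups)
    # Variation of the scores around their own group mean
    ss_within = sum((x - fmean(g)) ** 2 for g in groups for x in g)
    df_between = len(groups) - 1
    df_within = len(all_scores) - len(groups)
    f_ratio = (ss_between / df_between) / (ss_within / df_within)
    return f_ratio, df_between, df_within


business = [5, 6, 7]   # made-up drinks per week
language = [2, 3, 4]
media = [1, 2, 3]
f_ratio, df1, df2 = one_way_anova(business, language, media)
print(f"F({df1}, {df2}) = {f_ratio:.2f}")
```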
Two-way (Factorial) ANOVA
Compares the effect of 2 (or more) variables on one outcome measure Y, thereby creating 4 groups
(or more)
When you have two independent variables and both variables have been manipulated using different
participants in all conditions.
IV = 2 categorical variables
DV = interval/ratio
Chi-square (Pearson’s)
Compares the observed frequencies (counts) of two categorical variables.
Examines relationship between 2 categorical variables by looking at the distribution across the cells
IV = categorical
DV = categorical
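The phrase "looking at the distribution across the cells" can be made concrete: for each cell, the observed count is compared with the count expected if the two variables were unrelated. The sketch below computes Pearson's chi-square without a continuity correction. The counts are made up and the function name is hypothetical.

```python
def chi_square(table):
    """Return (chi2, df) for a contingency table given as a list of rows."""
    row_totals = [sum(row) for row in table]
    column_totals = [sum(column) for column in zip(*table)]
    n = sum(row_totals)
    chi2 = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Expected count if rows and columns were independent
            expected = row_totals[i] * column_totals[j] / n
            chi2 += (observed - expected) ** 2 / expected
    df = (len(table) - 1) * (len(table[0]) - 1)
    return chi2, df


# Rows: boys, girls. Columns: own phone, no own phone (made-up counts)
counts = [[20, 10],
          [10, 20]]
chi2, df = chi_square(counts)
print(f"chi-square({df}) = {chi2:.2f}")
```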
Correlation (Pearson’s)
A correlation is used to examine the relationship between ordinal, interval or ratio variables (but
NEVER the relationship between nominal variables, i.e. unordered categories such as ‘cats or dogs’
as pets, because the values always have to have a meaningful order for correlation).
Specifically, it measures to what extent two variables are related. It does not say anything about
causality. For example: the number of kilometres driven and the fuel use of a car.
Examines the presence and strength of a linear relationship between two variables
IV = interval/ratio
DV = interval/ratio
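As an illustration of what Pearson's r measures, the sketch below computes it from the deviations of both variables from their means. The kilometre and fuel figures are made up and the function name is hypothetical. The significance test of r is left to SPSS.

```python
from math import sqrt
from statistics import fmean


def pearson_r(x, y):
    """Return Pearson's correlation coefficient for two equally long lists."""
    if len(x) != len(y):
        raise ValueError("x and y must have the same length")
    mean_x, mean_y = fmean(x), fmean(y)
    co_deviation = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    ss_x = sum((a - mean_x) ** 2 for a in x)
    ss_y = sum((b - mean_y) ** 2 for b in y)
    return co_deviation / sqrt(ss_x * ss_y)


kilometres = [100, 250, 400, 520, 610]   # made-up distances
fuel_litres = [7, 16, 27, 33, 41]        # made-up fuel use
print(f"r = {pearson_r(kilometres, fuel_litres):.2f}")
```

A value close to 1 or -1 indicates a strong linear relationship. A value close to 0 indicates a weak one.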
(Linear) Regression
We use a linear regression when the hypothesis contains a prediction, that is, when we want to see if
one variable predicts the behavior of another one. More precisely, if X and Y are two related variables,
then linear regression analysis helps us to predict the value of Y for a given value of X or vice versa.
Unlike a correlation, it specifies which variable is the predictor and which is the outcome.
This can only be used when there is one independent variable, and the dependent variable must be
continuous. It aims to determine how well a model generalizes to the population → this allows you to
support or reject the hypothesis.
Examines a linear prediction of outcome var Y by independent var X
IV = interval/ratio
DV = interval/ratio
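The idea of predicting Y from X can be sketched with the least-squares line Y = intercept + slope × X. The example below computes the unstandardized slope and intercept by hand. The data and function names are made up for illustration, and the significance test of the slope is left to SPSS.

```python
from statistics import fmean


def simple_regression(x, y):
    """Return (slope, intercept) of the least-squares line predicting y from x."""
    mean_x, mean_y = fmean(x), fmean(y)
    slope = (sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
             / sum((a - mean_x) ** 2 for a in x))
    intercept = mean_y - slope * mean_x
    return slope, intercept


def predict(x_value, slope, intercept):
    """Predicted Y for a given X."""
    return intercept + slope * x_value


tv_hours = [2, 5, 8, 12, 15]        # made-up hours of TV per week
perceived_risk = [1, 2, 2, 4, 5]    # made-up ratings
slope, intercept = simple_regression(tv_hours, perceived_risk)
print(f"predicted rating at 10 hours: {predict(10, slope, intercept):.2f}")
```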
Examples
1) Do men require more time (measured in seconds) to prepare a meal than women?
Independent-samples t-test
(because there are two independent groups: a binary IV (gender) and a scale DV (time))
2) Does the amount of money in people's bank account predict the number of products they buy?
Regression
(key word = predict; amount of money and number of products are both scale variables)
3) Is there a relationship between having a partner and having a depression (vs. not having one)?
Chi-Square
(because there are two categorical variables: you either have a partner or you don’t (binary) and you
either have a depression or not (binary))
4) Do taking drugs (yes/no) and drinking alcohol (yes/no) make people dance longer?
Factorial ANOVA
(because the DV (time spent dancing) is continuous and the predictors are drugs and alcohol. More than
one categorical predictor and a continuous DV = factorial ANOVA)
5) Do people generally like Riri more than Drake? (Likeability measured with 5 items)
Paired Samples T-test
(because the participants are in the same group and you have two different scores per participant. It is
not a one-way ANOVA, because that test requires a categorical predictor with independent groups)
6) Does the number of fruits people eat during 6 months relate to their weight loss (in grams)?
Correlation
(because there are two continuous variables and the hypothesis asks you to relate them and
not to predict one from the other)
7) Do blondes have more fun (mean score of 7 Likert-scale items) than brunettes and redheads?
One-way ANOVA
(because you are comparing 3 groups on an outcome measure and there is only one predictor,
the hair colour)
8) Can men with a beard gather more phone numbers of girls in a bar than men without a beard?
Independent-samples t-test
(because there is a continuous DV and a binary IV)
9) Does the number of hours of TV viewing per week predict people's perception of the chance of getting murdered?
Regression
(key word = predict)
10) How do having kids and/or having pets affect the degree of hygiene in people's household?
Factorial ANOVA
(because, there are two predictor variables: having kids (yes/no) and having pets (yes/no)
assuming that the degree of hygiene is a continuous variable)
11) Does the number of partners relate to the number of psychiatric illnesses people have in life?
Correlation
(key words = number of and relates)