Notes on SPSS / bps.uvt.nl
Session 1: Introduction to SPSS and descriptives
Every variable has a measurement level.
Bar chart: Graphs -> Legacy Dialogs -> Bar.
Bar chart (% on y-axis): Graphs -> Legacy Dialogs -> Bar -> Simple -> Define -> “% of cases” -> OK.
Histogram: Graphs -> Legacy Dialogs -> Histogram (-> Display normal curve).
Scatterplot: Graphs -> Legacy Dialogs -> Scatter/Dot -> Simple Scatter -> define X/Y axis variables.
Descriptives: analyze -> descriptive statistics -> descriptives (-> options -> variance) – check for errors!
Frequency table: Analyze -> Descriptive Statistics -> Frequencies
Sort cases by a variable (e.g. from highest to lowest value): Data -> Sort Cases
Compare groups: Data -> Split File -> Compare Groups. (2 groups will appear in the output)
Undo the split file: data -> split file -> “Analyze all cases, do not create groups”.
Percent (observed frequency / N, missing values included) vs Valid Percent (observed frequency / Valid N, missing values excluded)
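The difference between Percent and Valid Percent can be checked by hand. A minimal sketch in Python, assuming a made-up list of responses in which None stands for a missing value (the data and variable names are hypothetical, not SPSS output):

```python
# Hypothetical responses; None stands for a missing value.
responses = ["yes", "no", "yes", None, "yes", None, "no", "yes"]

n_total = len(responses)                              # N, missing included
n_valid = sum(1 for r in responses if r is not None)  # Valid N, missing excluded
count_yes = responses.count("yes")                    # observed frequency of "yes"

percent = 100 * count_yes / n_total        # Percent column
valid_percent = 100 * count_yes / n_valid  # Valid Percent column

print(percent)                  # 50.0
print(round(valid_percent, 2))  # 66.67
```

Valid Percent is never lower than Percent, because the denominator leaves out the missing cases.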
Median: middle number in a sorted list of numbers
Mode: most common observation
Variance: sum of (x − mean)² / (N − 1). Example with values 5, 3, 1: mean = 3, sum of squares = (5−3)² + (3−3)² + (1−3)² = 8, variance = 8 / (3 − 1) = 4.
Standard deviation: √variance (in the example: √4 = 2).
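For the values 5, 3 and 1 the sum of squared deviations is 8; dividing by N − 1 = 2 gives the variance. A small Python sketch of that calculation (variable names are illustrative only):

```python
import math

values = [5, 3, 1]

mean = sum(values) / len(values)                    # 3.0
sum_squares = sum((x - mean) ** 2 for x in values)  # 8.0
variance = sum_squares / (len(values) - 1)          # sample variance, divides by N - 1
sd = math.sqrt(variance)                            # standard deviation

print(variance)  # 4.0
print(sd)        # 2.0
```

This is the sample variance (N − 1 in the denominator), which is also what SPSS reports under Descriptives.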
Correlation coefficient: r (between −1 and 1): Analyze -> Correlate -> Bivariate (Pearson correlation)
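To see where the Pearson correlation comes from, it can be computed by hand. A minimal sketch, assuming two made-up variables (hours and grades are hypothetical data; pearson_r is an illustrative helper, not an SPSS function):

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation: co-variation of x and y divided by their total variation."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    co_variation = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    ss_x = sum((x - mean_x) ** 2 for x in xs)
    ss_y = sum((y - mean_y) ** 2 for y in ys)
    return co_variation / math.sqrt(ss_x * ss_y)

hours = [1, 2, 3, 4, 5]
grades_rising = [2, 4, 6, 8, 10]   # perfectly increasing with hours
grades_falling = [10, 8, 6, 4, 2]  # perfectly decreasing with hours

print(pearson_r(hours, grades_rising))   # 1.0
print(pearson_r(hours, grades_falling))  # -1.0
```

The two extremes show the range of r: 1 for a perfect positive linear relation, −1 for a perfect negative one.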
Session 2: SPSS skills and normal distribution
Random variable: numerical outcome of a chance experiment
Continuous variable: can take on infinitely many values (height of a person)
Discrete variable: can take on only particular values (number of correctly guessed answers)
When SD increases, the distribution becomes wider.
In a normal distribution, about 68% of the values lie within one standard deviation of the mean.
In a normal distribution, about 95% of the values lie within two standard deviations of the mean.
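The 68% and 95% figures are rounded areas under the normal curve. A quick check in Python using the standard normal distribution (mean 0, SD 1), as a sketch rather than part of the SPSS workflow:

```python
from statistics import NormalDist

z = NormalDist(mu=0, sigma=1)  # standard normal distribution

within_1_sd = z.cdf(1) - z.cdf(-1)  # area between -1 SD and +1 SD
within_2_sd = z.cdf(2) - z.cdf(-2)  # area between -2 SD and +2 SD

print(round(within_1_sd, 4))  # 0.6827
print(round(within_2_sd, 4))  # 0.9545
```

The same proportions hold for any normal distribution, whatever its mean and SD.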
Empty cell in data = system missing. User-missing code: 999. Define in SPSS in variable view […]
Recode missing: check how many missing (frequency).
Transform -> Recode into Same Variables -> select variable(s) -> “Old and New Values” -> old value =
“System-missing”, new value = 999 (this recodes system missing into 999).
How many missing? Transform -> Count Values within Cases -> name target variable -> select variables.
Percentage missing: analyze -> descriptive statistics -> frequencies (choose target variable)
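The recode, count and percentage steps above can be sketched outside SPSS as well. A minimal Python illustration, assuming made-up scores in which None stands for an empty cell (system missing) and 999 is the user-missing code:

```python
USER_MISSING = 999  # user-missing code from the notes

# Hypothetical scores; None stands for an empty cell (system missing).
scores = [4, None, 7, 999, None, 5]

# Recode system missing into the user-missing code.
recoded = [USER_MISSING if s is None else s for s in scores]

n_missing = recoded.count(USER_MISSING)       # how many missing?
pct_missing = 100 * n_missing / len(recoded)  # percentage missing

print(recoded)      # [4, 999, 7, 999, 999, 5]
print(n_missing)    # 3
print(pct_missing)  # 50.0
```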
Exclude cases with a missing value: Data -> Select Cases -> “If condition is satisfied” -> “If” -> variable = 0 ->
“Filter out unselected cases”. SPSS crosses out the excluded cases and adds the variable ‘filter_$’ (1 = included, 0 = not).
Delete unselected cases: Data -> select cases -> “Delete Unselected Cases”.
Variable measured in the opposite direction (reverse coding): Transform -> Recode into Different Variables -> select
variable to be recoded -> specify name & label -> Change (0 -> 2, 1 -> 1, 2 -> 0). A new column appears in Data View.
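The recode 0 -> 2, 1 -> 1, 2 -> 0 simply mirrors the scale. A small Python sketch with a hypothetical item (names are illustrative only):

```python
# Old value -> new value, as entered in the recode dialog.
RECODE_MAP = {0: 2, 1: 1, 2: 0}

item = [0, 2, 1, 2, 0]                         # hypothetical answers on a 0-2 scale
item_reversed = [RECODE_MAP[v] for v in item]  # new variable, original stays untouched

print(item_reversed)  # [2, 0, 1, 0, 2]
```

For a scale that runs from 0 to 2 this is the same as computing 2 minus the old value.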
Values for whole category: Transform -> compute variable -> choose a name for the category
Session 4: Type-I/type-II, one-sample t-test, independent samples t
Statistical hypotheses pertain to population parameters.
Type I error: rejecting the null hypothesis while it is true (false positive).
Type II error: not rejecting the null hypothesis while it is false (false negative).
The two errors are inversely related: for a given sample size, a higher alpha means more Type I errors and fewer Type II errors.
Alpha levels aren’t right or wrong, but do have to be justified.
T-test: analyze -> compare means -> one-sample T-test -> specify test variable & value (of hypothesis).
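What the one-sample t-test computes can be sketched by hand: the distance between the sample mean and the hypothesized value, divided by the standard error. The sample and the test value below are made up for illustration:

```python
import math
import statistics

sample = [52, 48, 55, 50, 45]  # hypothetical scores
test_value = 47                # hypothesized population mean (the "test value")

n = len(sample)
mean = statistics.mean(sample)         # sample mean
sd = statistics.stdev(sample)          # sample SD (N - 1 in the denominator)
standard_error = sd / math.sqrt(n)     # standard error of the mean

t = (mean - test_value) / standard_error
df = n - 1                             # degrees of freedom

print(round(t, 2), df)  # t is about 1.76 with df = 4
```

SPSS additionally reports the p-value that belongs to this t and df in the Sig. column.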
Independent samples t-test: Analyze -> Compare Means -> Independent-Samples T Test -> select test variable
and grouping variable -> Define Groups (group 1 = 1, group 2 = 2) [results: descriptive, testing significance, importance]
Effect-size measures: express the size of an effect in a standardized way, which makes results comparable.
- Cohen’s d: standardized difference between two group means; d = 0.5 means the group means lie 0.5 SD apart.
(0.2 = small, 0.5 = medium, 0.8 = large)
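Cohen's d can be computed by hand from the two group means and a pooled standard deviation. A minimal sketch, assuming two made-up groups and the common pooled-SD version of d (the course may use a slightly different formula):

```python
import math
import statistics

group_a = [6, 7, 8, 9, 10]  # hypothetical scores, group 1
group_b = [4, 5, 6, 7, 8]   # hypothetical scores, group 2

n_a, n_b = len(group_a), len(group_b)
var_a = statistics.variance(group_a)  # sample variance of group 1
var_b = statistics.variance(group_b)  # sample variance of group 2

# Pooled SD: variances weighted by their degrees of freedom.
pooled_sd = math.sqrt(((n_a - 1) * var_a + (n_b - 1) * var_b) / (n_a + n_b - 2))

cohens_d = (statistics.mean(group_a) - statistics.mean(group_b)) / pooled_sd

print(round(cohens_d, 2))  # about 1.26, a large effect by the rule of thumb
```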
Session 5: T-tests and ANOVA
Syntax is a series of “commands” that tells SPSS what to do. Important for: efficiency, communication,
documentation & data management, necessity.
Syntax for descriptives: analyze -> descriptive statistics -> descriptives -> paste -> RUN syntax.
Syntax comments begin with * and end with a period (.). In the pasted syntax you can, for example, add VARIANCE to the statistics list.
Basic syntax: analyze -> General Linear Model -> Univariate
Comparing multiple groups: Analyze -> General Linear Model -> Univariate -> options: descriptive
statistics & homogeneity tests -> PASTE -> Run
Levene’s test significant (p < α): population variances of the groups should not be considered equal.
ANOVA is robust against such a violation, if the group sizes are roughly the same.
Sig. value of the “Corrected Model” in the first row = 2-sided p value for rejecting the null-hypothesis.
R squared (R², also written η²) = proportion of variance explained by group membership. [Small: 0.01, Medium: 0.06, Large: 0.14]
Only look at cases in a certain scenario (weather): Data -> Select Cases -> “If condition is satisfied” -> “If”
-> enter condition into the equation box (weather = 1, only good weather e.g.). -> continue -> paste.
3 sources of variation: Sums of Squares total (SSt) [individual vs grand mean; row Corrected Total, column Type III SS],
Sums of Squares between (SSb) [group mean vs grand mean], Sums of Squares within (SSw) [individual vs own group mean].
MSb = SSb/(k−1); MSw = SSw/(N−k); F = MSb/MSw; dfb = k−1; dfw = N−k (k = number of groups, N = total number of cases)
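The sums of squares and the F ratio can be reproduced by hand. A minimal Python sketch, assuming three made-up groups of three cases each (group names and scores are hypothetical):

```python
# Hypothetical scores for k = 3 groups.
groups = {
    "a": [1, 2, 3],
    "b": [2, 3, 4],
    "c": [6, 7, 8],
}

all_scores = [x for scores in groups.values() for x in scores]
N = len(all_scores)  # total number of cases
k = len(groups)      # number of groups
grand_mean = sum(all_scores) / N

# SSb: group mean vs grand mean, weighted by group size.
ss_between = sum(
    len(scores) * (sum(scores) / len(scores) - grand_mean) ** 2
    for scores in groups.values()
)
# SSw: individual score vs own group mean.
ss_within = sum(
    (x - sum(scores) / len(scores)) ** 2
    for scores in groups.values()
    for x in scores
)
# SSt: individual score vs grand mean.
ss_total = sum((x - grand_mean) ** 2 for x in all_scores)

ms_between = ss_between / (k - 1)  # MSb, dfb = k - 1
ms_within = ss_within / (N - k)    # MSw, dfw = N - k
F = ms_between / ms_within
eta_squared = ss_between / ss_total  # explained variance (R squared)

print(ss_between, ss_within, ss_total)  # 42.0 6.0 48.0
print(F, eta_squared)                   # 21.0 0.875
```

Note that SSb + SSw = SSt, so the explained variance is the share of SSt that lies between the groups.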