Week 1: SPSS Tutorials: Advanced Statistics:
Key Points To Consider Before Using SPSS:
→ Is The Data Clean? Are there unrealistic values, or variables coded in a way that is not useful to you? If so, clean the file first: RECODE variables, use COMPUTE to create a new variable, and use FILTER BY to select the cases you want.
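→ Are Missing Values Defined? Specific codes such as -99 and -9 are often used for missing data. SPSS treats these as real values unless you tell it otherwise, either by declaring them as missing values (in Variable View or with the MISSING VALUES command) or by recoding them to system-missing with RECODE.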
→ Do Method and Data Match? OLS linear regression assumes a linear relationship between a quantitative dependent variable and the independent variables, which can be quantitative or categorical variables entered as dummies; check that this holds. If the dependent variable is categorical (nominal or ordinal), other methods of analysis are required.
→ Focus Of Analysis: Do you want to analyse the whole group of respondents, or only specific groups? Use the FILTER command, or create a new variable that only includes a subset of the population.
→ Do We Need to Weight Data? Some groups may be overrepresented or underrepresented in a dataset, which can skew results. To correct for any imbalance between the survey sample and the population, a weighting variable can be used (WEIGHT BY in SPSS).
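The filtering and weighting points above could look roughly like this in syntax. This is a minimal sketch: `hourwage`, `female_only` and `weight_var` are hypothetical names, and the coding 0 = female is only assumed from the example coding used later in these notes, so check Variable View for the real names and codes.

```spss
* Focus of analysis: keep only one group of respondents.
* Assumes gender is coded 0 = female; hourwage is an assumed name.
COMPUTE female_only = (gender = 0).
FILTER BY female_only.
DESCRIPTIVES VARIABLES=hourwage
  /STATISTICS=MEAN STDDEV.
* Switch the filter off so later analyses use all cases again.
FILTER OFF.
USE ALL.

* Weighting: weight_var is a placeholder for the weight supplied with the survey.
WEIGHT BY weight_var.
DESCRIPTIVES VARIABLES=hourwage
  /STATISTICS=MEAN STDDEV.
WEIGHT OFF.
```

Both FILTER and WEIGHT stay on until they are switched off, so turning them off explicitly avoids accidentally running later analyses on a subset or with weights.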
SPSS Videos Week 1:
1) How To Clean Data:
→ To clean data in SPSS, first check the descriptives (click Paste to open the syntax and run it). If missing-value codes have not been defined, the descriptives will not look right, e.g. a minimum of -99.
→ After this, go into Variable View and set the missing values in the Missing column.
→ When you make changes to the data, such as removing large outliers to clean up distributions or to allow for regression, keep a record of them (e.g. in your syntax file), as you may need to retrace your steps.
→ Setting values as missing on a specific variable excludes them from calculations. As well as missing-data codes, this is used for values that do not match the majority of the data (e.g. implausible or extreme values) but would still distort statistics and graphs.
→ Transform → Recode Into Same Variables: select the variable and click Old and New Values. On the left, enter the value you wish to change or remove; on the right, enter a replacement value or choose System-missing. Click Add, then Continue, to recode.
→ You can also make your own variables with Transform → Compute Variable. Here an existing variable can be used to compute a new one, e.g. age from the year of the study and the participants' birth year.
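The check, declare and recheck routine described above can be kept as a syntax file, which also serves as the record of changes. A minimal sketch, assuming the variables are called `workhrs` and `hourwage` and that -99 and -9 are their missing-data codes (both are assumptions; check the codebook):

```spss
* Step 1: look for impossible values such as a minimum of -99 or -9,
* which usually means missing-value codes have not been declared.
DESCRIPTIVES VARIABLES=workhrs hourwage
  /STATISTICS=MEAN STDDEV MIN MAX.

* Step 2: declare the codes as user-missing, the syntax equivalent of
* the Missing column in Variable View.
MISSING VALUES workhrs hourwage (-99, -9).

* Step 3: rerun the check; the codes should no longer appear.
DESCRIPTIVES VARIABLES=workhrs hourwage
  /STATISTICS=MEAN STDDEV MIN MAX.
```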
2) How To Make a Scatter Plot:
→ Graphs → Legacy Dialogs → Scatter/Dot → Simple Scatter → Define. From here, set which variable goes on the x axis (usually the independent variable) and which goes on the y axis (usually the dependent variable).
→ Then click Paste to open the syntax and run it.
3) How To Run a Correlation Analysis:
→ Analyse → Correlate → Bivariate: set your variables again and choose the correlation coefficient: Pearson's r, Kendall's tau-b or Spearman's rho.
→ Also choose a test of significance that fits your question and analysis: either a two-tailed or a one-tailed test.
→ A one-tailed test is only appropriate if you expect a relationship in a specific direction (directional hypothesis); a two-tailed test allows for both a positive and a negative relationship (non-directional).
→ Click Paste to open this in syntax and run it, or click OK to run the correlation analysis straight away.
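The pasted syntax for these choices looks roughly like the sketch below. `workhrs` and `hourwage` are assumed variable names; Pearson's r is run with CORRELATIONS, while Spearman and Kendall are run with NONPAR CORR.

```spss
* Pearson correlation with a two-tailed test (non-directional hypothesis).
CORRELATIONS
  /VARIABLES=workhrs hourwage
  /PRINT=TWOTAIL NOSIG
  /MISSING=PAIRWISE.

* Spearman rho and Kendall tau-b with a one-tailed test (directional hypothesis).
NONPAR CORR
  /VARIABLES=workhrs hourwage
  /PRINT=BOTH ONETAIL NOSIG
  /MISSING=PAIRWISE.
```

Despite its name, the NOSIG keyword is what flags significant coefficients with asterisks in the output; TWOTAIL and ONETAIL select the type of test.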
4) How To Run a Bivariate Regression:
→ Analyse → Regression → Linear: set the dependent and independent variables you are interested in, e.g. highbrow culture as dependent on household socioeconomic status.
→ Click Paste to get the syntax, or click OK to run it and show the model straight away.
→ The main interest is the unstandardised b coefficient: its sign shows whether the relationship is positive or negative, and its size shows how much the dependent variable changes for a one-unit increase in the independent variable.
→ Also check whether it is significant: if p is greater than the significance level (p > 0.05), do not reject the null hypothesis; if p is smaller than the significance level (p < 0.05), reject the null hypothesis.
→ R squared (explained variance) is the proportion of the variance in the dependent variable that is explained by the independent variable; the rest is unexplained. In a bivariate regression it equals the square of Pearson's r.
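A sketch of the syntax that Paste produces for a bivariate regression, trimmed to the main subcommands. `highbrow` and `hh_ses` are hypothetical names standing in for highbrow culture and household socioeconomic status.

```spss
* Bivariate OLS regression; highbrow and hh_ses are hypothetical names.
* Model Summary gives R Square and Coefficients gives b, its standard error, t and Sig.
REGRESSION
  /MISSING LISTWISE
  /STATISTICS COEFF OUTS R ANOVA
  /DEPENDENT highbrow
  /METHOD=ENTER hh_ses.
```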
5) How To Run a Multiple Regression:
→ Analyse → Regression → Linear: add your dependent variable (the outcome whose variation you want to explain), then add your independent variables, two or more.
→ Click Paste to get your syntax, or click OK to display the models directly.
→ This works the same way as a bivariate regression, just with additional independent variables.
→ Again, look at the unstandardised b coefficients to see how much each independent variable affects the dependent variable while the others are held constant, e.g. age having an effect of 0.05 and socioeconomic status having an effect of 0.11.
→ Check the significance of each coefficient separately: one coefficient being significant does not mean the others are.
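The only change from the bivariate sketch is the extra variable after ENTER. Again, `highbrow`, `age` and `hh_ses` are hypothetical names; CI(95) additionally requests 95% confidence intervals for each b.

```spss
* Multiple regression; highbrow, age and hh_ses are hypothetical names.
* Each b is the effect of one variable holding the other constant, and
* each coefficient has its own t and Sig value in the Coefficients table.
REGRESSION
  /MISSING LISTWISE
  /STATISTICS COEFF OUTS CI(95) R ANOVA
  /DEPENDENT highbrow
  /METHOD=ENTER age hh_ses.
```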
SPSS Exercise 1:
1. Switch between the “Data View” and “Variable View” tabs. What do the columns in the
"Variable View" window show? Take a close look at the contents of both tabs.
Data View shows the values for each question/variable: for the variable birth year, for example, you can see the years going down the rows (1981, 1869, etc.). This is the actual data that was collected and presented in the dataset.
Variable View shows, for each variable, its coded name, its type, its label, its values (how it is coded, e.g. 0 = female, 1 = male, 2 = other), which values are defined as missing, and its measurement level (nominal, ordinal or scale).
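The same information can also be printed to the Output window, which is handy for keeping alongside your notes. A minimal sketch; it works on whichever dataset is currently active.

```spss
* Prints variable names, labels, value labels, missing values and
* measurement levels for every variable to the Output window.
DISPLAY DICTIONARY.
```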
2. Create a new variable: age. The data was collected in 2008.
Transform → Compute Variable
* birthyear is the assumed name of the birth-year variable; variable names cannot contain spaces.
COMPUTE age = 2008 - birthyear.
EXECUTE.
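Because a missing-data code would otherwise be treated as a real birth year, it is worth declaring those codes before computing age and checking the result afterwards. A sketch, assuming the variable is `birthyear` and its missing codes are -99 and -9 (both assumptions):

```spss
* Assumes birthyear holds the birth year and uses -99 and -9 as missing codes.
MISSING VALUES birthyear (-99, -9).
* Arithmetic on a missing value gives system-missing, so those cases get no age.
COMPUTE age = 2008 - birthyear.
VARIABLE LABELS age 'Age in 2008, computed from birth year'.
EXECUTE.
* Check the new variable for implausible ages.
DESCRIPTIVES VARIABLES=age
  /STATISTICS=MEAN MIN MAX.
```

Without the MISSING VALUES line, a birth year coded -99 would produce an age of 2008 - (-99) = 2107.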
3. What is the mean, the standard deviation and the minimum and maximum of the variables
gender, working hours per week, age and gross hourly wage?
Analyse → Descriptive Statistics → Descriptives
* workhrs and hourwage are the assumed variable names; names cannot contain spaces.
DESCRIPTIVES VARIABLES=gender workhrs hourwage age
  /STATISTICS=MEAN STDDEV MIN MAX.
A value of 3000 for working hours per week is clearly an error (a week has only 168 hours) and distorts the results, so we can use the RECODE command to set it to missing.
Transform → Recode Into Same Variables
RECODE workhrs (3000=SYSMIS).
EXECUTE.
You can also use the FILTER command to filter by a computed variable and exclude specific parts of the population or data.
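A sketch of the FILTER alternative for the 3000-hours case. `workhrs`, `hourwage` and `valid_hrs` are assumed or hypothetical names, and 168 is used as the cut-off because a week has 168 hours.

```spss
* Alternative to recoding: keep the 3000 in the file but exclude that case.
COMPUTE valid_hrs = (workhrs <= 168).
FILTER BY valid_hrs.
DESCRIPTIVES VARIABLES=gender workhrs hourwage age
  /STATISTICS=MEAN STDDEV MIN MAX.
* Switch the filter off so later analyses use all cases again.
FILTER OFF.
USE ALL.
```

The design difference: RECODE only sets the single value to missing, while the filter drops the whole case from every analysis for as long as it is on. Cases with a missing value on workhrs are also filtered out, because the filter variable is then missing.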
4. Formulate possible null and alternative hypotheses for the relationship between gross hourly
wage and working hours per week. There are two possibilities for hypotheses: directional and
non-directional.
Non-directional: null hypothesis $H_0: \rho = 0$, alternative hypothesis $H_1: \rho \neq 0$
Directional: null hypothesis $H_0: \rho \le 0$, alternative hypothesis $H_1: \rho > 0$
Here $\rho$ is the population correlation between gross hourly wage and working hours per week.
Non-directional: the null hypothesis is that there is no relationship between gross hourly wage and working hours per week; the alternative is that there is a relationship.
Directional: the alternative is that gross hourly wage increases as working hours per week increase (a positive relationship); the null is that there is no positive relationship (no relationship, or a negative one).
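The significance value SPSS reports for Pearson's r tests whether $\rho = 0$. As a sketch using the standard textbook formula, where r (the sample correlation) and n (the number of cases) are symbols introduced here, the test statistic and the two kinds of p-value are:

```latex
\begin{aligned}
t &= \frac{r\sqrt{n-2}}{\sqrt{1-r^{2}}}, \qquad df = n-2 \\
p_{\text{two-tailed}} &= P\left(|T_{n-2}| \ge |t|\right) \quad (H_1: \rho \neq 0) \\
p_{\text{one-tailed}} &= P\left(T_{n-2} \ge t\right) \quad (H_1: \rho > 0)
\end{aligned}
```

$H_0$ is rejected when p < 0.05. When r lies in the hypothesised direction, the one-tailed p is half the two-tailed p.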
5. Make a scatter plot for the variables gross hourly wage and working hours per week. What
conclusions can you draw?
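Y-axis = dependent variable (gross hourly wage)
X-axis = independent variable (working hours per week)
Graphs → Legacy Dialogs → Scatter/Dot → Simple Scatter → Define
* The variable before WITH goes on the x axis, the one after WITH on the y axis.
GRAPH
  /SCATTERPLOT(BIVAR)=workhrs WITH hourwage
  /MISSING=LISTWISE.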