Summary
Advanced Statistics
Index

Statistical Methods for the Social Sciences – Agresti
Chapter 9 – Linear regression and correlation
Chapter 10 – Introduction to multivariate relationships
Chapter 11 – Multiple regression and correlation
Chapter 13 – Multiple regression with quantitative and categorical predictors

Multiple Regression – Allison
Chapter 1 – What is multiple regression?
Chapter 2 – How do I interpret multiple regression results?
Chapter 3 – What can go wrong with multiple regression?
Chapter 4 – How do I run a multiple regression?
Chapter 5 – How does bivariate regression work?
Chapter 6 – What are the assumptions of multiple regression?
Chapter 7 – What can be done about multicollinearity?
Chapter 8 – How can multiple regression handle nonlinear relationships?

Lectures
Lecture 1
Lecture 2
Lecture 3
Lecture 4
Lecture 5
Lecture 6

Tutorial videos
1 – Data cleaning, running a scatterplot & performing a correlation analysis
2 – How to perform a bivariate regression in SPSS
3 – How to perform a multiple regression in SPSS
4 – How to perform a multiple regression in SPSS: mediation
6 – How to create dummy variables in SPSS
7 – How to use dummy variables in SPSS
8 – How to use dummy variables in SPSS while controlling for another variable
9 – How to create an interaction term with two dummy/dichotomous variables
11 – How to create an interaction term with one dummy and one interval variable
12 – How to perform/interpret an interaction term with one dummy and one interval variable
13 – How to create, perform and interpret an interaction term with two interval variables

Writing tutorials
1 – Structure of an empirical research report
2 – Construct versus operational level
3 – APA guidelines for research projects
4 – Introduction
5 – Method
6 – Results
Statistical Methods for the Social Sciences – Agresti
Chapter 9 – Linear regression and correlation
Linear relationships
For categorical variables, we describe how y depends on x by comparing the conditional distributions of y in the different categories of x in a contingency table. For quantitative variables, a mathematical formula describes how the conditional distribution of y (such as y = crime rate) varies according to the value of x.
Linear functions: interpretation of the y-intercept and slope
The formula y = α + βx expresses observations on y as a linear function of observations on x.
The formula has a straight-line graph with slope β (beta) and y-intercept α (alpha).
Entering any real number x into the formula y = α + βx yields a corresponding value of y.
At x = 0, the equation simplifies to y = α + β(0) = α.
So the constant α in this equation is the value of y when x = 0; α is called the y-intercept.
The slope β equals the change in y for a one-unit increase in x.
In the context of a regression analysis, α and β are called regression coefficients.
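As a minimal sketch of these interpretations in code (the values α = 2 and β = 0.5 are invented purely for illustration):

alpha, beta = 2.0, 0.5          # hypothetical intercept and slope

def f(x):
    return alpha + beta * x     # the linear function y = alpha + beta*x

print(f(0))                     # 2.0: at x = 0, y equals alpha, the y-intercept
print(f(3) - f(2))              # 0.5: a one-unit increase in x changes y by beta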
Models are simple approximations of reality
A model is a simple approximation of the relationship between variables in the population. The linear function provides a simple model for the relationship between two quantitative variables. For a given value of x, the model y = α + βx predicts a value for y.
Association does not imply causation.
A realistic model is actually a little more complex than the one presented so far: it allows variability in the y-values at each value of x. That model, not just a straight line, is what we mean by a regression model.
Least squares prediction equation
A scatterplot displays the data
The first step in fitting the model is to plot the data, to see whether a model with a straight-line trend makes sense.
A plot of the n observations as n points is called a scatterplot.
Prediction equation
When the scatterplot suggests that the model y = α + βx might be suitable, we use the data to estimate this line. The notation
𝑦̂ = a + bx
represents a sample equation that estimates the linear model. The sample equation 𝑦̂ = a + bx is called the prediction equation, because it provides a prediction 𝑦̂ for the response variable at each value of x.
The formulas for a and b in the prediction equation 𝑦̂ = a + bx are
b = Σ(x − x̄)(y − ȳ) / Σ(x − x̄)²   and   a = ȳ − b x̄
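A short sketch of these formulas in code, using a small invented data set; the comparison with NumPy's built-in least squares fit is only a check on the hand computation:

import numpy as np

# hypothetical sample data (invented for illustration)
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.1, 2.9, 4.2, 4.8, 6.1])

# the least squares formulas from the text
b = np.sum((x - x.mean()) * (y - y.mean())) / np.sum((x - x.mean()) ** 2)
a = y.mean() - b * x.mean()
print(a, b)                           # prediction equation: y-hat = a + b*x

# the same estimates from NumPy's built-in least squares fit, as a check
b_check, a_check = np.polyfit(x, y, deg=1)
print(a_check, b_check)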
If an observation has both its x- and y-values above their means, or both below their means, then (x − x̄)(y − ȳ) is positive. The slope estimate b therefore tends to be positive when most observations are like this, i.e. when points with large x-values also tend to have large y-values and points with small x-values tend to have small y-values.
Effect of outliers on the prediction equation
A regression outlier is an observation that falls far from the trend that the rest of the data follow. Such an observation can have a substantial effect: the line is pulled towards it and away from the centre of the general trend of the points.
An observation is called influential if removing it leads to a large change in the prediction equation. Unless the sample size is large, an observation can have a strong influence on the slope if its x-value is low or high compared to the rest of the data and it is a regression outlier.
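A minimal sketch of influence, on invented data in which the last observation has an extreme x-value and falls far from the trend; comparing the slope with and without that point shows the effect:

import numpy as np

# invented data: the last point is a regression outlier with an extreme x-value
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 20.0])
y = np.array([2.0, 3.0, 4.0, 5.0, 6.0, 2.0])

b_all, a_all = np.polyfit(x, y, deg=1)                    # slope with the outlier
b_without, a_without = np.polyfit(x[:-1], y[:-1], deg=1)  # slope without it
print(b_all, b_without)   # roughly -0.07 versus 1.0: the point is influential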
Prediction errors are called residuals
The prediction error is the difference between the actual y-value and the predicted value. These prediction errors are called residuals. For an observation, the difference between the observed value and the predicted value of the response variable, y − 𝑦̂, is called the residual.
A positive residual occurs when the observed value y is greater than the predicted value 𝑦̂, i.e. y − 𝑦̂ > 0. A negative residual occurs when the observed value is less than the predicted value. The smaller the absolute value of the residual, the better the prediction, because the predicted value is closer to the observed value. In a scatterplot, the residual for an observation is the vertical distance between its point and the prediction line.
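A small sketch of computing residuals, assuming a hypothetical prediction equation 𝑦̂ = 1 + 1.1x and invented data:

import numpy as np

x = np.array([1.0, 2.0, 3.0])
y = np.array([2.5, 2.8, 4.6])
a, b = 1.0, 1.1           # hypothetical prediction equation y-hat = 1 + 1.1x

y_hat = a + b * x
residuals = y - y_hat     # vertical distance from each point to the prediction line
print(residuals)          # roughly [0.4, -0.4, 0.3]: positive above the line, negative below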
Prediction equation has least squares property
We summarize the size of the residuals by the sum of their squared values. This quantity, denoted by SSE, is
SSE = Σ(y − 𝑦̂)²
The residual is calculated for each observation in the sample, each residual is squared, and then SSE is the sum of these squares. The symbol SSE is an abbreviation for sum of squared errors. This terminology refers to the residual as a measure of the prediction error made when using 𝑦̂ to predict y. The better the prediction equation, the smaller the residuals tend to be and thus the smaller the SSE. Each specific equation has corresponding residuals and a value of SSE.
The least squares estimates
The least squares estimates a and b are the values for the prediction equation
𝑦̂ = a + bx for which the residual sum of squares SSE = Σ(y − 𝑦̂)² is a minimum.
The prediction line 𝑦̂ = a + bx is called the least squares line, because it has the smallest sum of squared residuals.
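A sketch of the least squares property on invented data: the fitted line's SSE is smaller than that of any other line, for example slightly shifted or tilted versions of it:

import numpy as np

# invented sample data
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.1, 2.9, 4.2, 4.8, 6.1])

b, a = np.polyfit(x, y, deg=1)     # least squares estimates

def sse(a, b):
    # residual sum of squares for the line y-hat = a + b*x
    return np.sum((y - (a + b * x)) ** 2)

print(sse(a, b))          # SSE of the least squares line
print(sse(a + 0.3, b))    # shifting the line up increases the SSE
print(sse(a, b + 0.2))    # changing the slope also increases the SSE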