Contains lecture notes, notes from readings, and notes from tutorials: all the material you need for the research method midterm (2020). Using these notes I scored a 9.7 on the midterm. Keep in mind that these notes were written according to the 2020 syllabus. Given corona these notes may not conta...
READINGS
Chapter 13: Correlation and regression (FDJ)
Chapter 17: A guide to multivariate analysis
INTRO-LECTURE
LECTURE 1
LECTURE 2
TUTORIAL 1
LECTURE 3
LECTURE 4
TUTORIAL 2
LECTURE 5
LECTURE 6
TUTORIAL 3
READINGS
Chapter 13: Correlation and regression (FDJ)
Correlation measures the association between two continuous variables (interval or ratio).
Regression enables us to predict the values of one variable from the value of another.
=> Correlation and regression describe statistics on two variables as opposed to one.
1. Correlation
Positive correlation is when the values of both variables increase together (upward slope).
Negative correlation is when the values of one variable decrease as the values of the other increase (downward slope).
Correlation does not mean causation!
Sometimes correlation occurs because of a confounding effect (e.g. early menarche and
high IQ only correlate because of the confounding effect of high social class).
When drawing a scatter graph:
- Be careful choosing the right scale.
- One variable is represented by Y and the other by X.
=> The (assumed to be) dependent variable is the Y variable.
=> The (assumed to be) independent variable is the X variable.
- When using z-scores, you should use a cross-shaped axis.
The distinction between strong and weak correlation is fairly arbitrary.
If there is no clear pattern between two variables, there is no association between them.
The Pearson product-moment correlation coefficient (or just the correlation coefficient, r) has
the following formula:
r = Σ[(Xi − X̄)(Yi − Ȳ)] / √( Σ(Xi − X̄)² × Σ(Yi − Ȳ)² )
=> Steps to calculate the correlation coefficient:
1. Calculate the means of both variables (X̄ and Ȳ).
2. Calculate (Xi − X̄)² for each value in variable X.
3. Calculate (Yi − Ȳ)² for each value in variable Y.
4. For each case, multiply the following: (Xi − X̄) × (Yi − Ȳ).
5. Take the sums: Σ(Xi − X̄)², Σ(Yi − Ȳ)² and Σ[(Xi − X̄) × (Yi − Ȳ)], then substitute them into the formula.
The meaning of r:
- r = + 1 => Perfect positive correlation.
- r = between 0 and 1 => Non-perfect positive correlation.
- r = 0 => There is no association.
- r = between − 1 and 0 => Non-perfect negative correlation.
- r = − 1 => Perfect negative correlation.
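The steps above can be sketched in pure Python (the datasets here are made up purely for illustration):

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient r, following the steps above."""
    n = len(xs)
    x_bar = sum(xs) / n                                    # step 1: means
    y_bar = sum(ys) / n
    sxx = sum((x - x_bar) ** 2 for x in xs)                # step 2: Σ(Xi − X̄)²
    syy = sum((y - y_bar) ** 2 for y in ys)                # step 3: Σ(Yi − Ȳ)²
    sxy = sum((x - x_bar) * (y - y_bar)                    # steps 4-5: Σ[(Xi − X̄)(Yi − Ȳ)]
              for x, y in zip(xs, ys))
    return sxy / math.sqrt(sxx * syy)

# A perfectly linear increasing dataset gives r = +1:
print(pearson_r([1, 2, 3, 4], [2, 4, 6, 8]))   # 1.0
# A perfectly linear decreasing dataset gives r = -1:
print(pearson_r([1, 2, 3], [3, 2, 1]))         # -1.0
```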
2. Regression
Equation of a line in statistics: Y = a + bX
=> a is the y-intercept and b is the gradient.
The line of best fit is the one that minimises Σdᵢ² (the sum of the squared vertical distances
between the line and all the points).
To find the regression line we use the following formulas:
b = Σ[(Xi − X̄)(Yi − Ȳ)] / Σ(Xi − X̄)²  and  a = Ȳ − bX̄
=> Y = a + bX
Steps for calculating the regression line:
1. Calculate a and b.
2. Calculate Y = a + bX
=> What does Y = a + bX mean?
- a is the value of Y when X = 0 (when the regression line crosses the y-axis)
(Warning: X = 0 is not always a sensible possibility).
- b is the measure of steepness of the line ( Y increases by b when we increase X by
1).
Warning: prevent extrapolation (making predictions outside the data range).
(Some general) rules for regression lines:
- if b is positive, Y increases as X increases.
- if b is negative, Y decreases as X increases.
- if b = 0 , the line is horizontal and there is no relationship between the two variables.
To see if the regression fits the data you should square r.
- r2 is the proportion of variation in Y explained by X.
- e.g. if r2 = 0.77 then 77% of the variation in variable Y is explained by
variable X.
- 1 − r² is the proportion of variation in Y not explained by X (ergo, other factors are at
play).
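Putting the regression formulas and r² together in a minimal pure-Python sketch (the data are invented for illustration and happen to lie exactly on a line):

```python
def regression_line(xs, ys):
    """Least-squares line Y = a + bX plus r², using the formulas above."""
    n = len(xs)
    x_bar, y_bar = sum(xs) / n, sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    syy = sum((y - y_bar) ** 2 for y in ys)
    b = sxy / sxx                 # gradient: Y rises by b per unit increase in X
    a = y_bar - b * x_bar         # intercept: value of Y when X = 0
    r2 = sxy ** 2 / (sxx * syy)   # proportion of variation in Y explained by X
    return a, b, r2

a, b, r2 = regression_line([1, 2, 3, 4], [3, 5, 7, 9])
print(a, b, r2)   # 1.0 2.0 1.0 — the points lie exactly on Y = 1 + 2X
```

Because the example data fit the line perfectly, r² = 1: all of the variation in Y is explained by X.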
Chapter 17: A guide to multivariate analysis
1. Introduction
Multivariate analysis: Analysing a pattern with more than one independent variable.
=> Allows us to examine the impact of multiple explanatory factors on dependent
variables.
=> Allows us to explore whether the impact of one variable also depends upon the
level of another.
OLS regression: Appropriate for interval-level dependent variables.
Logistic regression: Appropriate for categorical (especially dichotomous) dependent
variables.
2. The principles of multivariate analysis: statistical control
Most political/social phenomena are multi-causal => Most bivariate analysis will, therefore,
be partial, at best.
Establishing causation is difficult and, therefore, we aim at “robust associations” => This has
three conditions:
1. Statistical association.
2. Appropriate time ordering.
3. Elimination of other possibilities.
=> The first two can be resolved through bivariate analysis while the latter will need
the help of multivariate analysis.
Multivariate analysis allows us to compare the importance of different factors against each
other. Thereby, it provides statistical control.
We should always look for factors that could contradict our argument.
3. Specifying different types of relationship
Confounding variables are associated (!!) with both the probable cause and the outcome of a
political phenomenon. They have no direct causal influence themselves, but they can obscure
the variable that does. When the apparent relationship between cause and outcome is in fact
produced by the confounding variable, we call it a spurious relationship.
Omitted variable bias is the situation in which a theoretically important variable is not
controlled for, so that undue causal importance is attributed to other variables.
Intervening/mediating variables are variables that are affected (!!) by the explanatory
variable and in turn affect the outcome variable => This constitutes an indirect relationship
rather than a spurious relationship.
- e.g. class => ideology => vote (ideology is the intervening variable).
An interaction effect is when the impact of an independent variable is conditional on the
presence of another independent variable. This latter independent variable is then a
moderator variable.
- e.g. political knowledge (moderator variable) strengthens the relationship between
ideology and vote.
4. Multivariate analysis using OLS regression
We use OLS regression when the dependent variable is interval, or ordinal with enough
categories to approach a normal distribution. It is a simple extension of the linear regression model.
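As a sketch of that extension, an OLS model with two explanatory variables can be fitted by least squares; here the variable names and data are invented for illustration, and NumPy's least-squares solver stands in for the hand calculation:

```python
import numpy as np

# Hypothetical data: Y depends on two explanatory variables X1 and X2.
X1 = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
X2 = np.array([2.0, 1.0, 4.0, 3.0, 5.0])
Y = 1.0 + 2.0 * X1 + 0.5 * X2   # true relationship, no noise, for clarity

# Design matrix with an intercept column; least squares finds (a, b1, b2)
# in Y = a + b1*X1 + b2*X2, the multivariate analogue of Y = a + bX.
design = np.column_stack([np.ones_like(X1), X1, X2])
coefs, *_ = np.linalg.lstsq(design, Y, rcond=None)
print(coefs)   # ≈ [1.0, 2.0, 0.5]
```

Each coefficient gives the change in Y per unit change in its variable while holding the other variable constant, which is the "statistical control" described above.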