Contains lecture notes, notes from readings, and notes from tutorials; all the material you need for the research method midterm (2020). Using these notes I scored a 9.7 on the midterm. Keep in mind that these notes were written according to the 2020 syllabus. Given corona, these notes may not conta...
READINGS
Chapter 13: Correlation and regression (FDJ)
Chapter 17: A guide to multivariate analysis
INTRO-LECTURE
LECTURE 1
LECTURE 2
TUTORIAL 1
LECTURE 3
LECTURE 4
TUTORIAL 2
LECTURE 5
LECTURE 6
TUTORIAL 3
READINGS
Chapter 13: Correlation and regression (FDJ)
Correlation measures the association between two continuous variables (interval or ratio).
Regression enables us to predict the values of one variable from the value of another.
=> Correlation and regression describe statistics on two variables as opposed to one.
1. Correlation
Positive correlation is when values increase together (upward slope).
Negative correlation is when one variable increases as the other decreases (downward slope).
Correlation does not mean causation!
Sometimes correlation occurs because of a confounding effect (e.g. early menarche and
high IQ only correlate because of the confounding effect of high social class).
When drawing a scatter graph:
- Be careful choosing the right scale.
- One variable is represented by Y and the other by X.
=> The (assumed to be) dependent variable is the Y variable.
=> The (assumed to be) independent variable is the X variable.
- When using z-scores, you should use a cross-shaped axis.
The distinction between strong and weak correlation is fairly arbitrary.
If there is no clear pattern between two variables, there is no association between them.
The Pearson product moment correlation coefficient (or just the correlation coefficient, r) has
the following formula:
r = Σ[(Xi − X̄)(Yi − Ȳ)] / √[Σ(Xi − X̄)² × Σ(Yi − Ȳ)²]
=> Steps to calculate the correlation coefficient:
1. Calculate the means of both variables (X̄ and Ȳ).
2. Calculate (Xi − X̄)² for each value in variable X.
3. Calculate (Yi − Ȳ)² for each value in variable Y.
4. For each case, multiply the following: (Xi − X̄) × (Yi − Ȳ)
5. Take the sums: Σ(Xi − X̄)², Σ(Yi − Ȳ)², and Σ[(Xi − X̄) × (Yi − Ȳ)]
The meaning of r:
- r = + 1 => Perfect positive correlation.
- r = between 0 and 1 => Non-perfect positive correlation.
- r = 0 => There is no association.
- r = between − 1 and 0 => Non-perfect negative correlation.
- r = − 1 => Perfect negative correlation.
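The five steps above can be sketched in code. This is a minimal illustration using made-up data (the numbers are not from the chapter), computing r exactly as the formula prescribes:

```python
import math

# Hypothetical paired observations (illustrative only, not from the chapter).
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [2.0, 4.0, 5.0, 4.0, 6.0]

def pearson_r(xs, ys):
    n = len(xs)
    mean_x = sum(xs) / n                                   # step 1: X̄
    mean_y = sum(ys) / n                                   # step 1: Ȳ
    ss_x = sum((x - mean_x) ** 2 for x in xs)              # steps 2 & 5: Σ(Xi − X̄)²
    ss_y = sum((y - mean_y) ** 2 for y in ys)              # steps 3 & 5: Σ(Yi − Ȳ)²
    ss_xy = sum((x - mean_x) * (y - mean_y)
                for x, y in zip(xs, ys))                   # steps 4 & 5
    return ss_xy / math.sqrt(ss_x * ss_y)

r = pearson_r(xs, ys)
print(round(r, 3))  # → 0.853 (strong, non-perfect positive correlation)
```

A value of roughly 0.85 falls between 0 and +1, i.e. a non-perfect positive correlation per the list above.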
2. Regression
Equation of a line in statistics: Y = a + bX
=> a is the y-intercept and b is the gradient.
The line of best fit is the one that minimises Σdi² (the sum of the squared vertical distances
between the line and all the points).
To find the regression line we use the following formulas:
b = Σ[(Xi − X̄)(Yi − Ȳ)] / Σ(Xi − X̄)²  and  a = Ȳ − bX̄
=> Y = a + bX
Steps for calculating the regression line:
1. Calculate a and b.
2. Calculate Y = a + bX
=> What does Y = a + bX mean?
- a is the value of Y when X = 0 (when the regression line crosses the y-axis)
(Warning: X = 0 is not always a sensible possibility).
- b is the measure of steepness of the line ( Y increases by b when we increase X by
1).
Warning: prevent extrapolation (making predictions outside the data range).
(Some general) rules for regression lines:
- if b is positive, Y increases as X increases.
- if b is negative, Y decreases as X increases.
- if b = 0 , the line is horizontal and there is no relationship between the two variables.
To see how well the regression fits the data you should square r.
- r² is the proportion of variation in Y explained by X.
- e.g. if r² = 0.77 then 77% of the variation in variable Y is explained by
variable X.
- 1 − r² is the proportion of variation in Y not explained by X (ergo, other factors are at
play).
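The regression steps above (compute b, then a, then assess fit with r²) can be sketched the same way. Again the data are made up for illustration:

```python
# Hypothetical data (illustrative only); same deviation sums as in the
# correlation formula above.
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [2.0, 4.0, 5.0, 4.0, 6.0]

n = len(xs)
mean_x = sum(xs) / n
mean_y = sum(ys) / n

ss_x = sum((x - mean_x) ** 2 for x in xs)                          # Σ(Xi − X̄)²
ss_y = sum((y - mean_y) ** 2 for y in ys)                          # Σ(Yi − Ȳ)²
ss_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))   # Σ[(Xi − X̄)(Yi − Ȳ)]

b = ss_xy / ss_x          # gradient: Y increases by b when X increases by 1
a = mean_y - b * mean_x   # y-intercept: value of Y when X = 0

# r² = proportion of variation in Y explained by X
r_squared = ss_xy ** 2 / (ss_x * ss_y)

print(a, b, round(r_squared, 3))  # → 1.8 0.8 0.727
```

Here b = 0.8 is positive, so Y increases as X increases; r² ≈ 0.73 means about 73% of the variation in Y is explained by X, and the remaining 27% is down to other factors.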
Chapter 17: A guide to multivariate analysis
1. Introduction
Multivariate analysis: Analysing a pattern with more than one independent variable.
=> Allows us to examine the impact of multiple explanatory factors on dependent
variables.
=> Allows us to explore whether the impact of one variable also depends upon the
level of another.
OLS regression: Appropriate for interval-level dependent variables.
Logistic regression: Appropriate for categorical (especially dichotomous) dependent
variables.
2. The principles of multivariate analysis: statistical control
Most political/social phenomena are multi-causal => Most bivariate analysis will, therefore,
be partial, at best.
Establishing causation is difficult and, therefore, we aim at “robust associations” => This has
three conditions:
1. Statistical association.
2. Appropriate time ordering.
3. Elimination of other possibilities.
=> The first two can be resolved through bivariate analysis while the latter will need
the help of multivariate analysis.
Multivariate analysis allows us to compare the importance of different factors against each
other. Thereby, it provides statistical control.
We should always look for factors that could contradict our argument.
3. Specifying different types of relationship
Confounding variables are associated (!!) with both the probable cause and the outcome of a
political phenomenon. They have no direct causal influence, and they might obscure the
variable that does. When the true cause is obscured in this way and people mistake the
association involving the confounding variable for the real relationship, we call it a spurious
relationship.
Omitted variable bias is the failure to control for a theoretically important variable, which
leads us to attribute undue causal importance to other variables.
Intervening/mediating variables are variables that are affected (!!) by the explanatory
variable and in turn affect the outcome variable => This constitutes an indirect relationship
rather than a spurious relationship.
- e.g. class => ideology => vote (ideology is the intervening variable).
An interaction effect is when the impact of an independent variable is conditional on the
presence of another independent variable. This latter independent variable is then a
moderator variable.
- e.g. political knowledge (moderator variable) strengthens the relationship between
ideology and vote.
4. Multivariate analysis using OLS regression
We use OLS regression when the dependent variable is either interval or ordinal (approaching
a normal distribution). It is a simple extension of the linear regression model.
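As a sketch of what "a simple extension of the linear regression model" looks like in practice: with two explanatory variables, the model is Y = a + b1·X1 + b2·X2, and each slope is interpreted holding the other variable constant (this is the statistical control from section 2). The example below uses NumPy's least-squares solver on noise-free data deliberately constructed so that Y = 1 + 1.5·X1 + 2·X2 (the variable names and values are illustrative, not from the chapter):

```python
import numpy as np

# Constructed, noise-free data: Y = 1 + 1.5*X1 + 2*X2 exactly.
X1 = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0])
X2 = np.array([0.0, 1.0, 0.0, 1.0, 0.0, 1.0])
Y  = np.array([2.5, 6.0, 5.5, 9.0, 8.5, 12.0])

# Design matrix: a column of ones (for the intercept a), then X1 and X2.
X = np.column_stack([np.ones_like(X1), X1, X2])

# Ordinary least squares: minimises the sum of squared residuals,
# just like bivariate regression but with more than one X.
coefs, *_ = np.linalg.lstsq(X, Y, rcond=None)
a, b1, b2 = coefs
print(a, b1, b2)  # → 1.0 1.5 2.0 (the coefficients the data were built from)
```

Because the data contain no noise, OLS recovers the constructed coefficients exactly; with real data each b would be an estimate of the variable's effect controlling for the others.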