Political Science Research Methods 2 (MANBPRA205A)
Seller: marijejansen2
Political Science Research Methods II
Lecture 1AB: Introduction + refresher inferential statistics
03-02-2023
The population mean is the sum of all values in the population divided by the total number of values
in the population.
The standard deviation describes how much the values vary around the mean. The variance is the
squared standard deviation. Both can be calculated for a population (dividing by N) or estimated
from a sample (dividing by n-1).
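The definitions above can be sketched in a few lines of Python. This is only an illustration: the function name `describe` and the grades are made up for the example and do not come from the lecture.

```python
import math

def describe(values, sample=False):
    """Return (mean, variance, standard deviation) of a list of numbers."""
    n = len(values)
    mean = sum(values) / n
    squared_deviations = sum((v - mean) ** 2 for v in values)
    # The population formula divides by n; the sample estimate divides by n - 1.
    variance = squared_deviations / (n - 1 if sample else n)
    return mean, variance, math.sqrt(variance)

grades = [4.0, 5.0, 6.0, 7.0, 8.0]  # hypothetical data, not from the lecture

pop_mean, pop_var, pop_sd = describe(grades)
samp_mean, samp_var, samp_sd = describe(grades, sample=True)
print(pop_mean, pop_var, pop_sd)
print(samp_mean, samp_var, samp_sd)
```

The mean is the same in both calls; only the variance and standard deviation change, because of the different divisor.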
To test a hypothesis about a population using a sample, you need the p-value. You find it by
performing a t-test and looking up the resulting t-value in the t-table. For this you also need the
degrees of freedom, which is n-k-1 (k is the number of independent variables; for a test of a single
mean k = 0, so df = n-1). The null hypothesis is the opposite of what you want to show, so you test for
the reverse: it states that there is no relationship between the things you are testing and that the
observed difference is due to chance. If the p-value is higher than the common alpha/significance
level of 0.05, you cannot reject the null hypothesis. If it is lower, you reject the null hypothesis.
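As a rough sketch of this procedure for a single mean (so k = 0 and df = n-1): the data, the hypothesised mean `mu0` and the function name are assumptions for the example. Instead of computing an exact p-value, the code compares the t-value with the critical value from a standard t-table (2.776 for df = 4, two-sided, alpha = 0.05), which leads to the same reject / do not reject decision.

```python
import math
import statistics

def one_sample_t(values, mu0):
    """Return the t-value and degrees of freedom for H0: population mean = mu0."""
    n = len(values)
    mean = statistics.mean(values)
    # statistics.stdev is the sample standard deviation (divides by n - 1).
    standard_error = statistics.stdev(values) / math.sqrt(n)
    return (mean - mu0) / standard_error, n - 1

grades = [4.0, 5.0, 6.0, 7.0, 8.0]  # hypothetical data
t_value, df = one_sample_t(grades, mu0=5.5)

# Critical value looked up in a t-table: two-sided test, alpha = 0.05, df = 4.
CRITICAL_T_DF4 = 2.776
reject_h0 = abs(t_value) > CRITICAL_T_DF4
print(round(t_value, 3), df, reject_h0)
```

Here |t| is smaller than the critical value, which corresponds to a p-value above 0.05, so the null hypothesis is not rejected.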
The confidence interval contains all values of the null hypothesis that would not be rejected by the
observed sample mean → for which population means would the probability of the observed sample
mean (or a more extreme one) still be greater than 5% (alpha = 0.05)? With the normal approximation,
the lower bound of the 95% confidence interval is the sample mean minus 1.96 times the standard
error, and the upper bound is the sample mean plus 1.96 times the standard error (for small samples
the critical value from the t-table replaces 1.96). In the example from the lecture the two bounds
were 3.8 and 6.2: if the average grade in the population lies between those two values, it is still quite
conceivable that you would draw a sample of 5 exams with an average of 5.0. Based on this you
could assume that the average grade in the population is between 3.8 and 6.2. This assumption is
made a lot in research: people interpret the confidence interval as if the population value certainly
lies inside it, when it doesn't have to.
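The two bounds can be computed as in the sketch below. The sample mean of 5.0 is taken from the lecture example; the standard error of 0.61 is an assumption, chosen only because it roughly reproduces the interval of 3.8 to 6.2 mentioned in the notes.

```python
def confidence_interval_95(sample_mean, standard_error, critical_value=1.96):
    """Return (lower, upper) bound: sample mean -/+ critical value * standard error."""
    margin = critical_value * standard_error
    return sample_mean - margin, sample_mean + margin

# 5.0 is the sample mean from the lecture example; 0.61 is an assumed standard error.
lower, upper = confidence_interval_95(5.0, 0.61)
print(round(lower, 1), round(upper, 1))
```

The `critical_value` parameter is there so that 1.96 can be swapped for a value from the t-table when the sample is small.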
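The sample distribution is the distribution of the observed values in the one sample you actually
drew. The sampling distribution is imaginary: imagine drawing a lot of samples of the same size and
calculating the mean of the X variable in each. How often does each value of that mean occur across
these samples? The average of the sampling distribution is the same as the population mean, but its
standard deviation is the standard error (the population standard deviation divided by the square
root of the sample size).
The central limit theorem states that as the sample size grows, the sampling distribution of the mean
approaches the normal distribution, even when the population itself is not normally distributed. A
sample size larger than 30 is the common rule of thumb for when this approximation is good enough.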
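The "imagine drawing a lot of samples" idea can be simulated. The sketch below assumes a made-up, skewed population (exponentially distributed values), a sample size of 50 and 2,000 repeated samples; none of these numbers come from the lecture.

```python
import random
import statistics

random.seed(42)  # fixed seed so the simulation is repeatable

# Hypothetical skewed population (not normally distributed).
population = [random.expovariate(1.0) for _ in range(20_000)]
pop_mean = statistics.fmean(population)
pop_sd = statistics.pstdev(population)

n = 50  # sample size, above the rule of thumb of 30
sample_means = [
    statistics.fmean(random.sample(population, n))  # mean of one random sample
    for _ in range(2_000)
]

mean_of_means = statistics.fmean(sample_means)  # centre of the sampling distribution
sd_of_means = statistics.stdev(sample_means)    # its spread: the standard error
theoretical_se = pop_sd / n ** 0.5              # population sd / sqrt(n)

print(round(pop_mean, 3), round(mean_of_means, 3))
print(round(theoretical_se, 3), round(sd_of_means, 3))
```

The mean of the sample means should land close to the population mean, and their standard deviation close to the population standard deviation divided by the square root of n.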
Lecture 2A: Simple regression
06-02-2023
Regression analysis is about “predicting” values on a Y variable that you have already observed,
based on one or more X variables. You try to determine what the association is between these
variables. The explanatory purpose of regression analysis is that it can be informative about causal
relationships (but correlation does not imply causation!). The descriptive purpose of regression
analysis is that even without a causal relation, it is interesting in its own right to know that two things
often go together.
The simple linear regression model can be used to make a prediction based on one variable. The
formula for this model is $Y_i = (b_0 + b_1 X_i) + \varepsilon_i$. Here, $Y_i$ is the value of the
dependent variable for case i, $b_0$ is the intercept (the constant: the value of Y where the line
crosses the y-axis, so at X = 0), $b_1$ is the regression coefficient (how steep the line is: with each
one-unit increase of X, how much does Y increase?), $X_i$ is the value of the independent variable
for each case, and $\varepsilon_i$ is the error term (the residual: the difference between the observed
Y and the predicted Y; without it the equation wouldn't hold, because not every case is on the line).
The predicted value Ŷ increases or decreases by the value of the regression coefficient for every
one-unit increase of X.
Predicted values are values on Ŷ for each case based on the estimated model. Observed values are
values for Y for each case that we actually observe in the sample.
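A small sketch of predicted values, observed values and residuals. The coefficients and the data points are hypothetical and only serve to show the arithmetic.

```python
def predict(b0, b1, x):
    """Predicted value on Y for a given X: intercept + coefficient * X."""
    return b0 + b1 * x

b0, b1 = 2.0, 0.5                  # hypothetical intercept and regression coefficient
x_values = [1.0, 2.0, 3.0, 4.0]    # hypothetical independent variable values
y_observed = [2.0, 3.5, 3.0, 4.5]  # hypothetical observed values

y_predicted = [predict(b0, b1, x) for x in x_values]
# Residual = observed Y minus predicted Y for each case.
residuals = [y - y_hat for y, y_hat in zip(y_observed, y_predicted)]

print(y_predicted)  # each step of 1 in X adds b1 to the prediction
print(residuals)
```

Cases with a positive residual lie above the regression line, cases with a negative residual below it.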
In order to test the significance of regression coefficients, we use the t-test:
$$t = \frac{b_{1,\text{estimated}} - b_{1,\text{expected under } H_0}}{SE_{b_1}} = \frac{b_{1,\text{estimated}}}{SE_{b_1}}$$
The second step holds because $b_1$ expected under $H_0$ = 0. With the outcome of the t-test you
can look up the p-value in the t-table. The p-value is the probability that you would have found the
estimated coefficient for $b_1$ (or a larger coefficient) in your sample if X and Y were completely
unrelated in the population (in the lecture example: income and satisfaction with government).
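The t-value for a coefficient can be computed as below. The coefficient and standard error are hypothetical. The p-value here uses the normal distribution as a large-sample stand-in for the t-table and is two-sided; both are assumptions of this sketch, so for small samples the t-table value will differ somewhat.

```python
from statistics import NormalDist

def coefficient_t(b1_estimated, se_b1, b1_under_h0=0.0):
    """t = (estimated coefficient - coefficient expected under H0) / standard error."""
    return (b1_estimated - b1_under_h0) / se_b1

def two_sided_p_normal(t_value):
    """Approximate two-sided p-value, assuming a large sample (normal approximation)."""
    return 2 * (1 - NormalDist().cdf(abs(t_value)))

t_value = coefficient_t(b1_estimated=0.30, se_b1=0.10)  # hypothetical numbers
p_value = two_sided_p_normal(t_value)
print(round(t_value, 2), round(p_value, 4))
```

With these numbers the p-value is well below 0.05, so the null hypothesis of no relationship would be rejected.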
How to interpret SPSS output for simple regression analysis:
You look at the Coefficients table. b0 is in the B column of the (Constant) row, b1 is in the B column
of the row for the independent variable, SEb1 is the Std. Error in that same row, and t is the t in that
same row.
Lecture 2B: Ordinary least squares
10-02-2023
An estimation method is a method that you (or SPSS) use to ‘estimate’ the parameters of a model (b1, b0).
There are two criteria for a good estimation method:
1. It has to be unbiased: The parameters are not systematically estimated too small or too large.
2. It has to be efficient:
a. The parameter estimates are as precise as possible
b. The estimation method makes optimal use of the available information
c. The standard errors are as small as possible
d. In other words, the estimates contain as little random error as possible
A model isn’t inherently tied to one particular estimation method; we use ordinary least squares (OLS).
Estimation methods require assumptions in order to calculate the parameters.
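What "unbiased" means can be illustrated with a simulation: generate many samples from a model whose true coefficient is known, estimate the coefficient in each sample with the standard OLS slope formula, and check that the estimates are not systematically too small or too large. The true values (intercept 2.0, coefficient 0.5), the sample size and the number of repetitions are all assumptions made for this sketch.

```python
import random
import statistics

def ols_slope(x, y):
    """OLS estimate of b1: sum of cross-products / sum of squared deviations of X."""
    x_bar, y_bar = statistics.fmean(x), statistics.fmean(y)
    numerator = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    denominator = sum((xi - x_bar) ** 2 for xi in x)
    return numerator / denominator

random.seed(1)  # fixed seed so the simulation is repeatable
TRUE_B0, TRUE_B1 = 2.0, 0.5  # hypothetical "population" parameters

estimates = []
for _ in range(2_000):
    x = [random.uniform(0, 10) for _ in range(30)]
    # Y follows the true line plus random error.
    y = [TRUE_B0 + TRUE_B1 * xi + random.gauss(0, 1) for xi in x]
    estimates.append(ols_slope(x, y))

average_estimate = statistics.fmean(estimates)  # close to 0.5 if unbiased
spread = statistics.stdev(estimates)            # smaller spread = more efficient
print(round(average_estimate, 3), round(spread, 3))
```

Individual estimates differ from 0.5, but their average should sit very close to it; the spread of the estimates is what the efficiency criterion is about.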
Regression coefficients are estimated by using the following formula:
$$b_1 = \frac{\sum_{i=1}^{n} (X_i - \bar{X})(Y_i - \bar{Y})}{\sum_{i=1}^{n} (X_i - \bar{X})^2}$$
The numerator is the formula for the covariance without the division by the number of cases.
The denominator is the formula for the variance without the division by the number of cases.
So for the denominator you take every value of X, subtract the mean of X from it, square the result,
and add up the squares. For the numerator you take every value of X and subtract the mean of X, do
the same for Y with the mean of Y, multiply the two differences for each case, and add up the products.
Intercepts are estimated using the following formula: $b_0 = \bar{Y} - b_1 \bar{X}$
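The two estimation formulas can be written out step by step as in the sketch below. The function name `ols_fit` and the five data points are hypothetical.

```python
def ols_fit(x, y):
    """Return (b0, b1) for a simple regression of y on x using the OLS formulas."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    # Numerator: deviations of X times deviations of Y, summed over all cases.
    numerator = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    # Denominator: squared deviations of X, summed over all cases.
    denominator = sum((xi - x_bar) ** 2 for xi in x)
    b1 = numerator / denominator
    b0 = y_bar - b1 * x_bar  # intercept: mean of Y minus b1 times mean of X
    return b0, b1

x = [1.0, 2.0, 3.0, 4.0, 5.0]  # hypothetical data
y = [2.0, 4.0, 5.0, 4.0, 5.0]
b0, b1 = ols_fit(x, y)
print(round(b0, 2), round(b1, 2))
```

For these numbers the numerator is 6 and the denominator is 10, so b1 = 0.6 and b0 = 4 - 0.6 * 3 = 2.2.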
Standard errors are estimated using the formula:
$$SE_{b_1} = \frac{\sqrt{SSR / (n - 2)}}{\sqrt{\sum_{i=1}^{n} (X_i - \bar{X})^2}}$$
You already have the bottom value from estimating the regression coefficient, you just need to take
its square root. After this, you just need the SSR (the sum of squared residuals), which is
$SSR = \sum_{i=1}^{n} (Y_i - \hat{Y}_i)^2$. For this, you take every value of Y and subtract the
corresponding predicted value of Y from it, then you square the difference and add up the squares.
Divide the SSR by the degrees of freedom (n-2 in simple regression), fill everything in, and you have
the standard error. Based on the standard error we can calculate a t-value, and then the p-value.
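A sketch of the standard error calculation, using the usual textbook form for simple regression in which the SSR is divided by n-2 degrees of freedom. The data are hypothetical, and the values b0 = 2.2 and b1 = 0.6 are the OLS estimates for exactly these data, filled in by hand.

```python
import math

def slope_standard_error(x, y, b0, b1):
    """Return (SSR, SE of b1) for a simple regression with given estimates."""
    n = len(x)
    x_bar = sum(x) / n
    # SSR: squared differences between observed Y and predicted Y, summed.
    ssr = sum((yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(x, y))
    # Same sum of squared deviations of X as in the denominator of b1.
    sum_sq_x = sum((xi - x_bar) ** 2 for xi in x)
    se_b1 = math.sqrt(ssr / (n - 2)) / math.sqrt(sum_sq_x)
    return ssr, se_b1

x = [1.0, 2.0, 3.0, 4.0, 5.0]  # hypothetical data
y = [2.0, 4.0, 5.0, 4.0, 5.0]
b0, b1 = 2.2, 0.6              # OLS estimates for these hypothetical data

ssr, se_b1 = slope_standard_error(x, y, b0, b1)
t_value = b1 / se_b1           # t-value under H0: b1 = 0
print(round(ssr, 2), round(se_b1, 4), round(t_value, 4))
```

With a larger SSR and the same X values, `se_b1` goes up and the t-value goes down.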
As the dispersion around the regression line increases (SSR), the standard error increases (this is
logical because more variation between your predicted and observed values means the model has a
larger error).