
Statistics II Lecture Notes

Author: polscinotes
Uploaded on: 21-10-2020
Written in: 2019/2020

This document contains lecture notes from the Statistics II: Applied Quantitative Analysis course, which is mandatory for all International Relations and Organizations students.


Connected book

Andy Field, Discovering Statistics Using IBM SPSS
Edition: 5 (November 2017)
ISBN: 9781526419521

Written for

Institution: Universiteit Leiden (UL)
Study: International Relations and Organizations
Course: Statistics II: Applied Quantitative Analysis


Document information

Number of pages: 31
Type: Class notes
Professor(s): Unknown
Contains: All classes

Subjects

statistics
international relations
political science


I. COMPARING TWO MEANS: Steps of statistical inference
1. Hypothesis
a. Null hypothesis: $\Delta = 0$
b. Alternative hypothesis: $\Delta \neq 0$
2. Test statistic
a. T-test: $t = \frac{\hat{\Delta}}{SE(\hat{\Delta})}$; in this example $\hat{t} = 3.45$
3. Sampling distribution of the test statistic
a. T-distribution with 11202 ($n_{treatment} + n_{control} - 2$, one for each group) degrees of freedom
4. Look up/calculate the p-value for $\hat{t} = 3.45$; $df = 11202$
a. p = 0.0006
5. Conclusion
a. Reject the null hypothesis at the 5% significance level (because p < 0.05)
b. The earnings of those who followed the training program differ from the earnings of those who did not
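Steps 4 and 5 can be sketched in Python with the standard library only. This is a sketch under one stated assumption: it uses the standard normal distribution as a stand-in for the t-distribution, which is reasonable here because with 11202 degrees of freedom the two are practically identical (it would not be for small samples). The function name `two_sided_p_large_df` is hypothetical; `math.erfc` is the standard-library complementary error function.

```python
import math


def two_sided_p_large_df(t_stat: float) -> float:
    """Two-sided p-value for a t statistic, approximated with the standard
    normal distribution (a good approximation only when df is large)."""
    # For a standard normal Z: P(|Z| > |t|) = erfc(|t| / sqrt(2))
    return math.erfc(abs(t_stat) / math.sqrt(2))


t_hat = 3.45   # observed test statistic from the example
df = 11202     # n_treatment + n_control - 2
alpha = 0.05   # significance level

p = two_sided_p_large_df(t_hat)
print(f"t = {t_hat}, df = {df}, p = {p:.4f}")
if p < alpha:
    print("Reject H0 at the 5% level")
else:
    print("Do not reject H0 at the 5% level")
```

For small samples, an exact t-distribution routine (for example from a statistics library or SPSS output) should be used instead of the normal approximation.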

II. ANOVA: Comparing more than two means
• If we want to compare more than two means, we cannot use a single t-test (running many pairwise t-tests inflates the chance of a false positive)
• ANOVA considers the differences between groups and the differences within groups

EXAMPLE: Is there a statistically significant difference between number of TV appearances for MPs of different parties?
Figure 1. Number of TV show entries

Figure 2. Total sum of squares ($SS_T$) | $SS_T = SS_M + SS_R$

$SS_T = \sum_{i=1}^{N} (x_i - \bar{x}_{grand})^2$

$\bar{x}_{grand} = \frac{3 + 2 + 4 + 7 + 5 + 6 + 8 + 5 + 7}{9} = \frac{47}{9} = 5.22$

$SS_T = (3 - 5.22)^2 + (2 - 5.22)^2 + (4 - 5.22)^2$
$+ (7 - 5.22)^2 + (5 - 5.22)^2 + (6 - 5.22)^2$
$+ (8 - 5.22)^2 + (5 - 5.22)^2 + (7 - 5.22)^2 = 31.56$

$SS_T = 31.56$

Figure 3. Model sum of squares ($SS_M$) - $SS_{between}$
CDA: $\bar{x}_1 = (3 + 2 + 4) \div 3 = 3$
VVD: $\bar{x}_2 = (7 + 5 + 6) \div 3 = 6$
PvdA: $\bar{x}_3 = (8 + 5 + 7) \div 3 = 6.67$
(with $k$ for the group (here: political party), $n_k$ the number of observations in that group, and $\bar{x}_k$ the mean for that group)

$SS_M = \sum_{k=1}^{K} n_k (\bar{x}_k - \bar{x}_{grand})^2$
$= 3(3 - 5.22)^2 + 3(6 - 5.22)^2 + 3(6.67 - 5.22)^2 = 22.89$

$SS_M = 22.89$

Figure 4. Residual sum of squares ($SS_R$) - $SS_{within}$
$SS_R = \sum (x_{ik} - \bar{x}_k)^2$
$= (3 - 3)^2 + (2 - 3)^2 + (4 - 3)^2$
$+ (7 - 6)^2 + (5 - 6)^2 + (6 - 6)^2$
$+ (8 - 6.67)^2 + (5 - 6.67)^2 + (7 - 6.67)^2 = 8.67$

$SS_R = 8.67$

$SS_M$ answers the question: Which part of the total sum of squares can we explain by using the group means?
$SS_R$ answers the question: Which part of the total sum of squares cannot be explained by using the group means?
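The three sums of squares can be reproduced in plain Python from the nine observations in the TV-appearance example. This is a minimal sketch with hypothetical helper names (`mean`, `sums_of_squares`); no statistics library is assumed.

```python
# TV show entries per MP, grouped by party (data from the example)
groups = {
    "CDA": [3, 2, 4],
    "VVD": [7, 5, 6],
    "PvdA": [8, 5, 7],
}


def mean(values):
    return sum(values) / len(values)


def sums_of_squares(groups):
    """Return (SS_T, SS_M, SS_R) for a one-way ANOVA."""
    all_values = [x for values in groups.values() for x in values]
    grand_mean = mean(all_values)
    # Total: every observation against the grand mean
    ss_t = sum((x - grand_mean) ** 2 for x in all_values)
    # Model: each group mean against the grand mean, weighted by group size n_k
    ss_m = sum(len(v) * (mean(v) - grand_mean) ** 2 for v in groups.values())
    # Residual: every observation against its own group mean
    ss_r = sum((x - mean(v)) ** 2 for v in groups.values() for x in v)
    return ss_t, ss_m, ss_r


ss_t, ss_m, ss_r = sums_of_squares(groups)
print(f"SS_T = {ss_t:.2f}, SS_M = {ss_m:.2f}, SS_R = {ss_r:.2f}")
```

Because the decomposition is exact, `ss_m + ss_r` equals `ss_t` up to floating-point error, which is a handy check on hand calculations.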

Mean squares
• The model sum of squares ($SS_M$) is based on the differences between the 3 group means and the grand mean.
o The degrees of freedom equal the number of groups minus 1 (for the grand mean)
$MS_M = \frac{SS_M}{df_M} = \frac{22.89}{2} = 11.44$
$df_M = 3 - 1 = 2$
• The residual sum of squares ($SS_R$) is based on the difference between each value and its group mean
o The degrees of freedom equal the number of observations minus the number of groups
$MS_R = \frac{SS_R}{df_R} = \frac{8.67}{6} = 1.44$
$df_R = 9 - 3 = 6$
F statistic
• The ratio between the variance explained by the model ($MS_M$) and the variance NOT explained by the model ($MS_R$)
• If $F > 1$, the model explains more variance (per degree of freedom) than it leaves unexplained
$F = \frac{MS_M}{MS_R} = \frac{11.44}{1.44} = 7.92$ (computed from the unrounded mean squares)
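A short sketch of the mean-square and F computations, taking the sums of squares as inputs to three decimals (22.889 and 8.667, the unrounded values behind 22.89 and 8.67) to limit rounding error. The helper name `f_statistic` is hypothetical.

```python
def f_statistic(ss_m, ss_r, n_groups, n_obs):
    """Mean squares and F ratio for a one-way ANOVA."""
    df_m = n_groups - 1        # number of groups minus 1 (for the grand mean)
    df_r = n_obs - n_groups    # number of observations minus number of groups
    ms_m = ss_m / df_m         # variance explained by the model
    ms_r = ss_r / df_r         # variance not explained by the model
    return ms_m, ms_r, ms_m / ms_r, df_m, df_r


# Sums of squares from the TV-appearance example, to three decimals
ms_m, ms_r, f, df_m, df_r = f_statistic(22.889, 8.667, n_groups=3, n_obs=9)
print(f"MS_M = {ms_m:.2f}, MS_R = {ms_r:.2f}, F({df_m}, {df_r}) = {f:.2f}")
```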

Inference: conclusion about population
Null hypothesis: the means of all groups are equal

We compare this score for the F-test to the F-distribution.
This distribution has two sets of degrees of freedom: $df_M$ and $df_R$. Here: 2 and 6.

Critical value for a significance level (α-level) of 0.05 and 2 and 6 degrees of freedom is 5.14.

$F_{observed}$ compared to $F_{critical}$
• The observed value of F ($F_{observed} = 7.92$) is greater than the corresponding critical value ($F_{critical} = 5.14$)
• Therefore, we reject the null hypothesis (null hypothesis: the means of all groups are equal)

Reporting: There was a statistically significant difference (at the 5% level) between parties in terms of the average number of TV show entries by their politicians, F(2, 6) = 7.92, p = 0.021.
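The reported p-value and the critical value of 5.14 can be checked without a statistics library, because when the first degrees of freedom equal 2 the F-distribution has a closed-form tail: $P(F > f) = (1 + 2f/df_R)^{-df_R/2}$. The sketch below relies on that special case only (the function names are hypothetical); for other values of $df_M$ a library routine such as SciPy's `scipy.stats.f.sf` would be needed.

```python
def f_tail_df1_2(f_value, df_r):
    """P(F > f_value) for an F(2, df_r) distribution (closed form, df_M = 2 only)."""
    return (1 + 2 * f_value / df_r) ** (-df_r / 2)


def f_critical_df1_2(alpha, df_r):
    """Critical value c with P(F > c) = alpha for F(2, df_r), by inverting the tail."""
    return (df_r / 2) * (alpha ** (-2 / df_r) - 1)


f_observed = 7.92
df_r = 6
p = f_tail_df1_2(f_observed, df_r)
f_crit = f_critical_df1_2(0.05, df_r)
print(f"F(2, {df_r}) = {f_observed}, p = {p:.3f}, critical value = {f_crit:.2f}")
print("Reject H0" if f_observed > f_crit else "Do not reject H0")
```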

REGRESSION ANALYSIS
Why do we use regression for statistical inference?
• To express uncertainty about our conclusions about the relation between 2 concepts
• Assessing the strength of a relation
• Understand the population (based on a sample)
Why regression?
• What if we are not just interested in the difference between two means, but in how the mean values of a variable change as another variable changes?
• Example: Have available incomes increased in rich and poor countries, or have poor countries remained poor?

• How can we describe the strength of this association? Correlation? r = 0.961

Regression is related to correlation
• But regression can assess the impact of several independent variables on one specific dependent variable
o Not just strength of the association, but size of the effect: the expected change in Y as a result of a 1-unit change in X
• It does so by assuming that a linear association exists
• Regression can test the null hypothesis that incomes are unrelated to incomes in the past

EXAMPLE: What is the relationship between the number of seats a party has in parliament and the number of motions it tables?

‘Line of best fit’
• Minimizing the (squared) distances between the points and the line; your best guess given the data available

REGRESSION EQUATION: $Y = a + bX$
• Intercept (constant): a; if the number of seats is 0, how many motions can we expect (according to the model)?
• Slope: b; if the number of seats increases by 1, what is the expected change in the number of motions (according to the model)?

• If a party has 30 seats, how many motions can we expect?
o $motions = a + b \cdot seats$
o $motions = 38.11 + 7.17 \cdot seats$
o $\widehat{motions} = 38.11 + 7.17 \cdot 30 = 253.2$ (253.3 with the unrounded slope of 7.1728)
• We often use $\beta_0$ and $\beta_1$ instead of $a$ and $b$
o $Y_i = \beta_0 + \beta_1 X_i$
o The subscript $i$ stands for the number of the observation:
$Y_1$ is the value of the response variable $Y$ for the first observation in the dataset,
$Y_i$ is the value of the response variable $Y$ for any observation $i$ in the dataset.
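The interpretation of intercept and slope can be checked with a tiny Python function. `predict_motions` is a hypothetical name, and its default coefficients are the rounded estimates 38.11 and 7.17 from the example; with these rounded values the prediction for 30 seats is 253.21, slightly below the 253.3 obtained with the unrounded slope.

```python
def predict_motions(seats, intercept=38.11, slope=7.17):
    """Expected number of motions for a party with `seats` seats,
    following the estimated line: motions = a + b * seats."""
    return intercept + slope * seats


# Intercept: the prediction at 0 seats is just a
print(predict_motions(0))
# Prediction for a party with 30 seats
print(round(predict_motions(30), 2))
# Slope: one extra seat changes the prediction by b
print(round(predict_motions(31) - predict_motions(30), 2))
```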

ERROR: Not all observations lie on the regression line, so there is error! All models are wrong

Including error in the equation
• $Y_i = \beta_0 + \beta_1 X_i + \varepsilon_i$ | All models are wrong, but we make assumptions about the error (e.g. that it is random for all cases)
• $E[Y_i \mid X_i] = \beta_0 + \beta_1 X_i$ | That's why we work with the expected value of $Y_i$ given a value of $X_i$

HOW DO WE DRAW THE REGRESSION LINE?
• Ordinary Least Squares (OLS): minimizes the residual sum of squares; a residual is the vertical difference between a data point and the regression line ($Y_i - \hat{Y}_i$)

• Squaring these residuals and adding them up gives the residual sum of squares; in this example $SS_R = 24680.2$
• The regression line is chosen in such a way that the residual sum of squares is as small as possible, hence "least squares"

Calculating the regression line
• $SS_R = \sum (Y_i - \hat{Y}_i)^2$
• $SS_R = \sum (Y_i - \beta_0 - \beta_1 X_i)^2$
• $\hat{Y}_i = \beta_0 + \beta_1 X_i$; $\hat{Y}_i$ is the predicted value of $Y$ according to the regression model
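To see that OLS really gives the line with the smallest residual sum of squares, the sketch below computes $SS_R$ for any candidate line and compares the least-squares line with slightly shifted ones. The data are hypothetical toy values (the seats/motions data are not listed in the notes), and the helper names `residual_ss` and `ols` are illustrative.

```python
def residual_ss(x, y, b0, b1):
    """SS_R = sum of (Y_i - Yhat_i)^2 for the line Yhat_i = b0 + b1 * X_i."""
    return sum((yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(x, y))


def ols(x, y):
    """Least-squares intercept and slope for one explanatory variable."""
    x_bar = sum(x) / len(x)
    y_bar = sum(y) / len(y)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    b1 = sxy / sxx
    b0 = y_bar - b1 * x_bar
    return b0, b1


# Hypothetical toy data (NOT the seats/motions data from the lecture)
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]

b0, b1 = ols(x, y)
best = residual_ss(x, y, b0, b1)
print(f"OLS line: Yhat = {b0:.2f} + {b1:.2f} X, SS_R = {best:.2f}")

# Nudging either coefficient away from the OLS solution increases SS_R
for d0, d1 in [(0.5, 0), (-0.5, 0), (0, 0.1), (0, -0.1)]:
    assert residual_ss(x, y, b0 + d0, b1 + d1) > best
```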

SPSS: Analyze > Correlate > Bivariate > Options > Cross-product deviations and covariances

$\hat{\beta}_1$ (estimated $\beta_1$) in our example

$\hat{\beta}_1 = \frac{\sum (X_i - \bar{X})(Y_i - \bar{Y})}{\sum (X_i - \bar{X})^2} = \frac{25908}{3612} = 7.17$
$\hat{\beta}_1 = 7.17$

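$\hat{\beta}_0$ (estimated $\beta_0$) in our example

$\hat{\beta}_0 = \bar{Y} - \hat{\beta}_1 \bar{X}$
$\hat{\beta}_0 = 199.5 - 7.17 \cdot 22.5 = 38.175$ (38.11 with the unrounded slope of 7.1728)
$\hat{\beta}_0 \approx 38.17$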
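The hand calculation of the two coefficients can be reproduced from the summary statistics in the notes (the sums 25908 and 3612, and the means 22.5 seats and 199.5 motions). The sketch also shows that the gap between roughly 38.17 and 38.11 comes only from rounding the slope to 7.17 before computing the intercept.

```python
# Summary statistics for the seats/motions example, as given in the notes
sum_cross_products = 25908   # sum of (X_i - X_bar)(Y_i - Y_bar)
sum_squares_x = 3612         # sum of (X_i - X_bar)^2
x_bar = 22.5                 # mean number of seats
y_bar = 199.5                # mean number of motions

b1_hat = sum_cross_products / sum_squares_x
b0_hat = y_bar - b1_hat * x_bar                      # unrounded slope
b0_from_rounded = y_bar - round(b1_hat, 2) * x_bar   # slope rounded to 7.17 first

print(f"b1_hat = {b1_hat:.4f}")
print(f"b0_hat = {b0_hat:.2f} (unrounded slope)")
print(f"b0_hat = {b0_from_rounded:.3f} (slope rounded to 7.17)")
print(f"expected motions at 30 seats: {b0_hat + b1_hat * 30:.1f}")
```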

Multiple explanatory variables: If you have more than one explanatory variable in your model, you can still calculate the least-squares estimates; this is what SPSS is for!

Regression: Key assumptions
1. It makes sense to treat the relationship between $E[Y_i \mid X_i]$ and the $X$ variable as linear and additive
2. $E[\varepsilon_i \mid X_i] = 0$: error exists, but it is assumed to be random (zero on average), so it does not affect the estimated expected values
$Y_i = \beta_0 + \beta_1 X_i + \varepsilon_i$
$E[Y_i \mid X_i] = \beta_0 + \beta_1 X_i$
What variables are suitable for regression?
• Dependent variable: Interval-ratio scale response variables
o Must have the same substantive meaning anywhere on the scale, e.g. profit, GDP
• Otherwise, modification is needed:
o Nominal/ordinal scale: logistic regression (e.g. blue/brown; agree/strongly agree)
o Count scale (non-negative integers, e.g. war casualties): Poisson and negative binomial regression models; NOT in this course
• Explanatory variables can be of any type (with modification)
• Variable values must vary (variance cannot be zero)
