Summary Statistics 2 (P_BSTATIS_2)

Type: summary
Pages: 14
Uploaded on: 20-03-2025
Written in: 2024/2025

A concise summary of the most important content from the Statistics 2 course (P_BSTATIS_2), based on the lectures and on the following book: Alan Agresti (2018). Statistical Methods for the Social Sciences, 5th global edition. Pearson Education International.


Document information

Whole book summarized? No
Part of the book summarized: chapters 10 to 14
Last updated on: 20 March 2025


chapter 10 introduction to multivariate relationships
causal relationships are asymmetrical → 𝑥 causes 𝑦; 3 criteria for causality:
- association between variables
o as 𝑥 changes, the distribution of 𝑦 should change in some way
o association does NOT imply causation
- appropriate time order
- elimination of alternative explanations
o observational studies can never prove that 1 variable is a cause of another
- anecdotal evidence is not enough to disprove causality unless it undermines 1 of the 3
criteria
- randomized experiments are the standard for establishing causality, although they are not
always possible in social research

in multivariate analysis, a variable is said to be controlled when its influence is removed
- randomized experiments inherently control other variables in a probabilistic sense

statistical control: approximating an experimental type of control by grouping observations
with equal/similar values on the control variables in observational research
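A minimal Python sketch of statistical control, assuming simulated (hypothetical) data in which a binary control variable 𝑥2 drives both 𝑥1 and 𝑦 while 𝑥1 has no effect of its own. Grouping the observations by their value on 𝑥2 and computing the correlation within each group approximates holding 𝑥2 constant. The helper `pearson_r`, the sample size and the effect sizes are illustrative choices, not taken from the course material.

```python
import random
import statistics


def pearson_r(xs, ys):
    """Pearson correlation computed from deviation scores."""
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / (sxx * syy) ** 0.5


rng = random.Random(1)  # fixed seed so the sketch is reproducible

# Hypothetical data: x2 (0 or 1) raises both x1 and y; x1 itself does not affect y.
x2 = [rng.randint(0, 1) for _ in range(2000)]
x1 = [5 * g + rng.gauss(0, 1) for g in x2]
y = [5 * g + rng.gauss(0, 1) for g in x2]

bivariate_r = pearson_r(x1, y)

# Statistical control: compute the x1-y association separately at each value of x2.
within_r = {}
for level in (0, 1):
    idx = [i for i, g in enumerate(x2) if g == level]
    within_r[level] = pearson_r([x1[i] for i in idx], [y[i] for i in idx])

print(f"bivariate r = {bivariate_r:.2f}")
for level, r in within_r.items():
    print(f"r within x2 = {level}: {r:.2f}")
```

For these settings the population value of the bivariate correlation is 6.25/7.25 ≈ .86, while the correlations within each level of 𝑥2 are close to 0: the pattern these notes call a spurious association.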

control variable: any variable that is held constant
lurking variable: a variable that is not measured in a study but does influence the association

multivariate associations
- spurious: both 𝑥1 and 𝑦 depend on 𝑥2, and their association disappears when 𝑥2
is controlled
- chain relationship: the relationship between 𝑥1 and 𝑦 exists but is indirect (𝑥1 → 𝑥2 → 𝑦);
𝑥2 is an intervening variable or mediator
- multiple causes: 𝑦 has several causes, which can be independent of each other or
dependent (= there is a relationship between the causes themselves)
- suppressor: the association between 2 variables appears, or becomes stronger, only
after controlling for the suppressor variable
- interaction: an association has different strengths and/or directions at different values
of the control variable




Simpson’s paradox: the possibility that, after controlling for a variable, each partial association has the
opposite direction from the bivariate association
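A small worked example with made-up counts (hypothetical, chosen only to produce the paradox): treatment A has the higher success rate within both severity groups, yet the lower rate overall, because A is mostly given to severe cases. Severity plays the role of the control variable; exact fractions avoid rounding issues.

```python
from fractions import Fraction

# Hypothetical counts: (successes, patients) per treatment and severity group.
counts = {
    "A": {"mild": (19, 20), "severe": (40, 80)},
    "B": {"mild": (72, 80), "severe": (9, 20)},
}


def success_rate(treatment, groups):
    """Success rate of a treatment, pooled over the given severity groups."""
    successes = sum(counts[treatment][g][0] for g in groups)
    patients = sum(counts[treatment][g][1] for g in groups)
    return Fraction(successes, patients)


# Partial associations: severity held constant
for group in ("mild", "severe"):
    a, b = success_rate("A", [group]), success_rate("B", [group])
    print(f"{group:>7}: A = {float(a):.2f}, B = {float(b):.2f}")

# Bivariate association: severity ignored
overall_a = success_rate("A", ["mild", "severe"])
overall_b = success_rate("B", ["mild", "severe"])
print(f"overall: A = {float(overall_a):.2f}, B = {float(overall_b):.2f}")
```

This prints success rates of .95 vs .90 (mild), .50 vs .45 (severe) and .59 vs .81 overall, so the direction of the association reverses once severity is ignored.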

confounding: when 2 explanatory variables both have effects on a response variable but are
also associated with each other
- omitted variable bias: bias that arises when a study fails to observe a confounding
variable that explains a major part of the effect
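A minimal sketch of omitted variable bias, assuming hypothetical noise-free data where 𝑦 = 2 + 1·𝑥1 + 3·𝑥2 and 𝑥1 and 𝑥2 are positively associated. Regressing 𝑦 on 𝑥1 alone attributes part of 𝑥2's effect to 𝑥1; including 𝑥2 recovers both effects. Solving the 2-predictor least squares normal equations with Cramer's rule is just one convenient choice for the illustration.

```python
import statistics


def centered(values):
    """Deviation scores: each value minus the mean."""
    m = statistics.fmean(values)
    return [v - m for v in values]


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


# Hypothetical, noise-free data: x2 is a confounder of the x1-y relationship.
x1 = [1, 2, 3, 4, 5, 6]
x2 = [2, 1, 4, 3, 6, 5]
y = [2 + 1 * a + 3 * b for a, b in zip(x1, x2)]

c1, c2, cy = centered(x1), centered(x2), centered(y)
s11, s22, s12 = dot(c1, c1), dot(c2, c2), dot(c1, c2)
s1y, s2y = dot(c1, cy), dot(c2, cy)

# Bivariate slope of y on x1 (x2 omitted)
b_simple = s1y / s11

# Slopes with x2 included: solve the 2x2 normal equations with Cramer's rule
det = s11 * s22 - s12 ** 2
b1 = (s1y * s22 - s2y * s12) / det
b2 = (s2y * s11 - s1y * s12) / det

print(f"x2 omitted:  b1 = {b_simple:.2f}")
print(f"x2 included: b1 = {b1:.2f}, b2 = {b2:.2f}")
```

With 𝑥2 omitted, the slope of 𝑥1 is 61/17.5 ≈ 3.49 instead of 1: the confounder's effect (3) times the slope of 𝑥2 on 𝑥1 (14.5/17.5) is added to the true effect.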



chapter 9 linear regression and correlation
non-directional: 𝑥 predicts 𝑦
directional:
- pos association: higher 𝑥 predicts higher 𝑦
- neg association: higher 𝑥 predicts lower 𝑦

linear regression model: 𝑦̂ = 𝑎 + 𝑏𝑥
- predicted criterion value → 𝑦̂
- 𝑦-intercept → 𝑎
- slope → 𝑏
o pos when high 𝑥-values coincide with high 𝑦-values, and vice versa
o neg when low 𝑥-values coincide with high 𝑦-values, and vice versa
o we can’t use 𝑏 to interpret the strength of the association between 𝑥 and 𝑦
▪ 𝑏 depends on the scale

we consider 3 types of 𝑦:
- 𝑦: observed outcome value of an individual
- 𝑦̅: avg outcome value (mean of 𝑦)
- 𝑦̂: individual’s predicted outcome value based on model

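least squares estimation: choose 𝑎 and 𝑏 so that the sum of squared vertical distances ∑(𝑦 − 𝑦̂)² is as small as possible; this gives the best straight line, the one falling closest to the data points in the scatterplot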
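A minimal sketch of least squares estimation for one predictor, using the standard closed-form solution 𝑏 = ∑(𝑥 − 𝑥̅)(𝑦 − 𝑦̅) / ∑(𝑥 − 𝑥̅)² and 𝑎 = 𝑦̅ − 𝑏𝑥̅ (these formulas are not written out in these notes). The five data points are hypothetical. The loop at the end shows that nudging 𝑎 or 𝑏 away from the estimates increases the sum of squared errors.

```python
import statistics


def least_squares(xs, ys):
    """Intercept a and slope b of the least squares line y_hat = a + b * x."""
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    b = sxy / sxx
    a = my - b * mx
    return a, b


def sse(a, b, xs, ys):
    """Sum of squared vertical distances between the data points and the line."""
    return sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))


x = [1, 2, 3, 4, 5]  # hypothetical data
y = [2, 4, 5, 4, 5]

a, b = least_squares(x, y)
print(f"y_hat = {a:.2f} + {b:.2f}x, SSE = {sse(a, b, x, y):.2f}")

# Any other line fits worse: shifting a or b increases SSE.
for da, db in [(0.1, 0), (-0.1, 0), (0, 0.1), (0, -0.1)]:
    assert sse(a + da, b + db, x, y) > sse(a, b, x, y)
```

This prints `y_hat = 2.20 + 0.60x, SSE = 2.40`. On Python 3.10+, `statistics.linear_regression(x, y)` returns the same slope and intercept.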

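Pearson’s correlation (standardized slope): $b^* = r = \left(\frac{s_x}{s_y}\right) b$
- $s_x$ and $s_y$ are the standard deviations of 𝑥 and 𝑦
- interpretation (of |𝑟|): 0 < negligible < .10 ≤ small < .30 ≤ moderate < .50 ≤ large
- both 𝑟 and 𝑏* are measures of effect size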
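A sketch showing that standardizing the slope with the sample standard deviations gives Pearson's 𝑟, on hypothetical data; the result is cross-checked against the textbook definition of 𝑟. The function `effect_size_label` is a hypothetical helper that applies the interpretation thresholds above to |𝑟|.

```python
import statistics

x = [1, 2, 3, 4, 5]  # hypothetical data
y = [2, 4, 5, 4, 5]

mx, my = statistics.fmean(x), statistics.fmean(y)
sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
sxx = sum((xi - mx) ** 2 for xi in x)
syy = sum((yi - my) ** 2 for yi in y)

b = sxy / sxx  # unstandardized slope (depends on the units of x and y)
r = (statistics.stdev(x) / statistics.stdev(y)) * b  # b* = r = (s_x / s_y) * b
r_direct = sxy / (sxx * syy) ** 0.5  # definition of Pearson's r


def effect_size_label(value):
    """Classify |r| with the thresholds .10, .30 and .50."""
    size = abs(value)
    if size < 0.10:
        return "negligible"
    if size < 0.30:
        return "small"
    if size < 0.50:
        return "moderate"
    return "large"


print(f"b = {b:.2f}, r = {r:.3f} ({effect_size_label(r)}), direct r = {r_direct:.3f}")
```

Because 𝑏* is unit-free, it can be compared across studies, unlike 𝑏.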

residual (𝒆): vertical distance between observed 𝑦 and predicted 𝑦̂
- 𝑒 = 𝑦 − 𝑦̂
- we can use this residual to determine how well the model performs in predicting 𝑦

total sum of squares: $TSS = \sum (y - \bar{y})^2$
- how much variation there is in the dependent variable that is to be explained
- marginal variation

sum of squared errors: $SSE = \sum (y - \hat{y})^2$
- how much variation is still unexplained after adding the independent variable
- conditional variation

regression sum of squares: $RSS = \sum (\hat{y} - \bar{y})^2$
- how much variation is explained by adding the independent variable

the smaller the 𝑆𝑆𝐸, the better the prediction → 𝑆𝑆𝐸 = 𝑇𝑆𝑆 − 𝑅𝑆𝑆

we use the different sums of squares to inspect the explanatory power of the model and to test its significance
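A sketch computing the three sums of squares for a least squares fit on hypothetical data and checking that 𝑆𝑆𝐸 = 𝑇𝑆𝑆 − 𝑅𝑆𝑆 (equivalently 𝑇𝑆𝑆 = 𝑅𝑆𝑆 + 𝑆𝑆𝐸). The decomposition holds for a least squares line with an intercept, not for an arbitrary line.

```python
import statistics

x = [1, 2, 3, 4, 5]  # hypothetical data
y = [2, 4, 5, 4, 5]

# Least squares fit
mx, my = statistics.fmean(x), statistics.fmean(y)
b = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y)) / sum((xi - mx) ** 2 for xi in x)
a = my - b * mx
y_hat = [a + b * xi for xi in x]

tss = sum((yi - my) ** 2 for yi in y)  # marginal variation
sse = sum((yi - yh) ** 2 for yi, yh in zip(y, y_hat))  # unexplained (conditional) variation
rss = sum((yh - my) ** 2 for yh in y_hat)  # variation explained by x

print(f"TSS = {tss:.2f}, RSS = {rss:.2f}, SSE = {sse:.2f}")
```

This prints `TSS = 6.00, RSS = 3.60, SSE = 2.40`.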


coefficient of determination ($R^2$): proportion of variation in 𝑦 that is explained by the model
- $R^2 = \frac{TSS - SSE}{TSS} = \frac{\sum (y - \bar{y})^2 - \sum (y - \hat{y})^2}{\sum (y - \bar{y})^2}$
- $0 \le R^2 \le 1$

- the closer to 1, the stronger the linear relationship
- interpretation: 0 < negligible < .02 ≤ small < .13 ≤ moderate < .26 ≤ large
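A sketch computing 𝑅² as (𝑇𝑆𝑆 − 𝑆𝑆𝐸)/𝑇𝑆𝑆 for hypothetical data and labelling it with the thresholds above. With a single explanatory variable, 𝑅² equals the squared Pearson correlation, which the last line also reports; `r_squared_label` is an illustrative helper name, not from the course.

```python
import statistics

x = [1, 2, 3, 4, 5]  # hypothetical data
y = [2, 4, 5, 4, 5]

mx, my = statistics.fmean(x), statistics.fmean(y)
sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
sxx = sum((xi - mx) ** 2 for xi in x)
syy = sum((yi - my) ** 2 for yi in y)
b = sxy / sxx
a = my - b * mx

tss = syy  # sum of squared deviations of y from its mean
sse = sum((yi - (a + b * xi)) ** 2 for xi, yi in zip(x, y))
r_squared = (tss - sse) / tss


def r_squared_label(value):
    """Classify R^2 with the thresholds .02, .13 and .26."""
    if value < 0.02:
        return "negligible"
    if value < 0.13:
        return "small"
    if value < 0.26:
        return "moderate"
    return "large"


r = sxy / (sxx * syy) ** 0.5
print(f"R^2 = {r_squared:.2f} ({r_squared_label(r_squared)}), r^2 = {r ** 2:.2f}")
```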

inferential statistics: using sample data to make inferences about population parameters
- we can’t confirm hypotheses, but we can falsify them
o by inspecting the probability of finding a 𝑏 (or 𝑟) at least this extreme if the null hypothesis were true
o null hypothesis: no association between variables (independent)
▪ 𝐻0: 𝛽 = 0
o alternative hypothesis: association between variables (dependent)
▪ 𝐻𝑎: 𝛽 ≠ 0
▪ if directional: 𝛽 < 0 or 𝛽 > 0
- check significance of 𝑏 using the 𝑡-statistic
o under $H_0: \beta = 0$: $t = \frac{b}{se}$ with $df = n - 2$
- check significance of $R^2$ using the 𝐹-statistic
o $F = \frac{R^2/1}{(1 - R^2)/(n - 2)} = \frac{(TSS - SSE)/1}{SSE/(n - 2)} = \frac{RSS/1}{SSE/(n - 2)} = \frac{MSR}{MSE}$
▪ $df_1 = k = 1$, where 𝑘 = number of explanatory variables (slope parameters 𝑏)
▪ $df_2 = n - k - 1 = n - 2$
- based on the 𝑡- or 𝐹-statistic, determine the 𝑝-value:
o what is the probability of finding a result at least this extreme if 𝐻0 is true?
- $F = t^2$ → both tests yield the same conclusion (with 1 explanatory variable)
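A sketch of both significance tests for a simple regression on hypothetical data (𝑛 = 5). It assumes the usual standard error of the slope, 𝑠𝑒 = 𝑠/√∑(𝑥 − 𝑥̅)² with 𝑠 = √(𝑆𝑆𝐸/(𝑛 − 2)), which these notes do not spell out. The 𝑝-value step is left out to keep the sketch dependency-free.

```python
import statistics

x = [1, 2, 3, 4, 5]  # hypothetical data
y = [2, 4, 5, 4, 5]
n = len(x)

mx, my = statistics.fmean(x), statistics.fmean(y)
sxx = sum((xi - mx) ** 2 for xi in x)
b = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y)) / sxx
a = my - b * mx
y_hat = [a + b * xi for xi in x]

tss = sum((yi - my) ** 2 for yi in y)
sse = sum((yi - yh) ** 2 for yi, yh in zip(y, y_hat))
rss = tss - sse
r_squared = rss / tss

# t-test for H0: beta = 0, df = n - 2
s = (sse / (n - 2)) ** 0.5
se_b = s / sxx ** 0.5
t = b / se_b

# F-test for R^2, df1 = k = 1, df2 = n - k - 1
k = 1
msr = rss / k
mse = sse / (n - k - 1)
f = msr / mse
f_from_r2 = (r_squared / k) / ((1 - r_squared) / (n - k - 1))

print(f"t = {t:.3f} (df = {n - 2}), F = {f:.3f} (df = {k}, {n - k - 1}), t^2 = {t ** 2:.3f}")
```

This prints `t = 2.121 (df = 3), F = 4.500 (df = 1, 3), t^2 = 4.500`. If SciPy is available, a two-sided 𝑝-value for 𝑡 is `2 * scipy.stats.t.sf(abs(t), n - 2)` and the 𝑝-value for 𝐹 is `scipy.stats.f.sf(f, k, n - k - 1)`; because 𝐹 = 𝑡², both give the same value.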

4 scenarios are possible, depending on the decision and on whether 𝐻0 is actually true
- 2 erroneous decisions (which we want to avoid)
o type 1 error (𝛼): rejecting 𝐻0 when it is true
▪ its probability is set by the selected 𝛼-level (typically .05)
▪ if observed 𝑝-value < 𝛼: reject 𝐻0
o type 2 error (𝛽): not rejecting 𝐻0 when it is false
▪ its probability is determined by:
• strength of association/difference in the population
• sample size of the study
• selected 𝛼-level
o trade-off: the smaller the probability of a type 1 error, the larger the probability of a
type 2 error (all else being equal)
- 2 correct decisions
o 1 − 𝛼: probability of correctly not rejecting 𝐻0 when it is true
o 1 − 𝛽 = power → probability of correctly rejecting 𝐻0 when it is false
▪ typically aim for 80%
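A Monte Carlo sketch of these error probabilities for the slope 𝑡-test, assuming simulated data with standard normal 𝑥 and errors and 𝑛 = 100. In the code, `slope` is the population slope (not the type 2 error probability 𝛽). The critical values 1.984 (𝛼 = .05) and 2.627 (𝛼 = .01) are approximate two-sided 𝑡 critical values for 𝑑𝑓 = 98, hard-coded to avoid a SciPy dependency; the slopes 0.2 and 0.4 are arbitrary illustrative choices. With `slope = 0` the rejection rate estimates the type 1 error probability; otherwise it estimates power.

```python
import random
import statistics


def slope_t(xs, ys):
    """t-statistic for H0: population slope = 0 in a simple regression."""
    n = len(xs)
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    b = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sxx
    a = my - b * mx
    sse = sum((y - a - b * x) ** 2 for x, y in zip(xs, ys))
    se_b = (sse / (n - 2)) ** 0.5 / sxx ** 0.5
    return b / se_b


def rejection_rate(slope, t_crit, rng, n=100, reps=1000):
    """Share of simulated samples in which H0 is rejected (two-sided test)."""
    rejections = 0
    for _ in range(reps):
        xs = [rng.gauss(0, 1) for _ in range(n)]
        ys = [slope * x + rng.gauss(0, 1) for x in xs]
        if abs(slope_t(xs, ys)) > t_crit:
            rejections += 1
    return rejections / reps


rng = random.Random(2025)
T_CRIT_05, T_CRIT_01 = 1.984, 2.627  # approximate critical t values, df = 98

type1_rate = rejection_rate(0.0, T_CRIT_05, rng)  # H0 true
power_small = rejection_rate(0.2, T_CRIT_05, rng)  # weak association
power_large = rejection_rate(0.4, T_CRIT_05, rng)  # stronger association
power_strict = rejection_rate(0.2, T_CRIT_01, rng)  # weak association, alpha = .01

print(f"type 1 error rate: about {type1_rate:.3f}")
print(f"power (slope 0.2, alpha .05): about {power_small:.3f}")
print(f"power (slope 0.4, alpha .05): about {power_large:.3f}")
print(f"power (slope 0.2, alpha .01): about {power_strict:.3f}")
```

The type 1 error rate lands near .05; power rises with the strength of the association and drops when 𝛼 is tightened from .05 to .01, which is the trade-off listed above.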

assumptions of linear regression:
- representativeness: analyses are based on a random sample
- functional form: relation between 𝑥 and 𝑦 is linear
- homoscedasticity: the conditional variance of 𝑦 around the regression line is equal for all 𝑥
- normal distribution: the conditional distribution of 𝑦 at each value of 𝑥 is normal
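An informal sketch of what homoscedasticity means, assuming simulated (hypothetical) data: it fits a least squares line and compares the residual variance for the upper half of the 𝑥-values with that for the lower half. A ratio near 1 is consistent with equal conditional variance; a ratio far from 1 suggests heteroscedasticity. This split-half comparison is a rough heuristic chosen for illustration, not a formal test from the course.

```python
import random
import statistics


def residuals(xs, ys):
    """Residuals e = y - y_hat from a least squares line."""
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    b = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sum((x - mx) ** 2 for x in xs)
    a = my - b * mx
    return [y - (a + b * x) for x, y in zip(xs, ys)]


def spread_ratio(xs, ys):
    """Residual variance for x above the median divided by that for x at or below it."""
    res = residuals(xs, ys)
    cut = statistics.median(xs)
    low = [e for x, e in zip(xs, res) if x <= cut]
    high = [e for x, e in zip(xs, res) if x > cut]
    return statistics.variance(high) / statistics.variance(low)


rng = random.Random(7)
xs = [rng.uniform(0, 10) for _ in range(500)]
# Hypothetical data: equal error spread vs. error spread that grows with x
y_equal = [1 + 2 * x + rng.gauss(0, 2) for x in xs]
y_growing = [1 + 2 * x + rng.gauss(0, 0.5 * x) for x in xs]

ratio_equal = spread_ratio(xs, y_equal)
ratio_growing = spread_ratio(xs, y_growing)
print(f"equal spread: ratio = {ratio_equal:.2f}")
print(f"growing spread: ratio = {ratio_growing:.2f}")
```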




Seller: d511 (Vrije Universiteit Amsterdam)