Summary

Summary Exam prep sheet: Statistics and Methodology

Institution
Tilburg University (UVT)

The file contains materials to prepare for the course exam of Statistics and Methodology (880259-M-6), the core course for the M.Sc. Data Science & Society. It includes theoretical concepts from all 10 lectures. Knowing and being able to reproduce the materials of this summary should allow for a su...

Uploaded on 15 January 2024
Number of pages: 9
Written in 2023/2024
Type: Summary

Seller: jtjurlik (member for 1 year, 16 documents sold)

Statistics and Methodology concepts and theory
Exam preparation M.Sc. Data Science & Society (year 2023/24)

1. Basics

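Wald test: $W = \dfrac{\hat{\beta} - \beta_{null}}{SE(\hat{\beta})}$, where $\beta_{null}$ is the parameter value under the null hypothesis.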

Sampling distribution = the probability distribution of a statistic.

p-value = the probability of observing the given test statistic, or one more extreme, if the null
hypothesis were true.
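
As a minimal sketch of how a Wald statistic and its p-value relate, the snippet below assumes a standard normal reference distribution and a two-sided test. The function name `wald_test` and the numbers passed to it are hypothetical and chosen only for illustration.

```python
from statistics import NormalDist

def wald_test(estimate, null_value, std_error):
    """Return the Wald statistic and a two-sided p-value (normal reference)."""
    w = (estimate - null_value) / std_error
    # Two-sided p-value: probability of a statistic at least as extreme as |w|
    # if the null hypothesis were true.
    p = 2 * (1 - NormalDist().cdf(abs(w)))
    return w, p

# Hypothetical estimate and standard error
w, p = wald_test(estimate=0.5, null_value=0.0, std_error=0.25)
print(f"W = {w:.2f}, p = {p:.4f}")
```

In small samples a t reference distribution would usually replace the normal one; the normal version is used here only because it needs nothing beyond the standard library.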

Statistical modeling = a method of generating knowledge by applying statistical analysis to
real-world data. It allows us to control for confounding factors. This contrasts with statistical
testing, which relies on experimental data with random assignment.

2. Design

Data science cycle: Define -> Collect -> Process -> Clean

EDA can be used:
• to generate hypotheses for confirmatory data analysis;
• to sanity-check hypotheses.

3a. Data imputation

P items (columns) -> $2^P$ possible response patterns (each item is either observed or missing)

Covariance coverage = the proportion of cases/observations/rows available to estimate a given
pairwise relationship (e.g., a covariance between two variables)
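
The two definitions above can be illustrated with a small sketch. The toy rows, the variable names `x`, `y`, `z`, and the helper `covariance_coverage` are all hypothetical; `None` marks a missing value.

```python
from itertools import combinations

# Hypothetical toy data: None marks a missing value
rows = [
    {"x": 1.0, "y": 2.0, "z": None},
    {"x": 2.0, "y": None, "z": 1.0},
    {"x": None, "y": 3.0, "z": 4.0},
    {"x": 4.0, "y": 5.0, "z": 6.0},
]
variables = ["x", "y", "z"]

def covariance_coverage(rows, a, b):
    """Proportion of rows in which both a and b are observed."""
    jointly_observed = sum(r[a] is not None and r[b] is not None for r in rows)
    return jointly_observed / len(rows)

# Coverage for every pair of variables
coverage = {
    (a, b): covariance_coverage(rows, a, b)
    for a, b in combinations(variables, 2)
}

# Each of the P items is either observed or missing, so 2**P patterns exist
n_patterns = 2 ** len(variables)

print(coverage)
print("possible response patterns:", n_patterns)
```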

Missing data mechanisms
• MCAR -> non-response does not depend on any data, observed or missing
o P(R|Ymis, Yobs) = P(R)
• MAR -> non-response depends only on the observed data (e.g., it differs by a recorded cohort)
o P(R|Ymis, Yobs) = P(R|Yobs)
• MNAR -> non-response depends on the missing values themselves, even after conditioning on the observed data
o P(R|Ymis, Yobs) ≠ P(R|Yobs)
• Indirect MNAR -> non-response depends on a variable that is correlated with the missing values but is not in the dataset
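
A small simulation can make the three mechanisms concrete. This is a sketch under assumed settings: the data-generating model (slope 0.7), the cut-offs, and the 30% MCAR rate are arbitrary choices, not part of the course material. It checks whether, given the observed covariate x, the observed y values still behave like the full data.

```python
import random
from statistics import mean

random.seed(1)
n = 20000
x = [random.gauss(0, 1) for _ in range(n)]        # fully observed covariate
y = [0.7 * xi + random.gauss(0, 1) for xi in x]   # variable subject to non-response

# Missingness indicators (True = y is missing) under each mechanism
mcar = [random.random() < 0.3 for _ in range(n)]  # P(R) is constant
mar = [xi > 0.5 for xi in x]                      # depends only on observed x
mnar = [yi > 0.5 for yi in y]                     # depends on y itself

def observed_residual_mean(missing):
    """Mean of y - 0.7*x among the cases where y is observed."""
    return mean(yi - 0.7 * xi for xi, yi, m in zip(x, y, missing) if not m)

for name, r in [("MCAR", mcar), ("MAR", mar), ("MNAR", mnar)]:
    print(name, round(observed_residual_mean(r), 3))
```

Under MCAR and MAR the residual mean among observed cases should stay close to zero, because conditioning on x removes the dependence on missingness. Under MNAR it is pulled away from zero, since the selection acts on y directly.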

Missing data treatments
• Listwise deletion
o Biased parameter estimates for MAR and MNAR
o Biased (downwards) SEs
• Pairwise deletion
o Biased parameter estimates for MAR and MNAR
o Biased (downwards) SEs
• Unconditional mean substitution
o Biased parameter estimates in all scenarios
o Weakens measures of linear association
o Biased (downwards) SEs

• Deterministic regression imputation (conditional mean substitution)
o Biased parameter estimates in all scenarios
o Inflates measures of linear association
o Biased (downwards) SEs
• Averaging available items (person-mean imputation)
o Biased parameter estimates for MAR and MNAR
o Biased parameter estimates if items do not contribute equally to the aggregate
score
• Last Observation Carried Forward
o Weakens estimates of growth
• Stochastic regression imputation
o Adds a random residual error to the imputed values to eliminate parameter bias
• $Y_{imp} = \hat{Y}_{mis} + \varepsilon$
o Biased (downwards) SEs
• Multiple imputation
o Models random residual error AND uncertainty in the regression coefficients used to
create the imputations
• A different set of coefficients is randomly sampled to create each of the M
imputations
• $Y_{imp} = \hat{Y}_{mis} + \varepsilon$, where $\hat{Y}_{mis} = \beta_0 + \beta_1 X_{mis}$ is new for each of the M imputations
o Eliminates parameter bias and SE bias (= accurate type I error rate)
o Biased parameter estimates for MNAR
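
The effect of three of the single-imputation treatments on a measure of linear association can be sketched as follows. The simulated data, the 40% MCAR rate, and the helper `corr` are assumptions made for illustration; this is not a full multiple-imputation routine.

```python
import random
from statistics import mean, pstdev

def corr(a, b):
    """Pearson correlation using population moments."""
    ma, mb = mean(a), mean(b)
    cov = mean((ai - ma) * (bi - mb) for ai, bi in zip(a, b))
    return cov / (pstdev(a) * pstdev(b))

random.seed(2)
n = 5000
x = [random.gauss(0, 1) for _ in range(n)]
y = [0.5 * xi + random.gauss(0, 1) for xi in x]
missing = [random.random() < 0.4 for _ in range(n)]  # y missing, MCAR for simplicity

x_obs = [xi for xi, m in zip(x, missing) if not m]
y_obs = [yi for yi, m in zip(y, missing) if not m]

# Regression of y on x fitted to the complete cases
b1 = corr(x_obs, y_obs) * pstdev(y_obs) / pstdev(x_obs)
b0 = mean(y_obs) - b1 * mean(x_obs)
resid_sd = pstdev([yi - (b0 + b1 * xi) for xi, yi in zip(x_obs, y_obs)])

y_bar = mean(y_obs)
# Unconditional mean substitution
y_mean = [y_bar if m else yi for yi, m in zip(y, missing)]
# Deterministic regression imputation: imputed values lie on the fitted line
y_det = [b0 + b1 * xi if m else yi for xi, yi, m in zip(x, y, missing)]
# Stochastic regression imputation: fitted value plus a random residual
y_sto = [b0 + b1 * xi + random.gauss(0, resid_sd) if m else yi
         for xi, yi, m in zip(x, y, missing)]

r_cc = corr(x_obs, y_obs)
r_mean, r_det, r_sto = corr(x, y_mean), corr(x, y_det), corr(x, y_sto)
print(f"complete cases {r_cc:.3f}, mean sub. {r_mean:.3f}, "
      f"deterministic {r_det:.3f}, stochastic {r_sto:.3f}")
```

Mean substitution should pull the correlation toward zero, deterministic regression imputation should inflate it, and the stochastic version should land near the complete-case value. None of these single-imputation variants addresses the understated standard errors.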

3b. Outliers

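• Internally studentized residuals: an observation $X_n$ is an outlier if $|T_n| > c$
o where $T_n = \dfrac{X_n - \bar{X}}{SD_X}$

• Externally studentized residuals: an observation $X_n$ is an outlier if $|T_{(n)}| > c$
o where $T_{(n)} = \dfrac{X_n - \bar{X}_{(n)}}{SD_{X(n)}}$
o deletion mean and deletion SD are used,
o and (n) includes all observations bar the observation n itself.

• MAD: same logic but use $T_{MAD} = \dfrac{X_n - \mathrm{median}(X)}{MAD_X}$, where $MAD_X$ is the (scaled) median absolute deviation from the median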

• Tukey's boxplot method:
o A value outside of the inner fence (c = 1.5) is a possible outlier.
o A value outside of the outer fence (c = 3) is a probable outlier.

• By breakdown points (lowest first):
o Mean (int. stud. res.)
o Deletion mean (ext. stud. res)
o Tukey's boxplot
o MAD

• Robust Mahalanobis (MCD estimation):
o "Multivariate generalization of the ext. stud. res."
o Fraction of the sample used to define center of the data determines robustness
• Fraction ↑ -> identified outliers ↓
o Cut-off determined as the square root of some quantile of the χ² distribution
• Cutoff ↑ -> identified outliers ↓
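
The univariate rules in this section can be compared on a toy sample. This is a sketch with assumed settings: the data, the cut-off c = 3, the MAD scaling constant 1.4826, and the default quartile convention of `statistics.quantiles` are illustrative choices, and other conventions give slightly different fences.

```python
from statistics import mean, median, stdev, quantiles

data = [9.8, 10.1, 10.0, 9.9, 10.2, 10.0, 9.7, 10.3, 25.0]  # hypothetical sample
c = 3  # assumed cut-off for the studentized and MAD rules

def internal_t(data):
    """|X_n - mean| / SD, with mean and SD computed from all observations."""
    m, s = mean(data), stdev(data)
    return [abs(x - m) / s for x in data]

def external_t(data):
    """Same statistic, using the deletion mean and deletion SD."""
    scores = []
    for i, x in enumerate(data):
        rest = data[:i] + data[i + 1:]  # all observations bar observation i
        scores.append(abs(x - mean(rest)) / stdev(rest))
    return scores

def mad_t(data, b=1.4826):
    """|X_n - median| / (b * median absolute deviation from the median)."""
    med = median(data)
    mad = b * median([abs(x - med) for x in data])
    return [abs(x - med) / mad for x in data]

def tukey_outside(data, fence):
    """Values outside [Q1 - fence*IQR, Q3 + fence*IQR]."""
    q1, _, q3 = quantiles(data, n=4)
    iqr = q3 - q1
    return [x for x in data if x < q1 - fence * iqr or x > q3 + fence * iqr]

def flagged(scores):
    return [x for x, t in zip(data, scores) if t > c]

print("internal:", flagged(internal_t(data)))
print("external:", flagged(external_t(data)))
print("MAD:     ", flagged(mad_t(data)))
print("Tukey possible (c = 1.5):", tukey_outside(data, 1.5))
print("Tukey probable (c = 3):  ", tukey_outside(data, 3))
```

In this sample the extreme value inflates the ordinary mean and SD so much that the internally studentized rule does not flag it, while the deletion-based, MAD, and boxplot rules do. That matches the ordering by breakdown point given above.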

4. Simple linear regression

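• The full population model
o $Y = \beta_0 + \beta_1 X + \varepsilon$
• The estimated, sample model
o $Y = \hat{\beta}_0 + \hat{\beta}_1 X + \hat{\varepsilon}$
• The estimated best-fit line(s)
o $\hat{Y} = \hat{\beta}_0 + \hat{\beta}_1 X$
• The true best-fit line
o $\hat{Y} = \beta_0 + \beta_1 X$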

Sum of the squared residuals (RSS): $RSS = \sum_{n=1}^{N} \hat{\varepsilon}_n^2$

where $\hat{\varepsilon}_n = Y_n - \hat{Y}_n$

Statistical inference is needed to quantify the precision with which the OLS estimators have
estimated the population parameters

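Confidence intervals: $CI_{1-\alpha} = \hat{\beta} \pm t_{crit} \times SE(\hat{\beta})$

where $t_{crit}$ is the $1 - \alpha/2$ quantile of the $t$ distribution (approximately $z_{1-\alpha/2}$ in large samples) for $CI_{1-\alpha}$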

• => e.g., a CI95 for the slope ($\hat{\beta}_1$) suggests that we can be 95% confident that the true value of $\beta_1$
is between [_ ; _]
• => if we repeat the analysis an infinite number of times, 95% of the CIs that we calculate will
surround the true value of $\beta_1$
• => CIs give us a plausible range for the population value of $\beta$
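
A worked sketch of a 95% CI for the slope of a simple linear regression follows. The data are hypothetical, and the critical value is taken from the standard normal distribution as a large-sample approximation. With a sample this small, a t quantile with n - 2 degrees of freedom would give a somewhat wider interval.

```python
from math import sqrt
from statistics import NormalDist, mean

# Hypothetical data
x = [1, 2, 3, 4, 5, 6, 7, 8]
y = [2.1, 3.9, 6.2, 7.8, 10.1, 12.2, 13.8, 16.1]
n = len(x)

x_bar, y_bar = mean(x), mean(y)
sxx = sum((xi - x_bar) ** 2 for xi in x)
sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))

# OLS estimates of slope and intercept
b1 = sxy / sxx
b0 = y_bar - b1 * x_bar

# Residual sum of squares and the standard error of the slope
rss = sum((yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(x, y))
se_b1 = sqrt(rss / (n - 2) / sxx)

# Critical value: normal approximation to the t quantile (assumption)
alpha = 0.05
z_crit = NormalDist().inv_cdf(1 - alpha / 2)

ci_low, ci_high = b1 - z_crit * se_b1, b1 + z_crit * se_b1
print(f"slope = {b1:.3f}, SE = {se_b1:.3f}, CI95 = [{ci_low:.3f}; {ci_high:.3f}]")
```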

5. Multiple linear regression

Model fit:

where:
