Statistical reasoning: systematize the way we evaluate uncertainty of data-based decisions
J Protect ourselves from overstating our findings
Statistical testing: quantify and control for uncertainty
à Output = test statistic
à Objective reference à p-value
Concepts:
Variability How spread out a dataset is
Probability distribution Re-scaled frequency distribution
- y-axis = probability density
- Area under graph = 1
Marginal/unconditional à only one variable, constant mean (0)
Conditional à two variables, distribution of y (and its mean)
depends on the value of x
Sampling distribution A mathematical function that describes all of the possible values
that a parameter can take
One kind of probability distribution
Population = possible values of the test statistic (parameter, ✘
random variable) over infinite repeated sampling
P-value Probability of observing a given test statistic in the
(frequentist) corresponding sampling distribution if H0 is true
One-sided à do not care another direction at all, Type I error
(Need to decide one-sided/two-sided before testing)
Statistical Modelling: mathematical representation describing only the important features of
a distribution à J control confounds
Inference Relationship between variables
Prediction Guess
¨ Data Science Cycle
, 1. Define Problem
Research Design: design not experiments (experimental data) but analysis (observational
data)
- Operationalize research questions (vague à analyzable)
J Statistically rigorous à can be answered in a statistical way
J Quantifiable à clear outcome variable
J A set of hypotheses (if possible)
- Designing analysis
? Supervised vs unsupervised
? Inference vs prediction à causal inference more costly than correlation
? Probabilistic answers vs binary decisions
? Extrinsic limitations (e.g. time, resources, ethical issues)
2. Data Collection
? Required variables à measured / constructed
? Sensitive data à proxies
? Rare data à preferential sampling
? Experimental data vs observational data
? Sample size à power analysis
? Secondary data source à Access? Quality? Processing required?
3. Data Processing
4. Data Cleaning à Analyzable format, legal values, outliers & missing data well-handled
Missing data = empty cells where observed values should have been there
¨ Missing data pattern à Unique combination of observed & missing items
- Size = 2P, where P = no. of variables
- No missing is also one pattern
¨ Non-response rates
Percent missing Computed for each variable
à screen out “hopeless” variables
Attrition rate For longitudinal data (monotone pattern)
Proportion of participants that drop out at one time
Percent of complete cases Useful for list-wise deletion (which is a bad method)
Covariance coverage % of cases available to examine pairwise relationship
à instances with observed values for the required variables
Fraction of missing Measure on how well we treat missing values
information
The benefits of buying summaries with Stuvia:
Guaranteed quality through customer reviews
Stuvia customers have reviewed more than 700,000 summaries. This how you know that you are buying the best documents.
Quick and easy check-out
You can quickly pay through credit card or Stuvia-credit for the summaries. There is no membership needed.
Focus on what matters
Your fellow students write the study notes themselves, which is why the documents are always reliable and up-to-date. This ensures you quickly get to the core!
Frequently asked questions
What do I get when I buy this document?
You get a PDF, available immediately after your purchase. The purchased document is accessible anytime, anywhere and indefinitely through your profile.
Satisfaction guarantee: how does it work?
Our satisfaction guarantee ensures that you always find a study document that suits you well. You fill out a form, and our customer service team takes care of the rest.
Who am I buying these notes from?
Stuvia is a marketplace, so you are not buying this document from us, but from seller dstilburg202021. Stuvia facilitates payment to the seller.
Will I be stuck with a subscription?
No, you only buy these notes for $5.97. You're not tied to anything after your purchase.