In this document you will find a full overview of everything that is mentioned in the lectures and lecture slides (plus examples in italics) and other relevant materials mentioned on Canvas!
Lecture 1 - Introduction, Data Exploration, and Visualization
What you observe = True value + Sampling error + Measurement error + Statistical error
→ If any of these is messed up, results are biased and recommendations are wrong
Statistics estimate parameters
→ Statistics: characteristics of the sample
→ Parameters: characteristics of the population
Target population (voters) → Coverage error → Frame population (everyone with a telephone) →
Sampling error → Sample population (random digit dialing) → Non-response error → Respondents (accept
the call)
Post-stratification weights: make the sample closer to the population
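A minimal sketch of how post-stratification weights could be computed, assuming the usual definition weight = population share / sample share per group. The age groups and shares below are invented for illustration and do not come from the lecture.

```python
def post_stratification_weights(population_share, sample_share):
    """Weight per group = population share / sample share."""
    return {g: population_share[g] / sample_share[g] for g in population_share}

# Hypothetical shares, invented for illustration only.
population_share = {"18-34": 0.30, "35-64": 0.50, "65+": 0.20}
sample_share = {"18-34": 0.15, "35-64": 0.50, "65+": 0.35}

weights = post_stratification_weights(population_share, sample_share)
for group, weight in weights.items():
    # Under-represented groups get a weight above 1, over-represented below 1.
    print(group, round(weight, 2))
```

After weighting, each group's weighted share in the sample (weight times sample share) equals its population share.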
Non-metric scales: outcomes are categorical (labels) or directional; they can only measure the
direction of the response (yes/no)
→ Nominal scale: number serves only as label or tag for identifying or classifying objects in mutually
exclusive and collectively exhaustive categories (SNR, gender)
→ Ordinal scale: numbers are assigned to objects to indicate the relative positions of some characteristic of
objects, but not the magnitude of difference between them (brand preference ranking)
Metric (continuous) scales: not only measure the direction or classification, but the intensity as
well (strongly agree, somewhat disagree)
→ Interval scale: numbers are assigned to objects to indicate the relative positions of some characteristic of
objects with differences between objects being comparable; zero point is arbitrary (Likert scale, satisfaction
scale, perceptual constructs, temperature (Fahrenheit/Celsius))
→ Ratio scale: most precise scale; absolute zero point (weight, height, age, income, temperature (Kelvin))
In summated scales (satisfaction with purchase experience, Likert scale), more than one question
is needed to capture all facets (to reduce measurement error).
Validity: does it measure what it’s supposed to measure?
→ (Face) validity: do these coefficients make sense? (do the effect sizes and signs give
plausible model results?)
Reliability: is it stable?
→ How much do these results change if …
→ we add additional control variables to the model
→ we take away some observations (outliers)
→ we estimate the same model on a new dataset
Type I error: null is falsely rejected
Type II error: null is falsely accepted (not rejected)
p-value: probability of the observed data or statistic (or more extreme) given that the null
hypothesis is true (not a good measure of evidence)
Data preparation: explore data before running any model
→ Recode missing observations (9999=missing)
→ Reverse code negatively worded questions
→ Check that variables have the correct range/are not invalid
→ Check mutual consistency (age=18, date of birth=4/30/1901)
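The preparation steps above can be sketched in plain Python. This is only an illustration: the 7-point scale, the helper names, the sample values, and the survey year are assumptions, not part of the lecture material.

```python
MISSING_CODE = 9999          # missing-value code, as in the example above
SCALE_MIN, SCALE_MAX = 1, 7  # assumed 7-point answer scale

def recode_missing(value):
    """Turn the missing-value code into None."""
    return None if value == MISSING_CODE else value

def reverse_code(value):
    """Flip a negatively worded item: 1 becomes 7, 2 becomes 6, and so on."""
    return None if value is None else SCALE_MIN + SCALE_MAX - value

def in_range(value):
    """Missing values pass; everything else must lie on the scale."""
    return value is None or SCALE_MIN <= value <= SCALE_MAX

def age_consistent(age, birth_year, survey_year):
    """Age should match survey year minus birth year, give or take one year."""
    return abs((survey_year - birth_year) - age) <= 1

raw_answers = [7, 2, 9999, 5, 12]  # hypothetical raw responses
clean = [recode_missing(v) for v in raw_answers]
invalid = [v for v in clean if not in_range(v)]
reversed_answers = [reverse_code(v) for v in clean if in_range(v)]

print(clean)             # [7, 2, None, 5, 12]
print(invalid)           # [12]
print(reversed_answers)  # [1, 6, None, 3]
print(age_consistent(18, 1901, 2024))  # False (2024 is an assumed survey year)
```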
Data visualization: explore the data, understand/make sense of the data, communicate results
Choosing the right chart type
→ Showing the composition or distribution of one variable
→ Comparing data points or variables across multiple subunits
Lecture 2 - ANOVA
Step 1: Defining Objectives
ANOVA: testing if there are differences in the mean of a metric DV across different levels of one or
more non-metric IVs
‘’How much do you like this ad? 1-2-3-4-5-6-7’’ → Interval scale, as it has no natural zero point; a
scale from -3 to +3 wouldn’t have made a difference
ANOVA allows for more than 2 levels, a t-test doesn’t (1 IV with 2 levels)
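To make the idea of comparing means across more than 2 levels concrete, here is a rough sketch of the one-way ANOVA F statistic (between-group variance divided by within-group variance). The ad ratings are made up, and the function name is hypothetical.

```python
from statistics import mean

def one_way_anova_f(groups):
    """F = mean square between groups / mean square within groups."""
    all_values = [x for group in groups for x in group]
    grand_mean = mean(all_values)
    k = len(groups)      # number of levels of the IV
    n = len(all_values)  # total number of observations
    ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    ss_within = sum((x - mean(g)) ** 2 for g in groups for x in g)
    ms_between = ss_between / (k - 1)
    ms_within = ss_within / (n - k)
    return ms_between / ms_within

# Hypothetical 1-7 liking ratings for three versions of an ad.
ratings = [[4, 5, 6], [5, 6, 7], [1, 2, 3]]
f_statistic = one_way_anova_f(ratings)
print(round(f_statistic, 2))  # 13.0
```

A large F means the group means differ a lot relative to the spread within groups. Turning F into a p-value requires the F distribution, which is left out of this sketch.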
Step 2: Designing The ANOVA
                         Reality: Null    Reality: Alternative
Decision: Null           1-α              β
Decision: Alternative    α                1-β
p-value: probability of getting data/a statistic that is as extreme or more extreme if the null
hypothesis is true
→ If the null is true in reality, what is the chance that we see the current data (or data even further apart
from what would be expected under the null)
→ If the p-value is low, data are unlikely according to the null, and the null can be rejected (low chance of
type I error)
→ For a type I error, an error rate of 5% is typically allowed (α=0.05, reject the null if p-value < α)
→ For a type II error, an error rate of 20% is typically allowed
→ Power of a study (1 - β = 1 - P(decide Null | Alternative true)) is set to 0.8
→ In 80% of the cases when the null is not true, you can correctly reject it
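The p-value logic above can be illustrated with a simple exact calculation. This sketch uses a coin-flip example (null: the coin is fair) rather than an ANOVA, purely because the probabilities can be computed by hand; the example and function name are assumptions, not lecture material.

```python
from math import comb

def one_sided_p_value(successes, trials, p_null=0.5):
    """P(at least `successes` out of `trials`) if the null probability holds."""
    return sum(
        comb(trials, k) * p_null ** k * (1 - p_null) ** (trials - k)
        for k in range(successes, trials + 1)
    )

ALPHA = 0.05  # allowed type I error rate

# Hypothetical data: 9 heads in 10 flips.
p_value = one_sided_p_value(9, 10)
reject_null = p_value < ALPHA

print(round(p_value, 4))  # 0.0107
print(reject_null)        # True
```

Seeing 9 or more heads in 10 flips happens in only about 1% of cases with a fair coin, so the data are unlikely under the null and the null is rejected at α=0.05.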
Power depends on
→ Effect size
→ Sample size
→ α is typically fixed
Thus, for a large effect, a small sample is sufficient to find the effect, and for a small effect, you
need a large sample to find the effect.
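The trade-off between effect size and sample size can be sketched numerically. Note the swap: instead of the ANOVA F-test, this uses the simpler two-group, two-sided z-test with known variance, where power has a closed form. The effect size here is the standardized mean difference (Cohen's d), and the function name and sample sizes are assumptions for illustration.

```python
from math import sqrt
from statistics import NormalDist

def z_test_power(effect_size_d, n_per_group, alpha=0.05):
    """Power of a two-sided, two-group z-test with equal group sizes."""
    std_normal = NormalDist()
    z_crit = std_normal.inv_cdf(1 - alpha / 2)
    # Expected value of the test statistic when the alternative is true.
    shift = effect_size_d * sqrt(n_per_group / 2)
    return std_normal.cdf(-z_crit + shift) + std_normal.cdf(-z_crit - shift)

# Large effect with a small sample, small effect with small and large samples.
for d, n in [(0.8, 26), (0.2, 26), (0.2, 400)]:
    print(f"d={d}, n per group={n}, power={z_test_power(d, n):.2f}")
```

With 26 observations per group, a large effect reaches the 0.8 power target while a small effect falls far short of it; the small effect only gets there with a few hundred observations per group.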
Step 2.1: Sample Size
Inputs to determine sample size
→ Effect size
→ Desired power
→ Alpha (α)
Cohen’s f (signal-to-noise ratio) = Standard deviation of group means / Common standard
deviation = Signal / Noise (not important, only to illustrate)
→ f=0.1 is a small effect, f=0.25 is a medium effect, f=0.5 is a large effect (mostly small to medium)
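A small sketch of Cohen's f as defined above (standard deviation of the group means divided by the common standard deviation), assuming equal group sizes. The group means and the common SD are invented, and treating the benchmarks 0.1 / 0.25 / 0.5 as lower cut-offs for each label is an assumption made here for illustration.

```python
from statistics import pstdev

def cohens_f(group_means, common_sd):
    """Signal / noise: SD of the group means over the common SD."""
    return pstdev(group_means) / common_sd

def label_effect(f):
    """Assumed reading of the benchmarks as lower cut-offs."""
    if f >= 0.5:
        return "large"
    if f >= 0.25:
        return "medium"
    if f >= 0.1:
        return "small"
    return "below small"

# Hypothetical group means on a 1-7 scale and a hypothetical common SD.
f = cohens_f([4.0, 4.5, 5.0], common_sd=1.6)
print(round(f, 3), label_effect(f))
```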