A summary of all lecture slides for the course Statistics and Methodology in the master's programme Data Science and Society at Tilburg University. The summary contains everything that was on the slides. In addition, all explanations were typed out word for word. The answers to the weekly quizzes are...
Video 1 – Basics 1
Video 2 – Basics 2
Video 3 – Basics 3
Video 4 – Design 1
Video 5 – Design 2
Video 6 – Design 3
Video 7 – Design 4
Week 2 – 08-02-2021 to 14-02-2021
Video 1 – Data Cleaning 1
Video 2 – Data Cleaning 2
Video 3 – Data Cleaning 3
Video 4 – Data Cleaning 4
Video 5 – Data Cleaning 5
Week 3 – 15-02-2021 to 28-02-2021
Video 1 – Simple Linear Regression 1
Video 2 – Simple Linear Regression 2
Video 3 – Simple Linear Regression 3
Video 4 – Multiple Linear Regression 1
Video 5 – Multiple Linear Regression 2
Video 6 – Multiple Linear Regression 3
Week 4 – 1-03-2021 to 7-03-2021
Video 1 – Prediction 1
Video 2 – Prediction 2
Video 3 – Prediction 3
Video 4 – Prediction 4
Video 5 – Prediction 5
Week 5 – 8-03-2021 to 14-03-2021
Video 1 – Categorical Predictors 1
Video 2 – Categorical Predictors 2
Video 3 – Categorical Predictors 3
Video 4 – Categorical Predictors 4
Week 6 – 15-03-2021 to 21-03-2021
Video 1 – Moderation 1
Video 2 – Moderation 2
Video 3 – Moderation 3
Video 4 – Moderation 4
Video 5 – Moderation 5
Week 7 – 22-03-2021 to 28-03-2021
Video 1 – Assumptions 1
Video 2 – Assumptions 2
Video 3 – Assumptions 3
Video 4 – Assumptions 4
Video 5 – Assumptions 5
Video 6 – Assumptions 6
1. Introduction to statistical inference
2. Introduction to statistical modeling
3. Brief mention of prediction
Motivating example
Imagine you are working for an F1 team. Your job is to use data from past seasons to optimize
the baseline setup of your team's car (for example, the rubber compound or how the brakes
are set). You want to pick the setup with the best performance.
- Suppose you have two candidate setups that you want to compare
- For each setup, you have 100 past lap times
- How do you distill those 200 lap times into a succinct decision between the two
setups?
Suppose I tell you that the mean lap time for setup A is 118 seconds and the mean lap time
for setup B is 110 seconds. This is the only information available right now.
- Can you confidently recommend setup B?
- What caveats might you consider?
→ The first thing to recognize is that setup B looks better because it is faster. You may also
be thinking about controlling for external influences on the system rather than only the car's
characteristics. That is a good way of thinking: it adopts the statistical modelling perspective
that we will use in this course.
→ Modelling the system and controlling for such influences helps you get the best answer to
the question. However, there is a more fundamental issue at hand here, and it does not require
thinking about the characteristics of the circuit.
Suppose I tell you that the standard deviation for the times under setup A is 7 seconds and
the standard deviation for the times under setup B is 5 seconds.
- How would you incorporate this new information into your decision?
Suppose, instead, that the standard deviation of times under setup A is 35 seconds and the
standard deviation under setup B is 25 seconds.
- How would you adjust your appraisal of the setup’s relative benefits?
→ You would be far more confident recommending setup B in the first scenario than in the
second, because the lap times in the first scenario are far less variable (more precise). In the
second scenario you cannot clearly differentiate between the setups: the means differ, but the
individual lap times overlap a lot. Therefore, we also need to take variability into account.
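To make this comparison concrete, here is a minimal sketch that expresses the 8-second mean difference relative to the variability in each scenario. The lecture does not name a specific statistic at this point; the Welch t statistic and Cohen's d used below are assumptions chosen for illustration, and the function name `standardized_difference` is hypothetical. The inputs are the means, standard deviations, and sample sizes (100 laps per setup) from the example above.

```python
import math

def standardized_difference(mean_a, mean_b, sd_a, sd_b, n_a, n_b):
    """Return (Welch t statistic, Cohen's d) computed from summary statistics."""
    diff = mean_a - mean_b
    # Standard error of the difference between two independent sample means.
    se = math.sqrt(sd_a**2 / n_a + sd_b**2 / n_b)
    # Pooled SD as the simple average of the variances (valid for equal group sizes).
    pooled_sd = math.sqrt((sd_a**2 + sd_b**2) / 2)
    return diff / se, diff / pooled_sd

# Scenario 1: SDs of 7 and 5 seconds; scenario 2: SDs of 35 and 25 seconds.
scenario_1 = standardized_difference(118, 110, 7, 5, 100, 100)
scenario_2 = standardized_difference(118, 110, 35, 25, 100, 100)

for name, (t_stat, d) in [("Scenario 1", scenario_1), ("Scenario 2", scenario_2)]:
    print(f"{name}: t = {t_stat:.2f}, d = {d:.2f}")
```

The raw mean difference is 8 seconds in both scenarios, but because the standard deviations are five times larger in the second scenario, the standardized difference is five times smaller (t of roughly 9.3 versus roughly 1.9).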
Statistical reasoning
The preceding example calls for statistical reasoning.
- The foundation of all good statistical analyses is a deliberate, careful, and thorough
consideration of uncertainty
- In the previous example, the mean lap time for setup A is clearly longer than the mean
lap time for setup B
- If the times are highly variable relative to the size of the mean difference, we may
not care much about the mean difference
- The purpose of statistics is to systematize (set rules for) the way that we account for
uncertainty when making data-based decisions
Statistics for Data Science
Data scientists must scrutinize large amounts of data and extract useful knowledge.
- Data contain raw information (it’s not human readable)
- To convert this information into actionable knowledge, data scientists apply various
data analytic techniques
- When presenting the results of such analyses, data scientists must be careful not to
over-state their findings (how strong are the statements)
- Too much confidence in an uncertain finding could lead your employer to waste large
amounts of resources chasing data anomalies
- Statistics offers us a way to protect ourselves from ourselves (to control the
uncertainty). It provides a toolkit for doing so (being honest, reporting your
statements transparently, etc.)
Probability distributions
Before going any further, we’ll review the general concept of a probability distribution.
- Probability distributions (= a mathematical function) quantify how likely it is to observe
each possible value of some probabilistic entity
- Probability distributions are re-scaled frequency distributions
- We can build up the intuition of a probability density by beginning with a histogram
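The re-scaling step can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the lap times are simulated (a hypothetical normal distribution with mean 110 s and SD 5 s, not data from the lecture), and the helper `histogram` is a name introduced here. Dividing the bin counts by the number of observations gives relative frequencies that sum to 1; dividing additionally by the bin width gives density heights whose total area is 1.

```python
import random

random.seed(1)
# Hypothetical data: 1000 simulated lap times (mean 110 s, SD 5 s).
times = [random.gauss(110, 5) for _ in range(1000)]

def histogram(values, n_bins):
    """Count how many values fall into each of n_bins equal-width bins."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / n_bins
    counts = [0] * n_bins
    for v in values:
        # Clamp the index so the maximum value lands in the last bin.
        idx = min(int((v - lo) / width), n_bins - 1)
        counts[idx] += 1
    return counts, width

counts, width = histogram(times, 20)
n = len(times)

# Frequency distribution re-scaled to relative frequencies (heights sum to 1).
rel_freq = [c / n for c in counts]
# Re-scaled once more to a density (bar areas sum to 1).
density = [c / (n * width) for c in counts]

print(sum(rel_freq))                     # total relative frequency
print(sum(d * width for d in density))   # total area under the density bars
```

With more observations and narrower bins, the density bars trace out an increasingly smooth curve, which is the intuition behind a probability density.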