Summary of all lectures + notes for the course Statistics and Methodology in the master's programme Data Science and Society


A summary of all lecture slides for the course Statistics and Methodology in the master's programme Data Science and Society at Tilburg University. The summary contains everything that was on the slides. In addition, all explanations were typed out word for word. The answers to the weekly quizzes are...


  • September 16, 2021
  • 150
  • 2020/2021
  • Class notes
  • Kyle Lang
  • All classes

Statistics and Methodology
MSc Data Science and Society
Block 3, 2021



Table of contents

Week 1 – 01-02-2021 to 07-02-2021
Video 1 – Basics 1
Video 2 – Basics 2
Video 3 – Basics 3
Video 4 – Design 1
Video 5 – Design 2
Video 6 – Design 3
Video 7 – Design 4

Week 2 – 08-02-2021 to 14-02-2021
Video 1 – Data Cleaning 1
Video 2 – Data Cleaning 2
Video 3 – Data Cleaning 3
Video 4 – Data Cleaning 4
Video 5 – Data Cleaning 5

Week 3 – 15-02-2021 to 28-02-2021
Video 1 – Simple Linear Regression 1
Video 2 – Simple Linear Regression 2
Video 3 – Simple Linear Regression 3
Video 4 – Multiple Linear Regression 1
Video 5 – Multiple Linear Regression 2
Video 6 – Multiple Linear Regression 3

Week 4 – 1-03-2021 to 7-03-2021
Video 1 – Prediction 1
Video 2 – Prediction 2
Video 3 – Prediction 3
Video 4 – Prediction 4
Video 5 – Prediction 5

Week 5 – 8-03-2021 to 14-03-2021
Video 1 – Categorical Predictors 1
Video 2 – Categorical Predictors 2
Video 3 – Categorical Predictors 3
Video 4 – Categorical Predictors 4

Week 6 – 15-03-2021 to 21-03-2021
Video 1 – Moderation 1
Video 2 – Moderation 2
Video 3 – Moderation 3
Video 4 – Moderation 4
Video 5 – Moderation 5

Week 7 – 22-03-2021 to 28-03-2021
Video 1 – Assumptions 1
Video 2 – Assumptions 2
Video 3 – Assumptions 3
Video 4 – Assumptions 4
Video 5 – Assumptions 5
Video 6 – Assumptions 6





Week 1 – 01-02-2021 to 07-02-2021
Video 1 – Basics 1

1. Introduction to statistical inference
2. Introduction to statistical modeling
3. Brief mention of prediction

Motivating example
Imagine you are working for an F1 team. Your job is to use data from past seasons to optimize
the baseline setup of your team’s car (for example, the rubber compound or the brake
settings). You want to pick the setup with the best performance.
- Suppose you have two candidate setups that you want to compare
- For each setup, you have 100 past lap times
- How do you distill those 200 lap times into a succinct decision between the two
setups?

Suppose I tell you that the mean lap time for setup A is 118 seconds and the mean lap time
for setup B is 110 seconds. This is the only information available right now.
- Can you confidently recommend setup B?
- What caveats might you consider?

→ The first thing to recognize is that setup B looks better because it is faster. You may also
be thinking about controlling for external influences on the system, rather than only the car
characteristics. That is a good way of thinking: it adopts the statistical modelling perspective
that we are going to use.

→ Modelling the system and controlling for such influences helps you get the best answer to
the question. However, there is a more fundamental issue at hand here, one that comes before
any consideration of the characteristics of the circuit.

Suppose I tell you that the standard deviation for the times under setup A is 7 seconds and
the standard deviation for the times under setup B is 5 seconds.
- How would you incorporate this new information into your decision?

Suppose, instead, that the standard deviation of times under setup A is 35 seconds and the
standard deviation under setup B is 25 seconds.
- How would you adjust your appraisal of the setups’ relative benefits?

→ You would be far more confident recommending setup B in the first scenario than in the
second, because in the first scenario the lap times vary far less (they are far more precise)
than in the second. The issue at hand is that in the second scenario you cannot clearly
differentiate between the setups. The means may be different, but the individual lap times
overlap a lot. Therefore, we also need to take their variability into account.
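
One way to make this comparison concrete is to divide the mean difference by a measure of spread. The sketch below is an illustration and not part of the lecture: it assumes the 100 lap times per setup are independent, uses the Welch-type standard error sqrt(sd_A^2/n_A + sd_B^2/n_B), and the function names are hypothetical.

```python
import math

def mean_diff_t(mean_a, mean_b, sd_a, sd_b, n_a, n_b):
    """Mean difference divided by its standard error (Welch-type t statistic)."""
    standard_error = math.sqrt(sd_a**2 / n_a + sd_b**2 / n_b)
    return (mean_a - mean_b) / standard_error

def standardized_diff(mean_a, mean_b, sd_a, sd_b):
    """Mean difference in units of the pooled SD (assumes equal group sizes)."""
    pooled_sd = math.sqrt((sd_a**2 + sd_b**2) / 2)
    return (mean_a - mean_b) / pooled_sd

# Scenario 1: SDs of 7 and 5 seconds; scenario 2: SDs of 35 and 25 seconds.
t_low = mean_diff_t(118, 110, 7, 5, 100, 100)
t_high = mean_diff_t(118, 110, 35, 25, 100, 100)
d_low = standardized_diff(118, 110, 7, 5)
d_high = standardized_diff(118, 110, 35, 25)

print(f"scenario 1: t = {t_low:.2f}, d = {d_low:.2f}")   # t = 9.30, d = 1.32
print(f"scenario 2: t = {t_high:.2f}, d = {d_high:.2f}")  # t = 1.86, d = 0.26
```

The mean difference is 8 seconds in both scenarios, but the statistic drops from about 9.3 to about 1.9 when the standard deviations are five times larger. That matches the intuition above: the second scenario gives much weaker grounds for recommending setup B.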

Statistical reasoning
The preceding example calls for statistical reasoning.
- The foundation of all good statistical analyses is a deliberate, careful, and thorough
consideration of uncertainty
- In the previous example, the mean lap time for setup A is clearly longer than the mean
lap time for setup B
- If the times are highly variable relative to the size of the mean difference, we may
not care much about the mean difference
- The purpose of statistics is to systematize (give rules for) the way that we account for
uncertainty when making data-based decisions

Statistics for Data Science
Data scientists must scrutinize large amounts of data and extract useful knowledge.
- Data contain raw information (it’s not human readable)
- To convert this information into actionable knowledge, data scientists apply various
data analytic techniques
- When presenting the results of such analyses, data scientists must be careful not to
over-state their findings (consider how strong your statements are)
- Too much confidence in an uncertain finding could lead your employer to waste large
amounts of resources chasing data anomalies
- Statistics offers us a way to protect ourselves from ourselves (to control the
uncertainty). It provides a toolkit for being honest and reporting your statements
transparently

Probability distributions
Before going any further, we’ll review the general concept of a probability distribution.
- Probability distributions (= mathematical functions) quantify how likely it is to observe
each possible value of some probabilistic entity
- Probability distributions are re-scaled frequency distributions
- We can build up the intuition of a probability density by beginning with a histogram
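
The step from a histogram to a density can be sketched as follows. This is a minimal illustration rather than material from the slides: the lap times are simulated (a normal distribution with mean 110 and SD 5 is an assumption), and `histogram_density` is a hypothetical helper. Each bin count is divided by the sample size times the bin width, so the bars keep their shape but their total area becomes 1.

```python
import random

def histogram_density(values, n_bins):
    """Return bin counts, re-scaled densities, and the bin width."""
    lowest, highest = min(values), max(values)
    width = (highest - lowest) / n_bins
    counts = [0] * n_bins
    for value in values:
        # Clamp so the maximum value falls into the last bin.
        index = min(int((value - lowest) / width), n_bins - 1)
        counts[index] += 1
    n = len(values)
    # Re-scale frequencies: density = count / (n * bin width).
    densities = [count / (n * width) for count in counts]
    return counts, densities, width

rng = random.Random(1)  # fixed seed so the simulation is reproducible
lap_times = [rng.gauss(110, 5) for _ in range(1000)]  # simulated, not real data

counts, densities, width = histogram_density(lap_times, 20)
total_area = sum(density * width for density in densities)
print(f"sum of counts: {sum(counts)}, total area under density: {total_area:.3f}")
```

The counts add up to the sample size, while the re-scaled bars add up to an area of 1, which is the defining property of a probability density.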



