Summary of all lectures + notes for the course Statistics and Methodology of the master's programme Data Science and Society


A summary of all lecture slides of the course Statistics and Methodology of the master's programme Data Science and Society at Tilburg University. The summary contains everything that was on the slides. In addition, all explanations were typed out word for word. The answers to the weekly quizzes are...


  • Date: 16 September 2021
  • Pages: 150
  • Academic year: 2020/2021
  • Type: lecture notes
  • Lecturer: Kyle Lang
  • Contains: all lectures
moeskops20
Statistics and Methodology
MSc Data Science and Society
Block 3, 2021



Table of contents

Week 1 – 01-02-2021 to 07-02-2021
Video 1 – Basics 1
Video 2 – Basics 2
Video 3 – Basics 3
Video 4 – Design 1
Video 5 – Design 2
Video 6 – Design 3
Video 7 – Design 4

Week 2 – 08-02-2021 to 14-02-2021
Video 1 – Data Cleaning 1
Video 2 – Data Cleaning 2
Video 3 – Data Cleaning 3
Video 4 – Data Cleaning 4
Video 5 – Data Cleaning 5

Week 3 – 15-02-2021 to 28-02-2021
Video 1 – Simple Linear Regression 1
Video 2 – Simple Linear Regression 2
Video 3 – Simple Linear Regression 3
Video 4 – Multiple Linear Regression 1
Video 5 – Multiple Linear Regression 2
Video 6 – Multiple Linear Regression 3

Week 4 – 1-03-2021 to 7-03-2021
Video 1 – Prediction 1
Video 2 – Prediction 2
Video 3 – Prediction 3
Video 4 – Prediction 4
Video 5 – Prediction 5

Week 5 – 8-03-2021 to 14-03-2021
Video 1 – Categorical Predictors 1
Video 2 – Categorical Predictors 2
Video 3 – Categorical Predictors 3
Video 4 – Categorical Predictors 4

Week 6 – 15-03-2021 to 21-03-2021
Video 1 – Moderation 1
Video 2 – Moderation 2
Video 3 – Moderation 3
Video 4 – Moderation 4
Video 5 – Moderation 5

Week 7 – 22-03-2021 to 28-03-2021
Video 1 – Assumptions 1
Video 2 – Assumptions 2
Video 3 – Assumptions 3
Video 4 – Assumptions 4
Video 5 – Assumptions 5
Video 6 – Assumptions 6





Week 1 – 01-02-2021 to 07-02-2021
Video 1 – Basics 1

1. Introduction to statistical inference
2. Introduction to statistical modeling
3. Brief mention of prediction

Motivating example
Imagine you are working for an F1 team. Your job is to use data from past seasons to optimize
the baseline setup of your team’s car (e.g., the tire rubber compound or how the brakes are
set). You want to pick the setup with the best performance.
- Suppose you have two candidate setups that you want to compare
- For each setup, you have 100 past lap times
- How do you distill those 200 lap times into a succinct decision between the two
setups?

Suppose I tell you that the mean lap time for setup A is 118 seconds and the mean lap time
for setup B is 110 seconds. This is the only information available right now.
- Can you confidently recommend setup B?
- What caveats might you consider?

→ The first thing you might recognize is that setup B seems better because it is faster. You
might also start thinking about controlling for external influences on the system rather than
only the car’s characteristics. That is a good way of thinking: it means adopting a statistical
modeling perspective, which is what we are going to do.

→ Modeling the system and controlling for such influences would help you get the best answer
to the question. However, there is a more fundamental issue at hand here, one that does not
require thinking about the characteristics of the circuit at all.

Suppose I tell you that the standard deviation for the times under setup A is 7 seconds and
the standard deviation for the times under setup B is 5 seconds.
- How would you incorporate this new information into your decision?

Suppose, instead, that the standard deviation of times under setup A is 35 seconds and the
standard deviation under setup B is 25 seconds.
- How would you adjust your appraisal of the setups’ relative benefits?

→ You would be far more confident recommending setup B in the first scenario than in the
second, because in the first scenario the lap times are far less variable (more precise)
relative to the 8-second difference in means. In the second scenario you cannot clearly
differentiate between the setups: the means differ, but the distributions of lap times overlap
a lot. Therefore, we also need to take the variability of the times into account.
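To make the role of variability concrete, here is a minimal Python sketch that compares the two scenarios. It assumes 100 independent lap times per setup (as in the example) and uses two standard quantities that have not been introduced at this point in the course: the standard error of a difference between two independent means (Welch form) and Cohen's d with a pooled SD. The helper name `compare_setups` is hypothetical.

```python
import math

def compare_setups(mean_a, sd_a, mean_b, sd_b, n=100):
    """Summarize a difference in mean lap times relative to its uncertainty.

    Hypothetical helper; assumes n independent lap times per setup.
    """
    diff = mean_a - mean_b
    # Standard error of the difference between two independent means (Welch form).
    se = math.sqrt(sd_a**2 / n + sd_b**2 / n)
    # Cohen's d using the pooled SD (this simple form holds because both groups have size n).
    pooled_sd = math.sqrt((sd_a**2 + sd_b**2) / 2)
    return diff, diff / se, diff / pooled_sd

for label, sd_a, sd_b in [("Scenario 1", 7, 5), ("Scenario 2", 35, 25)]:
    diff, t_stat, d = compare_setups(118, sd_a, 110, sd_b)
    print(f"{label}: difference = {diff} s, t = {t_stat:.2f}, d = {d:.2f}")

# Scenario 1: difference = 8 s, t = 9.30, d = 1.32
# Scenario 2: difference = 8 s, t = 1.86, d = 0.26
```

The mean difference is 8 seconds in both scenarios, but it is roughly nine standard errors large in the first scenario and fewer than two in the second, which is why the first scenario supports a much more confident recommendation.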

Statistical reasoning
The preceding example calls for statistical reasoning.



- The foundation of all good statistical analyses is a deliberate, careful, and thorough
consideration of uncertainty
- In the previous example, the mean lap time for setup A is clearly longer than the mean
lap time for setup B
- If the times are highly variable relative to the size of the mean difference, we may
not care much about the mean difference
- The purpose of statistics is to systematize (i.e., set rules for) the way that we account for
uncertainty when making data-based decisions

Statistics for Data Science
Data scientists must scrutinize large amounts of data and extract useful knowledge.
- Data contain raw information (it is not directly human-readable)
- To convert this information into actionable knowledge, data scientists apply various
data analytic techniques
- When presenting the results of such analyses, data scientists must be careful not to
overstate their findings (i.e., make claims stronger than the evidence supports)
- Too much confidence in an uncertain finding could lead your employer to waste large
amounts of resources chasing data anomalies
- Statistics offers us a way to protect ourselves from ourselves (i.e., to keep uncertainty
under control). It provides a toolkit that helps us stay honest, e.g., by reporting our
claims transparently

Probability distributions
Before going any further, we’ll review the general concept of a probability distribution.
- Probability distributions (= a mathematical function) quantify how likely it is to observe
each possible value of some probabilistic entity
- Probability distributions are rescaled frequency distributions
- We can build up intuition for a probability density by beginning with a histogram
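The idea of a probability distribution as a rescaled frequency distribution can be sketched in a few lines of Python (standard library only). The lap times below are simulated, not real data: 1,000 draws from a normal distribution with mean 110 and SD 5, loosely mirroring setup B. The helper `histogram_density` is hypothetical. Dividing each bin count by the sample size gives relative frequencies, and dividing further by the bin width gives density heights, so the total area of the bars equals 1.

```python
import random

def histogram_density(data, n_bins):
    """Bin data into equal-width bins; return edges, raw counts, and densities."""
    lo, hi = min(data), max(data)
    width = (hi - lo) / n_bins
    counts = [0] * n_bins
    for x in data:
        # The maximum value would fall just past the last edge; clamp it into the last bin.
        idx = min(int((x - lo) / width), n_bins - 1)
        counts[idx] += 1
    n = len(data)
    # Rescale counts: dividing by n gives relative frequencies,
    # dividing by the bin width as well gives density heights.
    densities = [c / (n * width) for c in counts]
    edges = [lo + i * width for i in range(n_bins + 1)]
    return edges, counts, densities

random.seed(1)  # reproducible simulation
lap_times = [random.gauss(110, 5) for _ in range(1000)]  # simulated, not real data

edges, counts, densities = histogram_density(lap_times, n_bins=20)
bin_width = edges[1] - edges[0]
area = sum(d * bin_width for d in densities)

print(sum(counts))     # 1000: every observation lands in exactly one bin
print(round(area, 6))  # 1.0: the rescaled histogram has total area 1
```

With more observations and narrower bins, the bar heights trace out an increasingly smooth curve, which is the intuition behind a probability density function.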



