This document is an exhaustive summary of all the material provided in the 2020/2021 Data Science Research Methods course. It includes in-depth descriptions of theory from the books Experimental Design (Berger et al., 2018) and Mostly Harmless Econometrics (Angrist et al., 2009) as well as the theo...
Class notes and Summary of materials Data Science Research Methods (JBM020 2020-2021)
Technische Universiteit Eindhoven (TUE)
Lieve Göbbels
DS Research Methods (JBM020)
Semester 2, 2020-2021
Data Science Research Methods
Scientific Method and Experimentation
The scientific method
Experimentation and experimental design
Important concepts
One-Factor Designs and the Analysis of Variance
One-Factor Designs
Analysis of Variance (ANOVA)
Sample Size Determination
Sample size determination
Normal distribution
Binomial distribution
ANOVA II - Power
One-way ANOVA and power
Effect size
Sample size determination
Multiple Comparisons
Multiple comparisons
Bonferroni correction
Fisher’s Least Significant Difference test (LSD)
Tukey’s Honest Significant Difference test (HSD)
Two-Factor Designs
Two-way ANOVA with replication
Two-factor with no replication and no interaction
Introduction to blocking
Full Factorial Designs
Full factorial designs
Estimating effects in 2 factor 2 level experiments
Three factors at two levels
Number and kinds of effects
Main effects with large interactions
Choosing levels of factors when measured along continuum
Errors of estimates in full factorial designs
Fractional Factorial Designs
Blocking in full factorial designs II
Fractional factorial designs
Analysis of fractional factorial designs
Response Surface Optimization
Response Surface Optimization
Optimization steps
Regression models
Step 2: Improvement
Step 3: Determination (Response Surface Designs)
Finding the optimum using CCD or BB estimates
Introduction to Econometrics for Data Scientists
Econometrics
Independence and correlation
Regressions
Causality and Selection
Causality formalized
Average Treatment Effect (ATE)
Average Treatment effect on Treated (ATT)
Selection (bias)
Random assignment
Potential problems with experiments
IV estimation
Selection on Observables and Matching
Matching estimators
Some recaps
Selection on observables
Matching
Different methods
Differences-in-Differences Estimation
Differences-in-differences estimation
Implementation
Testing the parallel trends assumption
Group-specific trends and dynamic effects
More pre-periods
Compositional changes
Generalization: synthetic control
Regression Discontinuity Design
Regression Discontinuity Design (RDD)
Sharp RDD
Fuzzy RDD
Specification testing
Quiz Questions and Solutions
Quiz questions and solutions
Scientific Method and Experimentation
In short:
• The scientific method
• Experimentation and experimental design
• Important concepts
The scientific method
There are three important goals of data science (and beyond):
1. description: provide insight into past events;
2. prediction: provide insight into a (possible) future;
3. explanation/prescription: advise on possible outcomes.
Basic elements of the scientific method:
1. formulate a (research) question;
2. perform background research;
3. formulate a hypothesis;
4. determine the logical consequences of the hypothesis;
5. collect observations (conduct an experiment);
6. test the truth of the hypothesis by analyzing the observations (statistics);
7. report the results;
8. if the hypothesis is not confirmed, go back to step 2.
Some of these steps can be linked to Six Sigma’s DMAIC method (Define, Measure, Analyze,
Improve, Control):
• step 1 can be linked to the Define phase;
• step 4 can be linked to the Measure phase;
• step 5 can be linked to the Analyze phase.
The Improve and Control phases therefore have no direct counterpart. The scientific method is
characterized by its iterative nature.
Experimentation and experimental design
An experiment is an investigation in which the researcher selects the values (levels) of one or more
input (independent) variables and observes the values of the output (dependent) variables. The
purpose is to gain insight into the relationship between the dependent and independent variables,
which is then often used to optimize the underlying process.
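As a minimal sketch of this definition, the Python snippet below simulates a one-variable experiment. Everything in it is an assumption made for illustration and does not come from the course material: the input variable "temperature", its three levels, the number of replicates, and the hypothetical process function observe with its noise level.

```python
import random
from statistics import mean


def observe(temperature, rng):
    """Hypothetical process: the true response peaks at 60, plus measurement noise."""
    true_response = 50 - (temperature - 60) ** 2 / 100
    return true_response + rng.gauss(0, 0.5)


rng = random.Random(0)   # fixed seed so the simulated run is reproducible
levels = [40, 60, 80]    # levels of the independent variable chosen by the researcher
replicates = 5           # number of observations per level

# Run the experiment: observe the dependent variable at every chosen level.
results = {level: [observe(level, rng) for _ in range(replicates)] for level in levels}

# Summarize the relationship between input and output as a mean response per level.
mean_response = {level: mean(ys) for level, ys in results.items()}

# Use that insight to pick the level with the highest mean response.
best_level = max(mean_response, key=mean_response.get)

for level in levels:
    print(level, round(mean_response[level], 2))
print("best level:", best_level)
```

The researcher controls only the levels list; the observed outputs follow from the process, which in a real experiment is of course unknown.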
An experimental design is then the aggregation of independent variables, the set of amounts,
settings or magnitudes (levels) of each independent variable, and the combinations of these levels.
So, the core of experimental design is to answer the three-part question:
• which factors should we study?
• how should the levels of these factors vary?
• in what way should these levels be combined?
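The three parts of this question can be sketched in a few lines of Python. The factor names and levels below are hypothetical, and combining every level of one factor with every level of the other is only one possible answer to the third part (it corresponds to the full factorial designs listed in the contents above).

```python
from itertools import product

# Parts 1 and 2: which factors to study, and which levels each factor takes.
# Both factors and their levels are made up for this illustration.
factors = {
    "temperature": [40, 60, 80],
    "pressure": ["low", "high"],
}

# Part 3: one way to combine the levels is to take every possible combination.
runs = [dict(zip(factors, combo)) for combo in product(*factors.values())]

for run in runs:
    print(run)
print("number of runs:", len(runs))  # 3 levels x 2 levels = 6 runs
```

Adding a factor or a level multiplies the number of runs, which is one reason the choice of factors and levels matters before any data is collected.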
Sometimes, for example when the analysis is ex post facto (after the data has already been collected),
the levels of the independent variables cannot be specified, because they are already given. Then,