
Statistics & Methodology Last-Minute Study Guide + Exam questions


Quick revision notes for last-minute study, which include 2-3 key questions for each lecture and 20 questions that were asked in the exam. Answers are included.


  • 17 January 2025
  • 15 pages
  • 2024/2025
  • Exam (elaborations)
  • Questions and answers
Seller: iuk
Statistics & Methodology
Lecture 1
Statistical reasoning: Understanding the uncertainty of our measurements.
Probability distributions: Shows all possible outcomes of a situation and how likely each outcome is.
Statistical testing: Summarizes information into a single statistic, accounting for uncertainty in
conclusions.
Test statistic: A value calculated from data that shows how far the sample statistic is from the null
hypothesis, requiring comparison with an objective reference (like a sampling distribution) to assess
its significance.
Sampling distribution: A way to understand how a statistic (like an average, or test statistic) would
vary if you took many samples from the same population. If the t-statistic lies in the tails, it’s rare and
surprising.
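The idea of a sampling distribution can be sketched with a small simulation. This is only an illustration: the population values, the sample size of 25, and the helper name sampling_distribution_of_mean are assumptions made for the example, not part of the lecture.

```python
import random
import statistics

def sampling_distribution_of_mean(population, sample_size, n_samples, seed=0):
    """Draw many samples (with replacement) and return the mean of each one."""
    rng = random.Random(seed)
    return [
        statistics.fmean(rng.choices(population, k=sample_size))
        for _ in range(n_samples)
    ]

# Hypothetical population: 10,000 scores centred near 50 with spread near 10.
rng = random.Random(42)
population = [rng.gauss(50, 10) for _ in range(10_000)]

means = sampling_distribution_of_mean(population, sample_size=25, n_samples=2_000)

# The sample means cluster around the population mean, and their spread
# (the standard error) is close to population SD / sqrt(sample size).
centre = statistics.fmean(means)
standard_error = statistics.stdev(means)
expected_se = statistics.pstdev(population) / 25 ** 0.5
```

A sample mean that falls far out in the tails of the list means would be rare under this population, which is the sense in which a statistic in the tails is surprising.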
Distribution of a random variable: The possible values of a variable (age, gender, income, movie
preferences etc.) and how likely each value is.
Null hypothesis: Assumes no difference between the two observed groups.
Variance: A measure of how spread out the scores are. Low = Scores are very close to each other.
High = Scores are not close to each other.
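A quick numerical check of low versus high variance, using two hypothetical sets of scores that share the same mean (the numbers are made up for the example):

```python
import statistics

# Two hypothetical sets of exam scores with the same mean (70).
close_scores = [70, 71, 69, 70]
spread_scores = [40, 100, 55, 85]

# Population variance: the average squared distance from the mean.
low_variance = statistics.pvariance(close_scores)    # 0.5
high_variance = statistics.pvariance(spread_scores)  # 562.5
```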
P-value: The probability of obtaining a result at least as extreme as the one you observed, under the
assumption that the null hypothesis is true. By convention: small p-value = < 0.05. Large p-value = > 0.05.
Steps for statistical testing:
1. Check the t-statistic against the sampling distribution → T-statistic large? → Check the p-value.
2. Small p-value = Statistically significant → Strong evidence against the null hypothesis → Reject the null
hypothesis.
3. Large p-value = Not statistically significant → No strong evidence against the null hypothesis →
CAN’T reject the null hypothesis (nil-null).
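The testing steps above can be sketched in code. Note the assumptions: the two groups are hypothetical data, the t-statistic is the unequal-variance (Welch) form, and the sampling distribution under the null hypothesis is approximated by shuffling group labels (a permutation test) instead of looking up a t-distribution. The function names are made up for the example.

```python
import random
import statistics

def t_statistic(a, b):
    """Mean difference divided by its standard error (unequal-variance form)."""
    se = (statistics.variance(a) / len(a) + statistics.variance(b) / len(b)) ** 0.5
    return (statistics.fmean(a) - statistics.fmean(b)) / se

def permutation_p_value(a, b, n_permutations=2_000, seed=0):
    """Two-tailed p-value: share of shuffled |t| values at least as large as the observed |t|."""
    rng = random.Random(seed)
    observed = abs(t_statistic(a, b))
    pooled = list(a) + list(b)
    extreme = 0
    for _ in range(n_permutations):
        rng.shuffle(pooled)  # under the null hypothesis, group labels are arbitrary
        if abs(t_statistic(pooled[:len(a)], pooled[len(a):])) >= observed:
            extreme += 1
    return extreme / n_permutations

# Hypothetical scores for two groups.
group_a = [78, 82, 85, 88, 90, 84, 87, 91]
group_b = [70, 72, 75, 68, 74, 71, 77, 73]

t = t_statistic(group_a, group_b)          # step 1: how far from "no difference"?
p = permutation_p_value(group_a, group_b)  # step 1: how rare is that under the null?
# Steps 2 and 3: compare the p-value with the conventional 0.05 threshold.
decision = "reject the null hypothesis" if p < 0.05 else "cannot reject the null hypothesis"
```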
One-tailed test: Used when you have directional hypotheses (testing in one direction).
Two-tailed test: Used when you don’t have a specific direction in mind.
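The difference between the two tests shows up directly in the p-value. The sketch below assumes a test statistic that follows a standard normal distribution (a z-statistic, a reasonable stand-in for a t-statistic in large samples); the value 1.8 and the function name are hypothetical.

```python
from statistics import NormalDist

def p_values(z):
    """Return (one-tailed upper, two-tailed) p-values for a z statistic."""
    standard_normal = NormalDist()  # mean 0, standard deviation 1
    one_tailed = 1 - standard_normal.cdf(z)              # only the upper tail counts
    two_tailed = 2 * (1 - standard_normal.cdf(abs(z)))   # both tails count
    return one_tailed, two_tailed

one_tailed, two_tailed = p_values(1.8)
```

With this statistic the one-tailed p-value falls below 0.05 while the two-tailed p-value does not, so the same data can lead to different conclusions depending on whether the hypothesis was directional.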
Statistical Modeling: A mathematical representation of the data distribution, built to learn its important
features. It can control for confounding factors.
Inference: Process of drawing conclusions.
Prediction: The outcome for new observations.

Q1 In the Data Science Cycle, you always need to process your data
True
Q2 Collecting own data is always preferred over secondary data
False

Lecture 2
Data Science Cycle:
1. Define the problem. 2. Collect data. 3. Prepare the data. 4. Explore the data. 5. Build models. 6.
Evaluate the model. 7. Communicate results. 8. Deploy the solution. 9. Monitor and maintain.
Research Design:
1. You must know how to operationalize the question in a statistically rigorous way.
- If possible, code the research question into a set of hypotheses.
2. You must be able to choose/build a statistical model, statistical test, or machine learning
algorithm that can answer your well-operationalized research question.
- The problem should be supervised (you know both the inputs (features) and the outputs (target values)).
3. You must understand what types of data/data sources you’ll need.
- Are there prevalent proxies? (Other information you already have that could stand in for questions
people don’t want to answer, such as their income.)
- Prefer secondary data over collecting new data.
EDA (exploratory data analysis): An interactive approach to analyzing and exploring data, allowing
researchers to uncover patterns, trends, and relationships without rigid hypothesis testing. Data sets may
look the same in numerical output (e.g. same intercept/slope) but show differences when graphed.
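The point about identical intercepts and slopes can be checked with a small constructed example. The two data sets below are made up for illustration: one scatters around a straight line, the other follows a curve, yet ordinary least squares gives both the same fitted line. The helper fit_line is a hypothetical name.

```python
def fit_line(xs, ys):
    """Ordinary least squares for y = intercept + slope * x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope

xs = [-2, -1, 0, 1, 2]
# Data set A: points scattered around the line y = 1 + 2x.
noisy_line = [1 + 2 * x + e for x, e in zip(xs, [1, -2, 2, -2, 1])]
# Data set B: a smooth U-shaped departure from the same line.
curve = [1 + 2 * x + (x * x - 2) for x in xs]

fit_a = fit_line(xs, noisy_line)  # (1.0, 2.0)
fit_b = fit_line(xs, curve)       # (1.0, 2.0)
```

The fitted coefficients cannot tell the two apart, but a scatter plot would show the curvature in the second data set immediately.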
CDA (confirmatory data analysis): Tests specific hypotheses or predictions about data to validate theories
OR confirm findings from EDA.
Rules for EDA vs. CDA
1. When the data are well-understood → CDA. We don’t care about testing hypotheses? → EDA.
2. EDA can be used to generate hypotheses for CDA.
3. EDA can be used to sanity check hypotheses. (If the graph shows a linear relationship → Test
significance in CDA. If no linear relation is visible → Treat the hypothesis as rejected without
modification).

Q1 Listwise deletion will always bias the parameter estimates
False

Q2 Sort the following univariate outlier detection methods based on their breakdown points,
starting from the method with the lowest breakdown point:
(1) boxplot method
(2) externally studentized residual method
(3) internally studentized residual method
(4) median absolute deviation method
(3), (2), (1), (4)
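Two of the methods in this question can be sketched as follows. The data are hypothetical, and the cutoffs (3 scaled MADs, 1.5 times the interquartile range) and the 1.4826 scaling constant are common conventions assumed here, not values taken from the lecture. Both methods rely on order statistics (median, quartiles) instead of the mean and standard deviation, which is why they tolerate more contamination than the studentized residual methods.

```python
import statistics

def mad_outliers(values, cutoff=3.0):
    """Flag values more than `cutoff` scaled MADs away from the median."""
    median = statistics.median(values)
    mad = statistics.median([abs(v - median) for v in values])
    if mad == 0:
        return []  # no spread around the median, so the rule cannot be applied
    scaled_mad = 1.4826 * mad  # makes the MAD comparable to a standard deviation
    return [v for v in values if abs(v - median) / scaled_mad > cutoff]

def boxplot_outliers(values, fence=1.5):
    """Flag values beyond `fence` interquartile ranges outside the quartiles."""
    q1, _, q3 = statistics.quantiles(values, n=4)
    iqr = q3 - q1
    return [v for v in values if v < q1 - fence * iqr or v > q3 + fence * iqr]

scores = [10, 11, 12, 11, 10, 12, 11, 95]  # hypothetical data with one extreme value
flagged_by_mad = mad_outliers(scores)          # [95]
flagged_by_boxplot = boxplot_outliers(scores)  # [95]
```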

Lecture 3
Cleaning the data:
1. The data are in an analyzable format. 2. All data contain legal values. 3. Any outliers are located and
treated. 4. Any missing data are located and treated.
Missing data: Empty cells where there should be observed values. Not every empty cell is missing data
(e.g. a company that has stopped operating, or a job satisfaction question put to someone who is unemployed).
Missing data (response) patterns:
1. Univariate
Only missing values in one column and all the other columns are complete.
2. Monotone
Missing data starts at a certain point in the row and continues to the end, so the amount of missing
data increases as you go from left to right. Looks like a staircase pattern.
3. Arbitrary
Missing data is scattered randomly without any clear pattern.
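The three patterns can be told apart mechanically. The sketch below is a minimal illustration under stated assumptions: the table is a list of rows, None marks a missing value, the columns are already in the order that would reveal a staircase, and the function name missing_pattern is made up for the example.

```python
def missing_pattern(rows):
    """Classify the missing-data pattern of a table given as a list of rows (None = missing)."""
    n_cols = len(rows[0])
    cols_with_missing = {
        j for row in rows for j in range(n_cols) if row[j] is None
    }
    if not cols_with_missing:
        return "complete"
    if len(cols_with_missing) == 1:
        return "univariate"
    # Monotone: once a value is missing in a row, every later column is missing too.
    for row in rows:
        flags = [value is None for value in row]
        if True in flags and not all(flags[flags.index(True):]):
            return "arbitrary"
    return "monotone"

univariate = [[1, 2, 3], [4, None, 6], [7, None, 9]]
monotone = [[1, 2, 3], [4, 5, None], [7, None, None]]
arbitrary = [[1, None, 3], [None, 5, 6], [7, 8, None]]
```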
Nonresponse Rates:
A measure to show how much information is missing.
1. Percent/Proportion missing.
The proportion of cells containing missing data (e.g. 10 missing responses out of 100 = 10%).
2. Attrition Rate
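The percent/proportion missing measure can be computed as below. This is a small sketch with a hypothetical 10 by 10 table in which None marks a missing cell; it reproduces the 10 out of 100 example.

```python
def proportion_missing(rows):
    """Share of all cells in the table that are missing (None)."""
    cells = [value for row in rows for value in row]
    return sum(value is None for value in cells) / len(cells)

# Hypothetical survey: 10 respondents, 10 questions, each skipped the first question.
survey = [[None] + [1] * 9 for _ in range(10)]

share = proportion_missing(survey)  # 0.1, i.e. 10% of cells are missing
```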
