Summary Statistics & Methodology (880259-M-6)

Detailed summary of all lectures and additional notes, explanations and examples for the course "Statistics and Methodology" at Tilburg University which is part of the Master Data Science and Society. Course was given by L.V.D.E. Vogelsmeier during the second semester, block three of the academic y...

Author: hannahgruber (21 June 2022)
Tilburg University
Study Program: Master Data Science and Society
Academic Year 2021/2022, Semester 2, Block 3 (January to March 2022)


Course: Statistics and Methodology (880259-M-6)
Lecturer: L.V.D.E. Vogelsmeier

Lecture 1: Statistical Inference, Modeling and Prediction


Introduction to statistical inference


Statistical Reasoning
• considers uncertainty explicitly
• systematizes the way we account for uncertainty when making data-based decisions
→ helps us avoid our own bias (“get the result I wish to find”)

Probability Distributions
• Probability distributions quantify how likely it is to observe each possible value of some probabilistic entity; they can be seen as “re-scaled frequency distributions”
• they show the proportion of observations that fall in a certain bin, not the absolute number (frequency) of observations
• probability distributions with a higher standard deviation are broader and have a lower peak
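To make the “re-scaled frequency distribution” idea concrete, here is a minimal sketch on simulated data (the bin width of 0.5 and the two normal populations with SD 1 and SD 3 are illustrative assumptions). It divides each bin count by the total number of observations, so the bins hold proportions that sum to 1, and it checks that the wider distribution has the lower peak.

```python
import random
from collections import Counter

def relative_frequencies(values, bin_width):
    """Re-scale a frequency distribution: divide each bin count by the total count."""
    counts = Counter(int(v // bin_width) for v in values)  # bin index for each value
    n = len(values)
    # Key = lower edge of the bin, value = proportion of observations in that bin
    return {b * bin_width: c / n for b, c in sorted(counts.items())}

random.seed(1)
narrow = [random.gauss(0, 1) for _ in range(10_000)]  # SD = 1
wide = [random.gauss(0, 3) for _ in range(10_000)]    # SD = 3

p_narrow = relative_frequencies(narrow, 0.5)
p_wide = relative_frequencies(wide, 0.5)

print(round(sum(p_narrow.values()), 6))                # proportions sum to 1
print(max(p_narrow.values()) > max(p_wide.values()))   # larger SD -> lower peak
```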

Statistical Testing
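• When we conduct statistical tests, we weight the estimated effect by the precision of the estimate.
• Wald test (a type of t-test):
$$T = \frac{\text{Estimate} - \text{Null Hypothesized Value}}{\text{Variability}}$$
o the variability in the denominator is the standard error of the estimate
o if no effect is hypothesized, the null hypothesized value is 0
o in general, the larger the absolute value of the test statistic, the stronger the evidence against the null hypothesis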
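A minimal sketch of the Wald statistic, assuming the estimate is a sample mean and its standard error is s/√n; the sample values are hypothetical. It shows how the same estimate yields a smaller statistic when the estimate is less precise (larger standard error).

```python
import math
import statistics

def wald_statistic(estimate, null_value, standard_error):
    """(estimate - null hypothesized value) / standard error of the estimate."""
    return (estimate - null_value) / standard_error

# Hypothetical sample; estimate = sample mean, SE = s / sqrt(n)
sample = [2.1, 1.4, 3.0, 2.6, 1.9, 2.4, 2.8, 1.7]
estimate = statistics.mean(sample)
se = statistics.stdev(sample) / math.sqrt(len(sample))

t_hat = wald_statistic(estimate, 0.0, se)  # nil-null: hypothesized value is 0
print(round(t_hat, 2))

# Same estimate, half the precision (double the SE): the statistic shrinks
print(round(wald_statistic(estimate, 0.0, 2 * se), 2))
```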

Sampling Distribution of the test statistic
• probability distribution of a statistic
• The sampling distribution quantifies the possible values of the test statistic over infinite
repeated sampling.
• The area of a region under the curve represents the probability of observing a test statistic
within the corresponding interval.
• To quantify how exceptional our estimated test statistic is, we compare the estimated value
to a sampling distribution of t-statistics assuming no effect (null hypothesis)
o a null hypothesis of exactly no effect (a value of 0) is called a “nil-null”
• If our estimated statistic would be very unusual in a population where the null hypothesis is
true, we reject the null and claim a “statistically significant” effect
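The sampling distribution under the null can be approximated by simulation. This sketch assumes a one-sample t-statistic with H0: population mean = 0, a normal population, n = 20, and a hypothetical observed statistic of 2.5; 5,000 repetitions stand in for “infinite repeated sampling”. The share of simulated statistics at least as extreme as the observed one approximates the corresponding area under the curve.

```python
import math
import random
import statistics

def t_statistic(sample):
    """One-sample t-statistic for H0: population mean = 0."""
    se = statistics.stdev(sample) / math.sqrt(len(sample))
    return statistics.mean(sample) / se

random.seed(42)
n = 20
# Repeated sampling from a population in which H0 is true (mean 0)
null_ts = [t_statistic([random.gauss(0, 1) for _ in range(n)]) for _ in range(5_000)]

t_hat = 2.5  # hypothetical observed statistic
# Proportion of null statistics at least as extreme (both tails) as t_hat
share_as_extreme = sum(abs(t) >= abs(t_hat) for t in null_ts) / len(null_ts)
print(share_as_extreme)
```

A small share means the observed statistic would be unusual if the null hypothesis were true.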

Interpreting P-Values
• If, for example, p = 0.032, all that we can say is that there is a 0.032 probability of observing a test statistic at least as large as 𝑡̂ if the null hypothesis is true.
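A sketch of computing a p-value from an observed statistic. It assumes a large sample, so that the null sampling distribution is approximated by the standard normal (an exact t-test would use a t distribution instead), and a two-sided test that counts both tails; for a one-sided test you would use `1 - NormalDist().cdf(t_hat)`.

```python
from statistics import NormalDist

def two_sided_p_value(t_hat):
    """P(|T| >= |t_hat|) under H0, approximating the null distribution by N(0, 1)."""
    return 2 * (1 - NormalDist().cdf(abs(t_hat)))

p = two_sided_p_value(2.14)
print(round(p, 3))
```

The value t̂ = 2.14 is hypothetical, picked so that the approximation gives roughly 0.032.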



Introduction to statistical modeling
• For simple questions, we can use statistical testing to account for uncertainty. In most real-world cases, however, we want to take a modeling perspective so that we can control for confounding variables.
• When modeling, we can make inferences about the model parameters, or we can predict
outcomes for new cases.
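A simulated illustration of why a model that includes a confounder matters, and of using the fitted model for prediction. Everything here is an assumption for illustration: a confounder z drives both x and y, the true effect of x on y is 0.5, and the model is ordinary least squares with two predictors solved from the centered normal equations. The naive slope of y on x alone comes out near 1.5, while the model that adjusts for z recovers roughly 0.5.

```python
import random

def ols_two_predictors(x1, x2, y):
    """Least-squares fit of y = b0 + b1*x1 + b2*x2 via centered sums of squares."""
    n = len(y)
    m1, m2, my = sum(x1) / n, sum(x2) / n, sum(y) / n
    s11 = sum((a - m1) ** 2 for a in x1)
    s22 = sum((b - m2) ** 2 for b in x2)
    s12 = sum((a - m1) * (b - m2) for a, b in zip(x1, x2))
    s1y = sum((a - m1) * (c - my) for a, c in zip(x1, y))
    s2y = sum((b - m2) * (c - my) for b, c in zip(x2, y))
    det = s11 * s22 - s12 ** 2  # determinant of the 2x2 normal-equation matrix
    b1 = (s22 * s1y - s12 * s2y) / det
    b2 = (s11 * s2y - s12 * s1y) / det
    b0 = my - b1 * m1 - b2 * m2
    return b0, b1, b2

random.seed(7)
n = 5_000
z = [random.gauss(0, 1) for _ in range(n)]                               # confounder
x = [zi + random.gauss(0, 1) for zi in z]                                # x depends on z
y = [0.5 * xi + 2.0 * zi + random.gauss(0, 1) for xi, zi in zip(x, z)]   # true effect of x: 0.5

# Naive simple-regression slope of y on x (ignores the confounder)
mx, my = sum(x) / n, sum(y) / n
naive_slope = sum((a - mx) * (c - my) for a, c in zip(x, y)) / sum((a - mx) ** 2 for a in x)

# Inference: model that controls for z
b0, b1, b2 = ols_two_predictors(x, z, y)

# Prediction: outcome for a new case with x = 1 and z = 0
prediction = b0 + b1 * 1.0 + b2 * 0.0
print(round(naive_slope, 2), round(b1, 2), round(prediction, 2))
```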

Lecture 2: Research Cycle, Research Design and Exploratory Data Analysis


Discuss research/data science cycle
• CRISP-DM: The Cross-Industry Standard Process for Data Mining was developed to standardize the process of data mining in industry applications
• The Data Science Cycle combines the classical Research Cycle and the CRISP-DM. The grey-colored activities are mandatory.



Discuss research design in data science
• In data science, we rarely design experiments/empirical studies
• Research design is still crucial to data science to design an appropriate analysis.
o You must know how to operationalize the question in a statistically rigorous way.
▪ Make sure you understand exactly what is being asked
▪ Convert each aspect of the question into something quantifiable
▪ If possible, code the research question into a set of hypotheses.
o You must be able to choose/build a statistical model, statistical test, or machine
learning algorithm that can answer your well-operationalized research question.
▪ Once you have a well-operationalized research question, you need to
convert that question into some type of model or test.
o You must understand what types of data/data sources you’ll need.



Introduce EDA (Exploratory Data Analysis)
• interactively analyze/explore your data
• more of a mindset than a specific set of techniques or steps: a data-driven approach to exploring the data, not to testing hypotheses
• diverse selection of tools to use
o Statistical graphics: histograms, boxplots, scatterplots, trace plots
o Summary statistics: measures of central tendency & dispersion, order statistics
o Data screening/cleaning: missing data, outliers, invalid values
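A minimal sketch of the summary-statistics and screening tools for one numeric variable, on a hypothetical variable with missing entries (`None`) and one implausible value. Flagging outliers with Tukey's 1.5 × IQR fences is one common convention assumed here, not necessarily the rule used in the course; quartiles come from `statistics.quantiles` with its default method.

```python
import statistics

def eda_summary(raw):
    """Basic screening plus summary statistics for one numeric variable."""
    values = [v for v in raw if v is not None]          # drop missing entries
    q1, _, q3 = statistics.quantiles(values, n=4)       # order statistics (quartiles)
    iqr = q3 - q1
    low, high = q1 - 1.5 * iqr, q3 + 1.5 * iqr          # Tukey's 1.5 * IQR fences
    return {
        "n_missing": len(raw) - len(values),
        "mean": statistics.mean(values),                # central tendency
        "median": statistics.median(values),
        "sd": statistics.stdev(values),                 # dispersion
        "iqr": iqr,
        "min": min(values),
        "max": max(values),
        "outliers": [v for v in values if v < low or v > high],
    }

ages = [23, 25, 31, None, 27, 29, 24, 26, 240, 28, None, 30]  # 240: likely an invalid value
summary = eda_summary(ages)
print(summary["n_missing"], summary["median"], summary["outliers"])
```

Note how the invalid value inflates the mean and SD but barely moves the median and IQR, which is why screening comes before interpreting summary statistics.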

Interfacing EDA & CDA (Confirmatory Data Analysis)
• CDA: there is usually a clear hypothesis to test; we have some prior knowledge that we want to test, e.g., by using hypothesis testing
• unsupervised learning models are usually closer to EDA because we want to find patterns
• Either can stand alone, but they work better together
o When the data are well understood, we can proceed directly to CDA.
o If we don’t care about testing hypotheses, we can focus on EDA.
• EDA can be used to generate hypotheses for CDA.
• EDA can be used to sanity-check hypotheses.
