Lecture notes

Notes and Study Guide for Statistics for Premasters DSS


Course
800878-B-6

Institution
Tilburg University (UVT)

A detailed, well-structured summary covering all the course materials: class slides, final exam example questions, and quiz questions posted by the professors, with clear charts and a neat layout. This is for the course "Statistics for Premasters DSS" at Tilburg University which is part o...


Uploaded on 5 December 2024
File last updated on 19 December 2024
Number of pages: 22
Written in 2024/2025
Type: Lecture notes
Lecturer(s): Eriko Fukuda, Sasha Kenjeeva
Contains: All lectures


Statistics Pre DSS (24fall)

Notes & Study Guide

Green Text: code in R

Red Text: differences worth noting

With sign: very important key points
(referring to the given quiz & exam sample questions)

To pass or get a good score in the final exam, it is strongly
recommended to thoroughly engage with the material
and gain a deep understanding of the concepts and
terms, rather than simply memorizing key points.

Any questions, please email to:

Version: 202412190011
By: Alice

Content
Research Methods/Terms
Different Plot in R Part 1
Different Plot in R Part 2
Measures of Data
Population and Sample
Hypothesis Testing
Z Test / P value / Confidence Intervals
Categorical Variable and Pearson Chi-squared test
Continuous Variable and T test
Paired T test and One-Sided T-Test
One-Way ANOVA
F distribution & run ANOVA in R
Effect Size & Further test of One-Way ANOVA
Assumptions of One-Way ANOVA & Factorial ANOVA
Two-Way / Factorial ANOVA
Two-Way / Factorial ANOVA in R and Effect Size
Appendix 1: Basics of R
Appendix 2: Basic Operation of R
Appendix 3: Data Graphing in R
Appendix 4: Tests Function in R

Research Methods/Terms

Types of research
Correlational: observing what naturally goes on in the world without directly interfering with it.
Cross-sectional: data collected from people of different ages at a single point in time
=> (quasi-experimental, case study, naturalistic observation)
Experimental: one or more variables are systematically manipulated to see their effect on an outcome
=> (cause-and-effect statements, random sampling)

Types of reliability: the ability of a measure to produce the same results under the same conditions
Test-retest: same entities + two different points in time = consistent results
Inter-rater: across people = same answer
Parallel forms: different measures of the same thing should give the same result
(e.g. four different bathroom scales to measure participants' weight)
Internal consistency: whether the items of a measure consistently measure the same thing

Types of validity
Internal: the extent to which a correct conclusion can be drawn about the causal relationship between variables
(In an experiment testing a new drug, internal validity ensures that changes in
health outcomes are due to the drug and not to other factors such as diet or exercise.)
External: the extent to which the same pattern holds in real life

Construct: whether you are actually measuring what you want to measure
Face: whether a measure "looks like" it is doing what it is supposed to do
(a math exam with questions about arithmetic has high face validity; if it
has history-related questions, it has low face validity)
Ecological: whether the set-up of the study matches a real-world scenario; it often comes with practical,
actionable insight outside the research setting.
(a memory study in a quiet, controlled lab will lack ecological validity)

Confounds: unmeasured variables related to the variables of interest; they threaten internal validity.
Artefacts: what threatens the external validity or construct validity of results
=> (movement noise in an EEG signal)

Dependent variable (DV): "to be explained" / outcome
(in a study testing the effect of sunlight on plant growth: the plant growth, measured
in height, number of leaves, or weight)
Independent variable (IV): "to do the explaining" / predictor
(in the same study: the amount of sunlight)
See also the Two-Way ANOVA section.

SUMMARY

Different Plot in R Part 1

What are the different types of plots?

Histograms
- identify the shape of distribution
- show skew and kurtosis

(eg. visualize the shape of the
distribution of weight for people
in a weight loss program)
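
As a rough illustration of the histogram use case above, the sketch below draws one with base R's hist(). The weights are simulated, and the variable name, mean and spread are assumptions made only for this example.

```r
# Simulated (hypothetical) body weights in kg for 200 programme participants
set.seed(1)
weight <- rnorm(200, mean = 85, sd = 12)

# hist() draws the plot and invisibly returns the bins and their counts
h <- hist(weight, breaks = 15,
          main = "Weight of participants", xlab = "Weight (kg)")

# Every observation falls in exactly one bin
sum(h$counts)
```

The shape of the bars is what shows skew (a longer tail on one side) and kurtosis (how peaked or flat the distribution looks).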

Scatter & Line

- display the relationship

(eg. visualize the relation between
amount of sleep and
the level of grumpiness)
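
A minimal sketch of the sleep and grumpiness example, using base R's plot(). The data are simulated with an assumed negative relationship, so the numbers themselves carry no meaning.

```r
# Simulated (hypothetical) data: less sleep goes with more grumpiness
set.seed(2)
sleep_hours <- runif(50, min = 4, max = 10)
grumpiness  <- 100 - 8 * sleep_hours + rnorm(50, sd = 5)

# Scatter plot of the two continuous variables
plot(sleep_hours, grumpiness,
     xlab = "Sleep (hours)", ylab = "Grumpiness")

# Add a straight line fitted by least squares to show the trend
abline(lm(grumpiness ~ sleep_hours))

# The correlation summarises the direction and strength of the relation
r <- cor(sleep_hours, grumpiness)
r
```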


Different Plot in R Part 2

Box
- depicts median, IQR and range
- to detect outliers
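
The sketch below shows how a box plot exposes the median, the box (IQR) and outliers, using base R's boxplot(). The scores are made up for illustration. Note that boxplot() works with hinges, which can differ slightly from the quartiles returned by quantile().

```r
# Hypothetical exam scores with one unusually high value
scores <- c(52, 55, 58, 60, 61, 63, 64, 66, 70, 98)

# boxplot() draws the plot and returns the numbers behind it
b <- boxplot(scores, horizontal = TRUE, xlab = "Score")

# Rows: lower whisker, lower hinge, median, upper hinge, upper whisker
b$stats

# Points beyond 1.5 * IQR from the box are drawn individually
b$out
```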


Bar

- shows mean score
- error bar displays one of following:
1. confidence interval (usually 95%)
2. standard deviation
3. standard error of the mean

- to compare discrete categories; therefore
especially suited to categorical
(ordinal/nominal/binary) data
(e.g. visualize the relative frequency
of various ethnicities represented
at an IT company)
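
A minimal sketch of a bar chart of group means with error bars, in base R. The groups and scores are simulated, and the error bars here show the standard error of the mean, which is only one of the three options listed above.

```r
# Simulated (hypothetical) scores for three groups of 20 people
set.seed(3)
group <- rep(c("A", "B", "C"), each = 20)
score <- rnorm(60, mean = rep(c(50, 55, 60), each = 20), sd = 8)

# Mean and standard error of the mean per group
means <- tapply(score, group, mean)
ses   <- tapply(score, group, function(x) sd(x) / sqrt(length(x)))

# barplot() returns the x position of each bar
mids <- barplot(means, ylim = c(0, max(means + ses) * 1.1),
                ylab = "Mean score")

# Draw the error bars as double-headed flat arrows: mean +/- 1 SE
arrows(mids, means - ses, mids, means + ses,
       angle = 90, code = 3, length = 0.05)
```

For a frequency chart such as the ethnicity example, the same barplot() call could be given the output of table() instead of the means.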


Measures of Data

mean: centre of gravity of the data
(for interval and ratio scale data, but sensitive to extreme values)

median: middle value, at position $(n + 1)/2$ of the sorted data
(for ordinal, interval and ratio scale data; less affected by outliers)

mode: most frequent value (for nominal scale data)
range: $\max - \min$
percentiles: Q2 = 50% = median (Q1 = 25% | Q3 = 75%)
Interquartile range (IQR): $IQR = Q_3 - Q_1$
=> excludes extreme values/outliers (resistant to outliers)
How to identify outliers? Values $< Q_1 - 1.5 \times IQR$ or $> Q_3 + 1.5 \times IQR$
Skew (-1, 1):
left/negative-skewed: mean < median
right/positive-skewed: mean > median
(named after the direction of the tail: negative numbers are located to
the left of zero on the number line, and positive numbers to the right)
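
The 1.5 * IQR rule and the mean versus median comparison can be checked on a small made-up vector. This is a sketch using base R's quantile() with its default method; other quartile definitions can give slightly different cut-offs.

```r
# Hypothetical data with one extreme value on the right
x <- c(2, 4, 5, 5, 6, 7, 8, 9, 30)

# Quartiles (Q1, median, Q3)
q   <- quantile(x, probs = c(0.25, 0.5, 0.75))
iqr <- q[["75%"]] - q[["25%"]]

# Fences: anything outside them counts as an outlier
lower <- q[["25%"]] - 1.5 * iqr
upper <- q[["75%"]] + 1.5 * iqr
outliers <- x[x < lower | x > upper]
outliers

# The extreme value pulls the mean above the median (right skew)
c(mean = mean(x), median = median(x))
```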

Kurtosis (-2, 2):
< 0: too flat (platykurtic) => fewer extreme values, lighter tails
= 0: normal distribution (mesokurtic)
> 0: too pointy (leptokurtic) => more extreme values, heavier tails
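
To make the sign conventions for skew and kurtosis concrete, here is a sketch with two hand-written helper functions based on simple moment formulas. The function names are hypothetical, and add-on packages may use slightly different estimators, so their values can differ a little from these.

```r
# Moment-based skewness: 0 for symmetric data, > 0 for a right tail
skewness <- function(x) {
  m <- mean(x)
  mean((x - m)^3) / mean((x - m)^2)^1.5
}

# Excess kurtosis: 0 for a normal distribution
excess_kurtosis <- function(x) {
  m <- mean(x)
  mean((x - m)^4) / mean((x - m)^2)^2 - 3
}

set.seed(4)
symmetric    <- rnorm(10000)  # normal: skew and excess kurtosis near 0
right_skewed <- rexp(10000)   # exponential: long right tail
flat         <- runif(10000)  # uniform: flatter than normal

c(skewness(symmetric), skewness(right_skewed))
c(excess_kurtosis(symmetric), excess_kurtosis(flat))
```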

Deviation: $\text{deviation} = X_i - \bar{X}$

Sum of squared errors (SS): $SS = \sum_{i=1}^{N} (X_i - \bar{X})^2$

Variance ($s^2$): $s^2 = \dfrac{SS}{N - 1}$
(dividing by $N$ instead of $N - 1$ gives a biased estimate that tends to underestimate the true variance)

Standard deviation ($s$ or sd): $s = \sqrt{s^2}$
how well the mean represents the data => large sd: more spread out; small sd: closer to the mean
Standard error: $SE = \dfrac{s}{\sqrt{N}}$
=> quantifies how reliable the sample mean is as an estimate
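
The chain from deviations to the standard error can be followed step by step on a tiny made-up sample. This is only a sketch; the last lines compare the hand calculation with R's built-in var() and sd().

```r
# Hypothetical sample of five scores (mean = 8)
x <- c(4, 6, 7, 9, 14)
n <- length(x)

dev <- x - mean(x)   # deviations: -4 -2 -1 1 6
ss  <- sum(dev^2)    # sum of squared errors: 58
s2  <- ss / (n - 1)  # variance: 58 / 4 = 14.5
s   <- sqrt(s2)      # standard deviation
se  <- s / sqrt(n)   # standard error of the mean

# The built-in functions give the same variance and sd
all.equal(s2, var(x))
all.equal(s, sd(x))
```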

The purpose of descriptive statistics is to characterize the data we collected without attempting to understand a population.

Report descriptives
mean (M), SD, sample size, descriptive characteristics (skewness, kurtosis and SE)

Population and Sample
Key Idea: There is always a discrepancy between the sample mean and the population mean
=> a test based on a sample is not always reliable and may lead to a wrong conclusion

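Statistical Model: $\text{outcome}_i = (\text{model}) + \text{error}_i$, where the model can be as simple as the mean $\bar{X}$

Almost never known:
$\mu$ (mu): true population mean
$\sigma$: population standard deviation
population variance: $\sigma^2 = \dfrac{\sum (X_i - \mu)^2}{N}$
unbiased estimate of variance: $\hat{\sigma}^2 = \dfrac{\sum (X_i - \bar{X})^2}{N - 1}$

$n$: sample size
$\bar{x}$: mean of sample
$s$: standard deviation of sample

sample variance: $s^2 = \dfrac{\sum (x_i - \bar{x})^2}{n}$
unbiased sample variance: $s^2 = \dfrac{\sum (x_i - \bar{x})^2}{n - 1} = \dfrac{\sum x_i^2 - \frac{(\sum x_i)^2}{n}}{n - 1}$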

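R provides estimates of the population values and not the descriptive sample statistics (var() and sd() divide by n - 1)
$\hat{\mu}$: estimate of the population mean = sample mean $\bar{x}$, which is compared with the hypothesized population mean $\mu_0$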
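
A short sketch of the difference between dividing by n and by n - 1, on made-up numbers. It also checks the well-known computational shortcut for the sum of squares, and confirms which of the two versions R's var() returns.

```r
# Hypothetical sample of eight values (mean = 5)
x <- c(2, 4, 4, 4, 5, 5, 7, 9)
n <- length(x)

# Descriptive version: divide the sum of squares by n
var_n <- sum((x - mean(x))^2) / n          # 32 / 8 = 4

# Unbiased estimate of the population variance: divide by n - 1
var_n1 <- sum((x - mean(x))^2) / (n - 1)   # 32 / 7

# Shortcut for the same sum of squares: sum(x^2) - sum(x)^2 / n
shortcut <- (sum(x^2) - sum(x)^2 / n) / (n - 1)

# var() matches the n - 1 version, not the n version
c(var_n = var_n, var_n1 = var_n1, shortcut = shortcut, builtin = var(x))
```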

Central Limit Theorem
1. the mean of the sampling distribution of the sample mean ($\bar{x}$) equals the mean of the population ($\mu$)
2. the standard error (variability, $SE_{\bar{x}}$) of the sampling distribution
gets smaller as the sample size (N) increases
3. the shape of the sampling distribution
becomes normal as the sample size increases
=> larger samples are more reliable
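
The three points of the Central Limit Theorem can be explored with a small simulation. This is a sketch under assumptions: the "population" is 10,000 simulated values from a skewed exponential distribution, and sample_means is a hypothetical helper written for this example.

```r
set.seed(5)

# A strongly right-skewed population (not normal at all)
population <- rexp(10000, rate = 1)

# Draw many samples of size n and keep the mean of each one
sample_means <- function(n, reps = 2000) {
  replicate(reps, mean(sample(population, n)))
}

means_5  <- sample_means(5)
means_50 <- sample_means(50)

# 1. The sample means centre on the population mean
c(mean(population), mean(means_5), mean(means_50))

# 2. Their spread (the standard error) shrinks as n grows
c(sd(means_5), sd(means_50))

# 3. The distribution of means looks more normal for larger n
hist(means_50, main = "Sample means, n = 50", xlab = "Sample mean")
```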


Hypothesis Testing

What is the goal of hypothesis testing? To rule out chance (sampling error) as a plausible explanation for the result

What are the steps?
1. State the hypotheses
Null hypothesis H0: a claim of no difference in the population (or that an effect is zero)
(MUST be stated before the experiment)
Alternative hypothesis Ha: the actual research aim: H0 is false

1.1 Select an α level (normally 0.05)
1. the "cut-off" for the decision on the null
2. the probability of a type I error that we accept
3. how certain we want to be when rejecting a hypothesis
also, the threshold used for significance

2. Locate the critical region
1. outcomes that are very unlikely to occur if the null hypothesis is true
2. sample means that are not likely to occur if the variable actually has no effect

3. Compute the test statistic
a ratio: compare the obtained difference between the sample mean and the
hypothesized population mean with the amount of difference we would
expect without any treatment effect (the standard error)

$\text{test statistic} = \dfrac{\text{estimate} - \text{hypothesized value}}{\text{standard error}}$

4. Decide whether to reject the null hypothesis
if the test statistic is large => the obtained mean difference is more than expected by chance
if it is large enough to fall in the critical region => the difference is significant => reject the null
if the test statistic is relatively small => the difference is not sufficient => fail to reject the null
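
The four steps above can be walked through by hand in R. This is a sketch under assumptions: the scores and the hypothesized mean of 50 are made up, and because the population standard deviation is unknown the critical value is taken from the t distribution (a one-sample t test) rather than the normal distribution.

```r
# Hypothetical sample of 10 scores
x <- c(52, 55, 49, 58, 60, 54, 57, 53, 56, 59)

# Step 1: H0: mu = 50, Ha: mu != 50, alpha = 0.05
mu0   <- 50
alpha <- 0.05

# Step 2: critical region for a two-sided test with n - 1 df
n      <- length(x)
t_crit <- qt(1 - alpha / 2, df = n - 1)

# Step 3: test statistic = (estimate - hypothesized value) / SE
se     <- sd(x) / sqrt(n)
t_stat <- (mean(x) - mu0) / se

# Step 4: reject H0 if the statistic falls in the critical region
reject <- abs(t_stat) > t_crit
c(t_stat = t_stat, t_crit = t_crit)
reject

# The built-in test reports the same statistic plus a p value
check <- t.test(x, mu = mu0)
check$p.value
```

Here the statistic lies far beyond the critical value, so the null hypothesis is rejected for this made-up sample.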
