MAT-22306 Lectures Quantitative Research Methodology and Statistics


Extensive lecture summary of the course Quantitative Research Methodology and Statistics (MAT) at Wageningen University (WUR). Slides included as examples to give an extensive overview.


  • 9 September 2021
  • 31 pages
  • 2021/2022
  • Lecture notes
  • Jos Hageman
  • All lectures
Nerine
MAT22306 - Quantitative research methodology and statistics
Lecture 1.1
Data types and distributions:
Variables must be able to vary (have different values), e.g. gender (can be male/female). Male is not a variable, as it
cannot vary. Male is a level of the variable gender.

Types of variables:
Categorical/nominal: there’s no order or magnitude. Solely distinguishes between levels.
Ordinal: distinguishes between levels in a fixed order. Clear order, but no clear magnitude/difference between the values.
Interval: distinguishes between levels and values, with a fixed order and equal distances between adjacent values.
Ratio: distinguishes between levels and values, with a fixed order. Distances are equal, and now there’s also a natural zero.

Describing findings of variables:
Categorical: reporting in percentages or frequencies (56 oranges, 60 apples)
Ordinal: reporting in percentages or frequencies.
Interval: infinitely many options (infinite categories). Report summary measures: central tendency (e.g. the mean) and
width of the distribution.
Ratio: infinitely many options (infinite categories). Report summary measures: central tendency (e.g. the mean) and width
of the distribution.

Measures of central tendency:
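How do you summarize a group of people with one measure? For example, describe the typical/average income in a group.
Mode: the most common value.
Median: the value of the middle person when everyone is ordered from low to high.
Mean: the average (sum of all values divided by the number of observations).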

In a normal distribution, all central tendency measures are the same.
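
A minimal sketch of the three measures in Python, using the standard `statistics` module. The income values are hypothetical and only illustrate how one extreme value pulls the mean away from the median and mode.

```python
import statistics

# Hypothetical monthly incomes; the last value is an outlier.
incomes = [1800, 2100, 2100, 2400, 2600, 3000, 9500]

mode = statistics.mode(incomes)      # most common value
median = statistics.median(incomes)  # middle value after sorting
mean = statistics.mean(incomes)      # sum divided by number of observations

print(mode, median, round(mean, 2))
```

In this skewed example the mean lies above the median, so the three measures no longer coincide as they would in a normal distribution.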

Measures of distribution:
Show the differences/spread in the sample; used together with percentiles (%) or % ranges.

Standard deviation: roughly the average distance from the average.
Formula: square root of [sum (each individual observation – overall mean)² / total nr of observations]. So,
the square root of the average squared difference between an observation and the mean.

Sum of Squares (SS): for every score you have, you calculate the difference to the mean (obs –
mean), and square it. Add all of these up. The more observations, the larger the sum.

Variance: variation around the mean that is independent of the number of observations. Formula:
Sum of squares / total number of observations. The standard deviation is the square root of the variance.
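
The three quantities build on each other, which a short Python sketch can make concrete. The scores are hypothetical, and the code divides by N as in the notes; the standard deviation is taken as the square root of the variance.

```python
import math

scores = [2, 4, 4, 4, 5, 5, 7, 9]  # hypothetical observations

n = len(scores)
mean = sum(scores) / n

# Sum of Squares: squared distance of every score to the mean, added up.
ss = sum((x - mean) ** 2 for x in scores)

# Variance: SS divided by the number of observations.
variance = ss / n

# Standard deviation: square root of the variance.
sd = math.sqrt(variance)

print(ss, variance, sd)
```

When the SD of a sample is used to estimate the population SD, SS is usually divided by N-1 instead of N.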

Normal distribution notation: N(μ, σ)
Standard normal distribution (z-distribution) notation: N(0, 1), so μ = 0 and σ = 1. → Table: Field, pp. 995-998.
A z-value expresses a score as the number of standard deviations from the mean.
The number in the table answers: what proportion of all observations is lower than this z-value?

Rules of thumb normal distribution:
Generally, 50% of the observations is lower than the mean.
About 68% (roughly 2/3 of the sample) lies between –1 and +1 standard deviation from the mean, etc.
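
These rules of thumb can be checked without a printed z-table. The sketch below uses `statistics.NormalDist` from the Python standard library; the score of 130 on a scale with mean 100 and SD 15 is a hypothetical example.

```python
from statistics import NormalDist

std_normal = NormalDist(mu=0, sigma=1)  # the z-distribution N(0, 1)

# Proportion of observations below the mean.
below_mean = std_normal.cdf(0)

# Proportion between -1 and +1 standard deviation.
within_1_sd = std_normal.cdf(1) - std_normal.cdf(-1)

# Hypothetical score converted to a z-value, then looked up.
z = (130 - 100) / 15
below_z = std_normal.cdf(z)  # proportion of observations lower than z

print(below_mean, round(within_1_sd, 4), z, round(below_z, 4))
```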

Kurtosis: indicates the pointiness of the distribution (how high its peak is). Three possibilities:
Leptokurtic = very high peak.
Mesokurtic = normal.
Platykurtic = flattened.

Lack of symmetry: skewness. Can be tricky, as the mean is then no longer a good measure of the central tendency of the data.
Positive skewness = longer tail towards positive values.
Negative skewness = longer tail towards negative values.




Checks for normal distribution/normality:
1) Histogram: does it look like a bell-shaped curve/ND?
2) Boxplot: the median is given, with around it a box containing 50% of all observations. Are box and whiskers symmetric? The whiskers
(the ends) should capture about 95% of the values.
3) Q-Q plot: are the values predicted under normality the same as the observed values (residuals = differences from the
mean)? Ideally all points should be on the straight line.

Fixing non-normality:
Many real-world variables have a lowest possible value of 0, e.g. income, distance, time spent on a task. You then often get
a positively skewed distribution (figure above), which is called log-normal when its logarithm is normally distributed. In cases where it makes sense to think
in terms of doubling distances or times (e.g. spending 1 or 2 secs on a task, or 1 or 2 minutes), you can calculate the
logarithm of such a scale. The skewed data may then transform to a normal distribution.
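
A small sketch of the idea, assuming hypothetical task times in which each value is double the previous one. On the raw scale the long right tail pulls the mean above the median; after taking the logarithm (base 2 here, chosen because the example is about doubling) the values are evenly spaced and symmetric.

```python
import math
import statistics

# Hypothetical task times in seconds, each double the previous one.
times = [1, 2, 4, 8, 16, 32, 64]

# Log transformation: doubling becomes a step of exactly 1.
log_times = [math.log2(t) for t in times]

raw_mean, raw_median = statistics.mean(times), statistics.median(times)
log_mean, log_median = statistics.mean(log_times), statistics.median(log_times)

print(raw_mean, raw_median)  # mean well above median: positive skew
print(log_mean, log_median)  # mean equals median: symmetric
```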

Sample and population:
Population = every case of interest
Sample = part of the population, from which we try to generalize to the population at large

Population estimates require random samples. Inferential statistics: making population claims based on sample.

Estimate values for population through sample:
μ: the sample mean (M or 𝑥̅ ) is an estimate of the population mean (μ)
σ: the sample SD (s) is an estimate of the population SD (σ). Dividing by N-1 instead of N is a correction for small samples

The sampling distribution (bell figure) becomes narrower when the sample is larger. Meaning,
the larger the number of observations, the better the sample mean estimates the population mean.

Standard error of the mean (SE): the standard deviation of the sampling distribution. Larger sample, smaller SE.
Estimator formula: sample standard deviation / square root of N.
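
The estimator can be sketched in a few lines of Python. The helper name `standard_error` and the sample values are hypothetical; `statistics.stdev` is used because it divides by N-1, matching the small-sample correction mentioned above.

```python
import math
import statistics

def standard_error(sd, n):
    """Estimated standard error of the mean: SD divided by sqrt(N)."""
    return sd / math.sqrt(n)

# Hypothetical sample of 9 observations.
sample = [4.1, 5.0, 5.5, 6.2, 4.7, 5.9, 5.3, 4.8, 5.6]
s = statistics.stdev(sample)          # sample SD with the N-1 correction
se = standard_error(s, len(sample))

# Same SD, four times as many observations: the SE is halved.
se_small = standard_error(6, 9)
se_large = standard_error(6, 36)

print(round(se, 3), se_small, se_large)
```

Because N sits under a square root, quadrupling the sample size only halves the standard error.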

Lecture 1.2
Sampling distribution: the distribution of sample means, normally distributed around the population mean, with an SD called the standard error (σ/√𝑛).
Standard error = the standard deviation of the sampling distribution.

When a sample falls outside the e.g. 95% range, we conclude it does not belong to H0 (alpha = 0.05). Meaning, it is
unlikely that the sample was drawn from a population with the population mean μ assumed under H0.

Significance only indicates whether there’s evidence for a difference, however small. We conclude that something
does not belong to a general population. It says little/nothing about relevance.

Transform data to a z-distribution:
z = (sample mean – population mean) / standard deviation of the sampling distribution (the standard error).
After this transformation, the sampling distribution of the z-values follows N(0, 1).
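
As a worked illustration, the sketch below applies this transformation to a hypothetical sample. The function name and all numbers are assumptions for the example; the population SD is treated as known, which is what makes the z-distribution applicable.

```python
import math

def z_for_sample_mean(sample_mean, pop_mean, pop_sd, n):
    """z = (sample mean - population mean) / standard error."""
    standard_error = pop_sd / math.sqrt(n)
    return (sample_mean - pop_mean) / standard_error

# Hypothetical: population N(100, 15), sample of 25 with mean 103.
z = z_for_sample_mean(sample_mean=103, pop_mean=100, pop_sd=15, n=25)

# Inside the 95% range of N(0, 1)?
inside_95 = abs(z) <= 1.96

print(z, inside_95)
```

Here the standard error is 15 / 5 = 3, so a sample mean 3 points above the population mean lies exactly one standard error away.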




[Figure: z-distribution and t-distribution]


Estimate the SE of the population through the sample: when the population SD is unknown, calculate the
standard error by taking the sample standard deviation and dividing it by the square root of n. The smaller the
sample, the flatter the t-distribution.

Difference in critical values: for 95%, the z-distribution always gives ±1.96. In a t-distribution the critical value depends on the number of
observations, and it approaches 1.96 as that number becomes larger. → book pp. 999-1000

Df (degrees of freedom): total number of observations – number of parameters estimated.

The t-distribution has heavier tails and is a bit flatter than the ND (more probability in the extreme ranges). How flat it is and how heavy the
tails are is determined by df. The t-distribution becomes the standard normal (z-)distribution as df becomes infinite.
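
To see how the critical value moves towards 1.96 as df grows, the sketch below recomputes a few two-sided 95% values instead of reading them from a table. It is a rough numerical approach (Simpson's rule on the t density plus bisection) using only the standard library, not how statistical packages compute them; all function names are hypothetical.

```python
import math

def t_pdf(x, df):
    """Density of the t-distribution with df degrees of freedom."""
    log_norm = math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
    norm = math.exp(log_norm) / math.sqrt(df * math.pi)
    return norm * (1 + x * x / df) ** (-(df + 1) / 2)

def t_cdf(x, df, steps=1000):
    """P(T <= x) for x >= 0, integrating the density with Simpson's rule."""
    if x == 0:
        return 0.5
    h = x / steps
    total = t_pdf(0, df) + t_pdf(x, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    return 0.5 + total * h / 3

def t_critical(df, confidence=0.95):
    """Two-sided critical t-value, found by bisection on the CDF."""
    target = 1 - (1 - confidence) / 2
    lo, hi = 0.0, 100.0
    for _ in range(40):
        mid = (lo + hi) / 2
        if t_cdf(mid, df) < target:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

for df in (5, 10, 30, 1000):
    print(df, round(t_critical(df), 3))
```

With few degrees of freedom the critical value is clearly larger than 1.96 (about 2.57 for df = 5), which reflects the heavier tails.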

Assumptions t-distribution:
• Data is measured on interval or ratio scale
• Observations follow the normal distribution
• Based on independent observations.

The more observations (df), the steeper t gets. Especially with a
small group (< 20), t is really different from z.

Rule inferential statistics: we can only conclude something at a
given confidence, not 100% certain. We decide the confidence.

Type 1 and Type 2 error
Type 1 error: when in reality the null hypothesis is true, but we
reject it. Incorrectly conclude something is going on, while it’s not.

Type 2 error: something is going on, but we didn’t see it based on the
sample. Beta depends on effect size, number of observations, and alpha (the accepted
chance of a type 1 error).

Problem: the more critical we are about avoiding false positives (type 1, alpha),
the larger the chance that we miss something (type 2, beta), because we then require more compelling evidence.
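
A small simulation can illustrate what alpha means for type 1 errors. The sketch below repeatedly draws samples from a population in which H0 is true and counts how often a two-sided z-test rejects anyway. The population values, sample size, number of simulations and seed are all hypothetical choices.

```python
import math
import random
from statistics import NormalDist

def type1_rate(alpha=0.05, n=25, simulations=2000, seed=1):
    """Share of samples that reject H0 although H0 is true."""
    rng = random.Random(seed)
    mu, sigma = 100, 15                        # H0 is true: real mean is mu
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)
    se = sigma / math.sqrt(n)
    rejections = 0
    for _ in range(simulations):
        sample = [rng.gauss(mu, sigma) for _ in range(n)]
        z = (sum(sample) / n - mu) / se
        if abs(z) > z_crit:                    # false positive
            rejections += 1
    return rejections / simulations

rate_05 = type1_rate(alpha=0.05)
rate_01 = type1_rate(alpha=0.01)
print(rate_05, rate_01)
```

The rejection rate should land close to the chosen alpha. Lowering alpha reduces false positives, at the cost of requiring a larger z-value before any real effect is detected.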

In sum:
α (alpha) = critical p-value: the proportion of samples we are willing to treat as extreme. With α = 0.05: if fewer than 5% of samples under H0 lie beyond the observed
point, we conclude the sample is probably not part of the null hypothesis.

Test statistic = calculated value (z or t). We have to find a reference point; critical t-value found with df.

Confidence interval = range in which a specific value is likely to be with a given confidence. The confidence level is the complement of alpha: 1 – α
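
As an illustration, a confidence interval around a sample mean can be sketched as mean ± critical value × standard error. The version below is an assumption-laden example: it uses the z critical value (reasonable for large samples or a known population SD; with small samples the t critical value would be used instead), and the function name and numbers are hypothetical.

```python
import math
from statistics import NormalDist

def confidence_interval(sample_mean, sd, n, confidence=0.95):
    """Interval: mean +/- z_critical * standard error (z-based sketch)."""
    alpha = 1 - confidence
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)
    se = sd / math.sqrt(n)
    return sample_mean - z_crit * se, sample_mean + z_crit * se

# Hypothetical: mean 100, SD 15, n = 25, so SE = 3.
low, high = confidence_interval(100, 15, 25)
print(round(low, 2), round(high, 2))
```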

Rejection region = outcomes for the test statistic where we conclude H0 is not true (reject H0, support Ha). This is
outside the 95% curve. The rejection region consists of the test-statistic outcomes that fall beyond the level of significance (alpha). So with α = 0.10 and a
two-sided hypothesis, the rejection region totals 0.10, with 0.05 in the left tail and 0.05 in the right tail. One-sided: 0.10 in that one tail.
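
The split of alpha over the tails translates into different critical values. A minimal sketch for the z-distribution with α = 0.10, using `statistics.NormalDist` (variable names are hypothetical):

```python
from statistics import NormalDist

alpha = 0.10
z = NormalDist()  # standard normal distribution N(0, 1)

# Two-sided: alpha is split, 0.05 in each tail.
two_sided_crit = z.inv_cdf(1 - alpha / 2)  # reject if |z| is larger

# One-sided (upper tail): the full 0.10 sits in one tail.
one_sided_crit = z.inv_cdf(1 - alpha)      # reject if z is larger

print(round(two_sided_crit, 3), round(one_sided_crit, 3))
```

The one-sided boundary lies closer to the mean, so the same alpha rejects sooner in the expected direction but never in the other one.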


Rejecting and accepting H0:
Outcome probability > alpha: we accept (do not reject) H0, Ha has not been shown
Outcome probability ≤ alpha: we reject H0, Ha has been shown
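
The decision rule can be written out as a short sketch. It assumes a z test statistic and a two-sided hypothesis; the function names and the example z-values are hypothetical.

```python
from statistics import NormalDist

def two_sided_p(z):
    """Probability of a z-value at least this extreme in either tail."""
    return 2 * (1 - NormalDist().cdf(abs(z)))

def decide(p, alpha=0.05):
    """Compare the outcome probability with alpha."""
    return "reject H0" if p <= alpha else "do not reject H0"

for z in (1.0, 2.5):
    p = two_sided_p(z)
    print(z, round(p, 4), decide(p))
```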

Statistical test-procedure:
