Summary statistics 1 Professor Yfke Ongena year 2020

Summary statistics
EXAM 2/11/2020 15:00-17:00

All weekly steps should be enough to prepare for the exam

You will not have to use RStudio during the exam. You do have to be able to recognize
functions from the practicals

It will be an online exam that you take at home.

There will be both open-ended and multiple-choice (MC) questions:
25 MC questions
6 open-ended questions (not very long answers, like in the lab sessions)


The webinar exercises are the most representative of the exam (as is the practice exam, of
course). All calculations you need to do will be discussed in the webinar exercises. So if you can do
these exercises, you are well prepared for the calculations, but the exam also includes
theoretical questions.




Week 1 - introduction
With statistics you can review and analyse the results of experiments

Statistical analyses are used to understand the data, for:

- descriptive statistics: summarizing/describing the characteristics of a sample
• Describe (sample) data without drawing conclusions
• Measures of central tendency: show which values are typical, e.g. mean, median, mode
• Measures of variation/dispersion (spread): show how variable the data are, e.g. range, IQR,
variance, standard deviation
e.g.: the mean speech rate of the sample of a hundred Belgian Dutch speakers
- inferential statistics: relating variables to each other and evaluating the relationships between
variables (generalising the outcome of a sample to a population).
• Using the characteristics of a sample to draw conclusions about the entire population
e.g.: the speech rate of Dutch-speaking people from the Netherlands is significantly higher than
the speech rate of Dutch-speaking Belgians, based on a sample of 200 people.
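
The descriptive measures listed above can be computed with a few lines of Python. This is only a sketch: the speech rates are invented numbers, and Python's standard `statistics` module is used here instead of the R functions from the practicals.

```python
import statistics

# Hypothetical speech rates (syllables per second) for a small sample;
# the numbers are invented for illustration only.
speech_rates = [4.1, 4.5, 4.5, 4.8, 5.0, 5.2, 5.6, 6.3]

# Measures of central tendency
mean = statistics.mean(speech_rates)
median = statistics.median(speech_rates)
mode = statistics.mode(speech_rates)

# Measures of variation (spread)
value_range = max(speech_rates) - min(speech_rates)
q1, q2, q3 = statistics.quantiles(speech_rates, n=4)  # the three quartiles
iqr = q3 - q1
variance = statistics.variance(speech_rates)  # sample variance (divides by n - 1)
sd = statistics.stdev(speech_rates)           # sample standard deviation

print(f"mean={mean:.2f} median={median:.2f} mode={mode}")
print(f"range={value_range:.2f} IQR={iqr:.2f} variance={variance:.3f} sd={sd:.3f}")
```

All of these values only describe the sample itself; no conclusion about a population is drawn yet.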

statistical tests to relate a sample to a population:
• Comparing two groups to each other, or one group to a fixed value
• Associating 2 variables
• The internal consistency of questions in a questionnaire

For both kinds of statistics, the data has to be variable. This means that the cases we compare have
different values.

For example: if you want to know whether the amount of beer a country produces depends on the amount
of beer consumed in that country, it does not make sense to investigate this if the amount of beer
consumed is exactly the same in every country.

Population= a group representing all objects of interest
For example: for an investigation focussing on the speech rate of Dutch speakers in the Netherlands
versus Belgium, the population is all Dutch-speaking people in the Netherlands and Belgium.

Parameters = the values obtained from a population
f.e.: the mean speech rate of all Dutch speakers from Belgium

Sample= a subset of the population that is used to represent it, so that you do not have to
investigate the entire population
e.g.: one hundred Dutch speakers from Belgium whose speech rate is measured
Dutch: ‘steekproef’

statistics =
1 the method to analyse data
2 the measurements that are obtained from a sample
f.e.: the mean speech rate of the sample of a hundred Belgian Dutch speakers
Important: the sample has to be representative of the population
sampling error= the difference between the sample statistic and the population parameter.
The smaller the sampling error, the more representative the sample.
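
To make the terms parameter, statistic and sampling error concrete, here is a small simulation sketch. The population is generated artificially (the mean of 5.0 and sd of 0.8 are invented values), so in this toy setting the population parameter is known, which is normally not the case.

```python
import random
import statistics

random.seed(1)  # fixed seed so the sketch is reproducible

# Hypothetical population: speech rates of 10,000 speakers (invented numbers).
population = [random.gauss(5.0, 0.8) for _ in range(10_000)]
parameter = statistics.mean(population)   # population parameter

# Random sample: every member has the same chance of being selected.
sample = random.sample(population, 100)
statistic = statistics.mean(sample)       # sample statistic

# Sampling error: difference between the sample statistic and the parameter.
sampling_error = statistic - parameter
print(f"parameter={parameter:.3f} statistic={statistic:.3f} "
      f"sampling error={sampling_error:.3f}")
```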

Random sampling= the best way to draw a representative sample, because everyone in the
population has an equal chance of getting selected.
Representative sampling= using a sample that represents certain characteristics of a population,
such as ethnic group or sex.
Downside: you could overlook certain variables
Convenience sampling: the least reliable way of sampling, but the most frequently used. It means
you use data that are easily accessible, and therefore less random.

We always need two types of hypotheses for statistical reasoning:
1. Research hypothesis/alternative hypothesis (Ha): ‘educated guess’
there is a relationship between two measured phenomena
Directional (expecting one value to be bigger than the other, or f.e. ‘if X increases, Y decreases’)
or non-directional (just expecting a difference between the two variables; X ≠ Y)
2. Null hypothesis (H0): there is no relationship between two measured phenomena/variables
If a significant difference is found→reject H0 and accept Ha
If no significant difference is found→retain H0

Significant: probably not due to chance

p-value (probability value): the probability of finding a result at least as extreme as the one
observed, if H0 is true.
So: how big is the chance that you record this result ‘by chance’?
The p-value says something about the chance of finding this particular result (or a more extreme
one) in random samples from a population in which H0 holds.
‘measured significance level’
If the p-value is smaller than the alpha level (p<α), H0 can be rejected

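The p-value is not itself the chance of a type one error; that chance is fixed in advance by the
α-level with which the p-value is compared
One-sided to two-sided test: p*2
Two-sided to one-sided test: p/2 (if the effect is in the expected direction)
So: a one-sided test is more likely to give a significant result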
significance level= the boundary for the chance that you reject H0 even though it is true (type one
error), represented by the α-value (alpha value); the default is 0.05
The α-level is the threshold for the p-value, below which we regard the result as significant
The smaller the significance level, the smaller the chance of a type one error and the bigger the
chance of a type two error. Because: the higher alpha, the higher p can be and still be significant.
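
The relation between the p-value, the α-level and one-sided versus two-sided testing can be sketched as follows. This is an illustration only: it assumes a one-sample z-test with a known population sd, which is a simplification chosen for the example and not necessarily the test used in the course, and all numbers are invented.

```python
from math import sqrt
from statistics import NormalDist

# Assumed for illustration: population mean under H0, known population sd,
# and an observed sample mean from n cases (all numbers invented).
mu_0 = 5.0
sigma = 0.8
n = 100
sample_mean = 5.2
alpha = 0.05

# z statistic: distance of the sample mean from mu_0 in standard errors.
z = (sample_mean - mu_0) / (sigma / sqrt(n))

std_normal = NormalDist()  # standard normal distribution: mean 0, sd 1
p_one_sided = 1 - std_normal.cdf(z)             # Ha: mean > mu_0
p_two_sided = 2 * (1 - std_normal.cdf(abs(z)))  # Ha: mean != mu_0

print(f"z={z:.2f} one-sided p={p_one_sided:.4f} two-sided p={p_two_sided:.4f}")
print("reject H0 (two-sided):", p_two_sided < alpha)
```

Because the normal distribution is symmetric, the two-sided p-value here is exactly twice the one-sided one.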

type one error= ‘false positive’, rejecting a true H0
type two error= ‘false negative’, retaining H0 although Ha is true
type two errors are often caused by low power (e.g. because n, the sample size, is very small, i.e.
not many individual cases were tested)
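
A simulation can show what a type one error rate means in practice. In this sketch H0 is true by construction, so every significant result is a false positive; the share of such results should end up close to α. The z-test with known sd and all numbers are assumptions made for the example.

```python
import random
from math import sqrt
from statistics import NormalDist, mean

random.seed(42)                  # fixed seed so the sketch is reproducible
alpha = 0.05
mu_0, sigma, n = 5.0, 0.8, 30    # invented values; H0 is true by construction
n_experiments = 2000
std_normal = NormalDist()

false_positives = 0
for _ in range(n_experiments):
    # Draw a sample from a population in which H0 (mean = mu_0) really holds.
    sample = [random.gauss(mu_0, sigma) for _ in range(n)]
    z = (mean(sample) - mu_0) / (sigma / sqrt(n))
    p = 2 * (1 - std_normal.cdf(abs(z)))   # two-sided p-value
    if p < alpha:
        false_positives += 1               # H0 rejected although it is true

type_one_rate = false_positives / n_experiments
print(f"share of false positives: {type_one_rate:.3f} (alpha = {alpha})")
```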




Effect size: how big the effect is, in other words: how strong the relationship/association between
two variables is
Used to quantify the difference between two variables

n increases→effect size: stays the same
n increases→p-value: lower (p-values depend on sample size and effect size)
n increases→t-value: higher (see week 4; because the standard error becomes smaller. In addition,
with a smaller df the t-distribution has fewer values around the mean and more in the tails, so a
higher t-value is needed for significance)
The larger the sample, the sooner you will get a significant result
The larger the effect size, the smaller the sample can be for the result to be significant.
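
The relation between sample size, effect size and p-value can be illustrated with a small calculation. As a simplifying assumption, this sketch uses a z-test instead of the t-test from week 4 and a fixed standardized effect size (Cohen's d) of 0.3; neither choice comes from the course material.

```python
from math import sqrt
from statistics import NormalDist

std_normal = NormalDist()
effect_size = 0.3   # assumed standardized difference (Cohen's d), kept fixed

results = []
for n in (10, 30, 100, 300):
    z = effect_size * sqrt(n)          # test statistic grows with sqrt(n)
    p = 2 * (1 - std_normal.cdf(z))    # two-sided p-value
    results.append((n, z, p))
    print(f"n={n:>3} effect size={effect_size} z={z:.2f} p={p:.4f}")
```

The effect size is the same in every row, while the test statistic rises and the p-value falls as n increases.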

Distribution= NL ‘verdeling’. How values of a variable are distributed.
You can visualize the distribution in a graph, f.e. with the x-axis representing the value, and the y-axis
the frequency/number of occurrences.

normal distribution (normale verdeling):
• Bell-shaped
• Symmetric
• The mean, mode and median are exactly the same
• The area under the curve is 100%
• About 68% of the observations lie within one standard deviation (sd) of the mean
• Distances from the mean can be expressed in sd’s
• Standard normal distribution: mean=0, sd=1.
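
The 68% figure can be checked directly from the standard normal distribution. A minimal sketch using Python's `statistics.NormalDist`; the extra values for 2 and 3 sd's are added for comparison and are not part of the notes.

```python
from statistics import NormalDist

std_normal = NormalDist(mu=0, sigma=1)   # standard normal distribution

# Share of observations within 1, 2 and 3 standard deviations of the mean.
within = {k: std_normal.cdf(k) - std_normal.cdf(-k) for k in (1, 2, 3)}
for k, share in within.items():
    print(f"within {k} sd: {share:.1%}")

# Symmetry: half of the area under the curve lies below the mean.
below_mean = std_normal.cdf(0)
print(f"area below the mean: {below_mean:.1%}")
```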


one-tailed or two-tailed test:
One-tailed is used for a directional hypothesis.
f.e.: Ha = X<Y
you want to look at the left end/tail of the curve, because that is where X is smaller than Y

(Figure: distribution curve; the shaded part is the tail of the curve that is investigated.)
Variable = a characteristic of a test object/the individuals that you study that does not have a fixed
value. The value of a variable can be measured.
e.g.: variables of a set of words are word length and word class/gender
An analysis can be:
-univariate (one variable)
-bivariate (two variables)
-multivariate (more than two variables)

Units/cases: persons or objects you are studying, who have certain characteristics
Variables: the characteristics, they can have different values
Values: the values the variables can have


Response= dependent variable: the value that changes as a result of some other parameter of
interest
Explanatory = independent variable: the variable that influences the outcome (and determines the
response value)
Example: you are investigating if word frequency influences the response time (time it takes people
to recognize the word). The explanatory variable is the word frequency, the response variable is the
response time.

Measurement levels:
The measurement level of a variable determines the possibilities for statistical analysis (from low
to high):
• Nominal
unordered categories
frequency table is possible
e.g. country of birth, favourite band
binary variable: variable with two levels, e.g. male/female
• Ordinal
ordered (ranked) scale, amount of difference between categories is unclear: intervals between
the scale points are not exactly the same
f.e. Likert-scale (strongly agree – agree – neither agree nor disagree – disagree – strongly
disagree) or year of birth in groups (After 1950, Between 1941 and 1950, Between 1931 and
1940, Between 1921 and 1930, In 1920 or earlier) or education levels (MBO, HBO, WO etc.)
• Interval
Numerical with meaningful differences (so the intervals between the scale points are the
same/‘known distances’) but no true 0
e.g. temperature in degrees Celsius or year of birth in numbers
Does not allow for multiplication: 20 degrees Celsius is not ‘twice as warm’ as 10 degrees Celsius
• Ratio
Numerical with meaningful differences and a true 0 (a value of 0 has a clear meaning)
e.g.: the number of occurrences of a certain word in a text or travel time to work
Allows for multiplication: if in text A there are ten occurrences of a word and in text B 100, you
can say that the word occurs 10 times as often in text B
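
Why multiplication works on a ratio scale but not on an interval scale can be shown with a short calculation. The conversion to Kelvin is added here purely as an illustration and does not come from the notes: the same two temperatures give a different ratio once the zero point moves, while the ratio of the word counts is meaningful because 0 really means ‘no occurrences’.

```python
# Interval scale: degrees Celsius has no true zero, so ratios are not meaningful.
celsius_a, celsius_b = 10.0, 20.0
kelvin_a, kelvin_b = celsius_a + 273.15, celsius_b + 273.15

ratio_celsius = celsius_b / celsius_a   # 2.0, but only an artefact of the scale
ratio_kelvin = kelvin_b / kelvin_a      # about 1.035 for the same temperatures

# Ratio scale: word counts have a true zero, so the ratio is meaningful.
count_text_a, count_text_b = 10, 100
ratio_counts = count_text_b / count_text_a

print(f"Celsius ratio: {ratio_celsius:.3f}, Kelvin ratio: {ratio_kelvin:.3f}")
print(f"Word-count ratio: {ratio_counts:.1f}")
```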

Quantitative/numerical = interval and ratio
Qualitative/categorical = nominal and ordinal
