Lecture notes

Descriptive and Inferential Statistics: Lectures summary

Author: gannkag

Institution: Vrije Universiteit Amsterdam (VU)

Summary of all the lectures from the course Descriptive and Inferential Statistics.

Uploaded on: 2 December 2021
Number of pages: 17
Written in: 2020/2021
Type: Lecture notes
Teacher(s): Mariska van der Horst & Arjen de Wit
Contains: All lectures

Inferential statistics -> Refers to methods used to draw conclusions about a population based
on data coming from a sample.

WEEK 1

Module 0: Introduction
Cases, variables and levels of measurement. Basic statistics. Exploring data.
- Variables -> Characteristics of something or someone, for example age or nationality.
- Cases -> The units being studied: the persons or things that the variables describe.
Levels of measurement:
1. Categorical variables:
- Nominal -> A nominal variable is made up of various categories that differ from each
other. There is no order, which means that it’s not possible to argue that one category is
better or worse than another. An example is nationality. There are no equal intervals
between the categories.
- Ordinal -> There is a difference between categories and there is an order. However,
the order tells you nothing about the size of the difference between the
categories. An example is education level. There are no equal intervals between the
categories.
2. Quantitative variables:
- Interval -> The categories differ, they have an order, and the intervals between
them are equal. However, there is no meaningful zero point. An example is temperature in
degrees Celsius.
- Ratio -> Similar to the interval level, but has a meaningful zero point. Examples are
height and age.
- Quantitative variables can also be divided into discrete and continuous variables. A
variable is discrete if its possible values form a set of separate numbers. For
instance, the number of goals scored by a football player. A player can score 1 or 2
goals, but not 1.22 goals. A variable is continuous if its possible values
form an interval, so that infinitely many values are possible. An example is height.
Module 1: Descriptive Statistics
1.1. Describing data
- Frequency table -> Shows how the values are distributed over the cases.
- Nominal/ordinal variables -> Pie chart or a bar graph.
- Interval/ratio variable -> Histogram.
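As a minimal sketch of what a frequency table contains, the snippet below counts how often each value occurs and adds percentages. The data and the function name `frequency_table` are made up for illustration and are not part of the lectures.

```python
from collections import Counter

# Hypothetical nominal data: nationality of eight respondents.
nationalities = ["Dutch", "German", "Dutch", "Belgian",
                 "Dutch", "German", "Dutch", "Belgian"]


def frequency_table(values):
    """Return (value, count, percentage) rows, most frequent value first."""
    counts = Counter(values)
    total = len(values)
    return [(value, count, 100 * count / total)
            for value, count in counts.most_common()]


for value, count, pct in frequency_table(nationalities):
    print(f"{value:<10}{count:>3}{pct:>7.1f}%")
```

The same counts are what a bar graph or pie chart of a nominal variable would display.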
1.2. Measures of central tendency
- Mode -> The value that occurs most frequently (the most common outcome). Often used
if a variable is measured on a nominal or ordinal level.
- Median -> The middle value of your observations when arranged from the smallest to
the largest.
- Mean -> The sum of all the values divided by the number of observations.
- Generally if the distribution of data is skewed to the left, the mean is less than the
median, which is often less than the mode. If the distribution of data is skewed to the
right, the mode is often less than the median, which is less than the mean.
- If your variable is categorical -> Mode.
- If your variable is quantitative -> Median, mean. Go for the median if you have outliers or
if the distribution is skewed and if that’s not the case go for the mean.
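The three measures can be compared on a small example. This is only a sketch using Python's standard `statistics` module; the income figures are invented and chosen so that one outlier skews the distribution to the right.

```python
import statistics

# Hypothetical right-skewed data: monthly incomes with one high outlier.
incomes = [1800, 2000, 2000, 2200, 2500, 2700, 9000]

mode = statistics.mode(incomes)      # value that occurs most frequently
median = statistics.median(incomes)  # middle value of the sorted observations
mean = statistics.mean(incomes)      # sum of values divided by number of observations

print(f"mode={mode}, median={median}, mean={mean:.2f}")
```

Here the outlier pulls the mean well above the median, which illustrates why the median is preferred for skewed data or data with outliers.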
1.3. Measures of variance
- To describe a distribution we need more than the measures of central tendency (mode,
median, mean).
- Two simple measures of variability -> The range and the interquartile range.
- Range -> The difference between the highest and the lowest value. It is easy to
understand and simple to compute, but it doesn’t give a good impression of the
variability because it only takes into account the extreme values.
- Interquartile range -> The quartiles divide the ordered distribution into four parts
that each contain 25% of the cases. The IQR leaves out the extreme values and covers the
middle 50%. IQR = Q3 - Q1. Q2 = Median. The main advantage of IQR is that it’s not
affected by outliers.
- Box plot -> Graph to describe center, variability and outliers. It shows you the maximum
value that’s not an outlier, Q3, Q2, Q1, the minimum value that’s not an outlier and any
outliers. The length of the box represents the IQR. The horizontal line inside the box is the
median (Q2).
- IQR = 75th percentile - 25th percentile.
- A big advantage of the variance and the SD is that they take into account all the
values of a variable.
- Variance -> The larger the variance, the larger the variability. This means that the values
are spread out more widely around the mean. A disadvantage is that the metric of the
variance is the metric of the variable under analysis squared. To avoid this problem we take
the square root of the variance. This is called the Standard Deviation.
- Standard deviation -> It can roughly be seen as the average distance of an observation
from the mean.
- Z-score -> Number of standard deviations a score is removed from the mean. Recoding
original scores into z-scores is called standardization. This means that we replace each
original score by its distance from the mean, expressed in standard deviations. The
advantage is that we can see whether a specific score is relatively common or exceptional.
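The measures of variability above can be computed side by side. This is a sketch under two assumptions that the notes do not specify: the variance is the sample variance (dividing by n - 1), and quartiles are computed with the "inclusive" interpolation method of `statistics.quantiles`. Other quartile conventions give slightly different values. The scores are made up.

```python
import statistics

# Hypothetical test scores.
scores = [2, 4, 4, 5, 6, 7, 8, 12]

value_range = max(scores) - min(scores)

# Quartiles; the interpolation method is an assumption, conventions differ.
q1, q2, q3 = statistics.quantiles(scores, n=4, method="inclusive")
iqr = q3 - q1

mean = statistics.mean(scores)
variance = statistics.variance(scores)  # sample variance, divides by n - 1
sd = statistics.stdev(scores)           # square root of the sample variance

# Standardization: distance from the mean in standard deviations.
z_scores = [(x - mean) / sd for x in scores]

print(f"range={value_range}, Q1={q1}, median={q2}, Q3={q3}, IQR={iqr}")
print(f"variance={variance:.2f}, sd={sd:.2f}")
print("z-scores:", [round(z, 2) for z in z_scores])
```

After standardization the z-scores have mean 0 and standard deviation 1, which is what makes scores from different variables comparable.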

Module 2: Associations between variables
2.1. Associations between categorical variables
- A contingency table helps you to investigate the relationship between two ordinal or
nominal variables. Always use percentages!
- The difference between a contingency table and a frequency table is that a frequency
table always concerns only one variable.
- For two quantitative variables -> Use a scatterplot instead.
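To illustrate why percentages matter in a contingency table, the sketch below converts raw cell counts into percentages within each category of the first variable. The data and the helper name `within_group_percentages` are hypothetical.

```python
from collections import Counter

# Hypothetical cases: (education level, answer to a yes/no question).
cases = [
    ("low", "yes"), ("low", "no"), ("low", "no"), ("low", "no"),
    ("high", "yes"), ("high", "yes"), ("high", "yes"), ("high", "no"),
]


def within_group_percentages(pairs):
    """Percentage of each outcome within each category of the first variable."""
    cell_counts = Counter(pairs)
    group_totals = Counter(group for group, _ in pairs)
    return {
        (group, outcome): 100 * count / group_totals[group]
        for (group, outcome), count in cell_counts.items()
    }


table = within_group_percentages(cases)
for (group, outcome), pct in sorted(table.items()):
    print(f"{group:<5}{outcome:<4}{pct:>6.1f}%")
```

Percentages make the groups comparable even when they contain different numbers of cases, which raw counts do not.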
2.2. Associations between continuous variables
- Pearson’s r -> Tells us the direction and strength of the linear relationship between two
quantitative variables. The size of r expresses how tightly the observations are
clustered around the imaginary best-fitting straight line through the cloud of data
points. The number is always between -1 (perfect negative) and +1 (perfect positive). 0
means that there is no linear relationship. When r is calculated by hand from the products
of the scores, both variables have to be standardized (converted to z-scores) first!
- No linear relation -> Pearson’s r is not an appropriate measure.
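One common way to compute Pearson's r, sketched here, is to standardize both variables and average the products of the z-scores, dividing by n - 1. The notes do not give the formula, so this version is an assumption, as are the data (hours studied and grades) and the function name `pearson_r`.

```python
import statistics


def pearson_r(xs, ys):
    """Pearson's r as the sum of z-score products divided by n - 1."""
    n = len(xs)
    mean_x, mean_y = statistics.mean(xs), statistics.mean(ys)
    sd_x, sd_y = statistics.stdev(xs), statistics.stdev(ys)  # sample SDs
    z_x = [(x - mean_x) / sd_x for x in xs]
    z_y = [(y - mean_y) / sd_y for y in ys]
    return sum(a * b for a, b in zip(z_x, z_y)) / (n - 1)


# Hypothetical data: hours studied and exam grade for five students.
hours = [1, 2, 3, 4, 5]
grades = [5.0, 5.5, 7.0, 7.5, 9.0]

r = pearson_r(hours, grades)
print(f"r = {r:.3f}")
```

Because both variables are standardized, r does not depend on the units in which the variables were measured.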
2.3. The regression line
- The residual is the difference between the observed value of a variable Y and the value
of Y predicted by the regression line.
- For the intercept, look at where the line crosses the Y-axis, that is, the value of Y when X = 0.
- For the slope, look at how much the line changes for each step higher on X. If X increases
by 1, what happens with Y?
- One of the characteristics of a regression line is that the sum of the squared residuals is
as small as possible.
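The intercept, slope and residuals can be illustrated with a small least-squares fit. This sketch uses the standard closed-form formulas for simple linear regression, which the notes do not spell out; the data and the names `fit_line` and `sum_squared_residuals` are hypothetical.

```python
import statistics


def fit_line(xs, ys):
    """Return (intercept, slope) of the least-squares regression line."""
    mean_x, mean_y = statistics.mean(xs), statistics.mean(ys)
    slope = (sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
             / sum((x - mean_x) ** 2 for x in xs))
    intercept = mean_y - slope * mean_x  # the line passes through the two means
    return intercept, slope


def sum_squared_residuals(xs, ys, intercept, slope):
    """Sum of squared differences between observed and predicted Y."""
    return sum((y - (intercept + slope * x)) ** 2 for x, y in zip(xs, ys))


# Hypothetical data: hours studied (X) and exam grade (Y).
hours = [1, 2, 3, 4, 5]
grades = [5.0, 5.5, 7.0, 7.5, 9.0]

intercept, slope = fit_line(hours, grades)
residuals = [y - (intercept + slope * x) for x, y in zip(hours, grades)]

print(f"predicted grade = {intercept:.2f} + {slope:.2f} * hours")
print("residuals:", [round(e, 2) for e in residuals])
```

Shifting the intercept or the slope away from the fitted values makes the sum of squared residuals larger, which is the sense in which the regression line is the best-fitting line.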
2.4. Applying correlation and regression

WEEK 2

Module 3: Reliability Analysis
3.1. Introduction
- Reliability -> Consistency of the measurement. If we repeat the measurement several
times the outcome should be the same.
