2023 Exam summary, Introduction to Statistical Analysis, Week 1-6 (CM1005)


This summary includes all lecture and tutorial information of week 1-6.


  • 12 January 2023
  • 28 pages
  • 2022/2023
  • Summary
SS1000
Summary Statistics

Week 1
Statistics: “The study of how we describe and make inferences from data.” (Sirkin)
➢ An inference is “a conclusion reached on the basis of evidence and reasoning.”
➢ Distinction between descriptive & inferential statistics

Different levels of statistics:
1. Univariate (one variable)
2. Bivariate (two variables)
3. Multivariate (more variables)

Descriptive vs inferential statistics: descriptive statistics describe only the specific
sample at hand. Inferential statistics use a sample to draw conclusions about the whole population.




Unit of analysis: the what or who that is being studied. Also: the unit that you will be able to
draw conclusions about. Typically, all units are the same type of “thing” in a single data set.
Variable: a measured property of each of the units of analysis.

Levels of measurement
- Nominal: group classifications where no meaningful ranking is possible (e.g., religion,
country)
- Ordinal (ORDinal): There is meaningful ranking/ ordering but the distance between
categories is unknown or not equal.
- Interval: like ordinal in that values can be ranked, but the distances between values
are also equal. However, the zero point is arbitrary: 0 does not mean ‘none’ or ‘lack of’
(e.g., temperature in °C)
- Ratio: same as interval, but there is an absolute zero point: 0 means ‘lack of’ (e.g., age)

We always first need to know the level of measurement in order to know which statistical
techniques we may use for the given variables.

“A continuous variable is measured along a continuum, whereas a discrete variable is
measured in whole units or categories.” Continuous variables can take decimal values;
discrete variables cannot.

Measures of central tendency (CT): used to describe (univariately) the distribution of
variables at different levels of measurement.
- The mean (interval/ratio): all values are added up and divided by n, which is the
number of observations in the sample:

$\bar{x} = \frac{\sum_{i=1}^{n} x_i}{n}$

Almost the same formula for the population mean, where N is the number of cases in the population:

$\mu = \frac{\sum_{i=1}^{N} x_i}{N}$

Characteristics of the mean:
o Changing any score will change the mean
o Adding or removing a score will change the mean (unless that score is equal
to the mean)
o Adding, subtracting, multiplying, or dividing each score by the same constant
value causes the mean to change accordingly
o Sum of differences from the mean is zero (with $\bar{x}$ the sample mean):

$\sum_{i=1}^{n} (x_i - \bar{x}) = 0$

o Sum of squared differences from the mean is minimal
o Most useful for describing (more or less) normally distributed variables.
- The median (ordinal, interval/ratio): the median is the middle case when sorting all
cases based on their value. There is an equal number of cases above and below the median.
Also: 50th percentile.
o The median is not as sensitive to outliers as the mean.
o Whenever n is an even number, the median is the mean value of the two
middle cases.
o Often used for interval/ratio variables that have skewed distributions.
- The mode (nominal, ordinal, interval/ratio): the mode is the category with the
largest number of cases.
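
The three measures and some of the listed characteristics of the mean can be checked with a minimal Python sketch. The scores are made up for illustration and do not come from the lecture; the standard-library `statistics` module is used.

```python
import statistics

scores = [2, 3, 3, 5, 7]  # hypothetical interval/ratio scores

mean = statistics.mean(scores)      # sum of all values divided by n
median = statistics.median(scores)  # middle case after sorting
mode = statistics.mode(scores)      # value with the largest number of cases

# Characteristic: the differences from the mean sum to zero.
deviation_sum = sum(x - mean for x in scores)

# Characteristic: adding a constant to each score shifts the mean by that constant.
shifted_mean = statistics.mean([x + 10 for x in scores])

# The median is less sensitive to an outlier than the mean.
with_outlier = scores + [100]
mean_out = statistics.mean(with_outlier)
median_out = statistics.median(with_outlier)

print("mean, median, mode:", mean, median, mode)
print("sum of deviations:", deviation_sum, "| shifted mean:", shifted_mean)
print("with outlier -> mean:", mean_out, "median:", median_out)
```

With these scores the outlier moves the mean from 4 to 20, while the median only moves from 3 to 4.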

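Measures of CT and distributions: in a normal distribution the mean, median, and mode are all
the same. In a skewed distribution the curve is shifted to the left or right, and the mean is
typically pulled toward the longer tail, so the three measures no longer coincide.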
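
As a rough illustration of the difference between symmetric and skewed data, the sketch below compares the three measures on two small made-up data sets (both hypothetical, chosen only to make the pattern visible).

```python
import statistics

# Hypothetical symmetric data: values are mirrored around 3.
symmetric = [1, 2, 3, 3, 3, 4, 5]

# Hypothetical right-skewed data: most values are low, a few are very high.
skewed = [20, 22, 22, 25, 27, 30, 35, 120, 200]

for name, data in [("symmetric", symmetric), ("right-skewed", skewed)]:
    print(
        name,
        "mean:", round(statistics.mean(data), 2),
        "median:", statistics.median(data),
        "mode:", statistics.mode(data),
    )

sym_mean = statistics.mean(symmetric)
skew_mean = statistics.mean(skewed)
skew_median = statistics.median(skewed)
skew_mode = statistics.mode(skewed)
```

In the symmetric set all three measures equal 3; in the right-skewed set the order is mode < median < mean, because the few high values pull the mean upward.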

Week 2 Lecture
Measures of variability: measures of CT alone do not carry enough information to adequately
describe the distributions of variables, so we need a second type of measure.
E.g., group 1 has 10 people aged 20 and 10 aged 60; group 2 has 10 people aged 39 and
10 aged 41. In both groups the mean age is 40, so the mean does not differ. However, the
dispersion/variability differs.
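
The age example can be reproduced in a few lines of Python. This sketch reads the second group as ten people aged 39 and ten aged 41, and uses the distance between the highest and lowest age as one simple indication of spread.

```python
import statistics

group1 = [20] * 10 + [60] * 10  # 10 people aged 20, 10 aged 60
group2 = [39] * 10 + [41] * 10  # 10 people aged 39, 10 aged 41

mean1 = statistics.mean(group1)
mean2 = statistics.mean(group2)

# Distance between the highest and lowest age in each group.
spread1 = max(group1) - min(group1)
spread2 = max(group2) - min(group2)

print("means:", mean1, mean2)       # identical central tendency
print("spread:", spread1, spread2)  # very different variability
```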

The range (ordinal, interval/ratio): distance between the highest and lowest score. It is always
reported together with the maximum & minimum score and is sensitive to outliers.

The interquartile range (IQR) (ordinal, interval/ratio): based on ‘quartiles’ that split the data
into four groups of cases. The IQR is the distance between Q1 and Q3 and is insensitive
to outliers, since it describes only the middle half of the data.
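
A small sketch of both measures, using made-up scores. Note that several conventions exist for computing quartiles; this example relies on the default ("exclusive") method of Python's `statistics.quantiles`, which may differ slightly from the method used in the course or in other software.

```python
import statistics

def value_range(data):
    """Distance between the highest and the lowest score."""
    return max(data) - min(data)

def iqr(data):
    """Distance between Q1 and Q3 (default 'exclusive' quartile method)."""
    q1, _q2, q3 = statistics.quantiles(data, n=4)
    return q3 - q1

scores = [2, 4, 5, 7, 8, 10, 12]        # hypothetical scores
with_outlier = [2, 4, 5, 7, 8, 10, 40]  # same scores, highest one replaced by an outlier

print("range:", value_range(scores), "->", value_range(with_outlier))
print("IQR:  ", iqr(scores), "->", iqr(with_outlier))
```

The outlier changes the range from 10 to 38, while the IQR stays at 6 in both cases.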

The variance (interval/ratio): based on the Sum of Squares, i.e., the sum of the squared distances
from the mean. For the calculation of the variance, it matters whether we have the sample data
or the population data (typically we have sample data).
- Variance in a sample is expressed as:

$s^2 = \frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n - 1}$

To calculate the sample variance s² of a given variable:
o For each case, we calculate the distance to the sample mean and square that
distance (removes possible minus sign)
o All those squared distances are then added up and divided by the number of
cases in the sample minus one (n-1)
- Variance in a population is expressed as:

$\sigma^2 = \frac{\sum_{i=1}^{N} (x_i - \mu)^2}{N}$

(Greek μ = population mean)
To calculate the population variance σ² (sigma squared) of a given variable:
o For each case, we calculate the distance to the population mean and square
that distance (removes possible minus sign)
o All those squared distances are then added up and divided by the number of
cases in the population (N)
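
The two calculation recipes above can be sketched in Python as follows. The data are hypothetical, and the helper name `sum_of_squares` is introduced only for this illustration; the results are compared with the standard-library functions `statistics.variance` (divides by n-1) and `statistics.pvariance` (divides by N).

```python
import math
import statistics

def sum_of_squares(data, center):
    """Sum of the squared distances of each case to the given mean."""
    return sum((x - center) ** 2 for x in data)

data = [2, 4, 4, 4, 5, 5, 7, 9]  # hypothetical scores
mean = statistics.mean(data)
ss = sum_of_squares(data, mean)

# Treating the data as a sample: divide by n - 1.
sample_variance = ss / (len(data) - 1)

# Treating the same data as a complete population: divide by N.
population_variance = ss / len(data)

print("sum of squares:", ss)
print("sample variance:", round(sample_variance, 3))
print("population variance:", population_variance)

# The standard library implements the same two formulas.
assert math.isclose(sample_variance, statistics.variance(data))
assert math.isclose(population_variance, statistics.pvariance(data))
```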

➢ How can we interpret the value of the variance? (e.g., 4.67)
• We don’t, but: “everything is meaningful in comparison”
(i.e. when comparing variances across groups, we can make comparative
statements about more/less dispersion around the mean)
• For the purpose of interpretation, we calculate another measure of
variability: the standard deviation
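➢ Why are there two different variance formulas for sample data / population data?
• We often use the sample variance as an ‘estimator’ for the population
variance (which is typically unknown). Dividing by n-1 instead of n corrects for
the fact that distances to the sample mean are, on average, smaller than
distances to the population mean.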
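
Why the sample formula divides by n-1 can be made concrete with a tiny, fully enumerable example. The sketch below uses a hypothetical population of three values and lists every possible ordered sample of size 2 drawn with replacement (an assumption of this illustration, not something stated in the lecture). It then averages the variance over all samples, once dividing by n and once by n-1.

```python
from itertools import product

population = [2, 4, 6]  # hypothetical population
N = len(population)
mu = sum(population) / N
sigma2 = sum((x - mu) ** 2 for x in population) / N  # population variance

def variance(sample, divisor):
    """Sum of squared distances to the sample mean, divided by `divisor`."""
    m = sum(sample) / len(sample)
    return sum((x - m) ** 2 for x in sample) / divisor

n = 2
samples = list(product(population, repeat=n))  # all 9 ordered samples of size 2

avg_div_n = sum(variance(s, n) for s in samples) / len(samples)
avg_div_n_minus_1 = sum(variance(s, n - 1) for s in samples) / len(samples)

print("population variance:      ", round(sigma2, 4))
print("average when dividing by n:  ", round(avg_div_n, 4))
print("average when dividing by n-1:", round(avg_div_n_minus_1, 4))
```

Averaged over all samples, dividing by n-1 reproduces the population variance, whereas dividing by n systematically underestimates it.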
