Summary Statistics I Notes (CIS)

Notes from all the lectures in English plus all codes mentioned in the lecture that might be necessary to use in the exam.

  • 30 October 2020
  • 21 pages
  • 2020/2021
  • Summary
Week 2: Descriptive Statistics
Descriptive statistics vs. inferential statistics
~ Descriptive statistics:
- Statistics used to describe (sample) data without further conclusions
- Measures of central tendency: Mean, median, mode
- Measures of variation (or spread): range, IQR, variance, standard deviation
~ Inferential statistics:
- Describe data of sample in order to infer patterns in the population
- Statistical tests: t-test, χ2-test, etc.
o Sample vs. population
▪ Studying the whole population is (almost always) practically impossible
▪ Sample is a (selected) subset of population and thus more accessible
▪ Selection of representative sample is very important

(Types of) variables
~ Tabular representation of data:
- Each case is shown in a row
- Each variable is in a column
- Table
~ Nominal (categorical) scale: unordered categories
- Gender (frequently binary: two categories), Native language, etc.
~ Ordinal: ordered (ranked) scale, but amount of difference unclear
- Rank of English proficiency (in class), Likert scale (Rate on a scale from 1 to 5...)
~ Interval scale: numerical with meaningful difference but no true 0
- Year of birth, temperature in Celsius
~ Ratio scale: numerical with meaningful difference and true 0
- Number of questions correct, age
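
One possible way to represent the four scales in R is sketched below. All vectors are hypothetical illustrations and are not taken from the course data; the choice of `factor()` for nominal and an ordered factor for ordinal data is a common convention, not something the lecture prescribes.

```r
# Hypothetical example values (not from the course data set)
native_language <- factor(c("Dutch", "German", "Dutch", "English"))  # nominal
rating <- factor(c(2, 5, 4, 4), levels = 1:5, ordered = TRUE)        # ordinal (Likert 1-5)
birth_year <- c(1999, 2001, 2000, 1998)                              # interval
n_correct <- c(12, 18, 15, 9)                                        # ratio

levels(native_language)      # categories without an inherent order
rating[1] < rating[2]        # order comparisons work for ordered factors
diff(range(birth_year))      # differences are meaningful on an interval scale
n_correct[2] / n_correct[4]  # ratios need a true zero: 18 correct is twice 9 correct
```

Taking a ratio of two birth years would run without error but would not be meaningful, because year 0 is not a true zero.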

Distribution of a variable
~ Normal distribution
- [Figure: normal distribution curve with a red and a green shaded area]
- Has convenient characteristics
- Completely symmetric
- Red area: (about) 80%
- Red and green area: (about) 95%
~ Frequency of values (distribution of variables shows variability)
- table(dat$english_grade)
~ Histogram (shows frequency of all values in groups)
- hist(dat$english_grade, xlab = "English grade", main = "")
~ Density curve (shows area proportional to the relative frequency)
- plot(density(dat$english_grade), main = "", xlab = "English grade")
- The total area under a density curve is equal to 1
- A density curve does not provide information about the frequency of one
value
o E.g., there might be no one who has scored a grade of exactly 6.1
- It only provides information about an interval
o E.g., more than 50% of the grades lie between 5.5 and 7.5
~ A distribution can also be characterized by measures of center and variation
- (skewness measures the symmetry of the distribution; not covered in this
course)
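
The two density-curve properties above (total area of 1, information about intervals rather than single values) can be checked numerically. This is a minimal sketch using a hypothetical `grades` vector instead of `dat$english_grade`; the trapezoid-rule approximation of the area is an illustrative choice.

```r
# Hypothetical grades; the course data (dat$english_grade) is not reproduced here
grades <- c(5.5, 6.0, 6.5, 6.5, 7.0, 7.0, 7.0, 7.5, 8.0, 8.5)

d <- density(grades)  # kernel density estimate: d$x are grid points, d$y the curve heights

# Approximate the area under the curve with the trapezoid rule
area <- sum(diff(d$x) * (head(d$y, -1) + tail(d$y, -1)) / 2)
area  # very close to 1

# Proportion of observed grades inside the interval 5.5 to 7.5
prop_interval <- mean(grades >= 5.5 & grades <= 7.5)
prop_interval

# Nobody scored exactly 6.1, even though the curve is above zero at 6.1
n_exact <- sum(grades == 6.1)
n_exact
```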

Measures of central tendency
~ Mode
- most frequent element (for nominal data: only meaningful measure)
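- my_mode <- function(x) {
    counts <- table(x)  # frequency of each distinct value
    # names of all values with the highest count (ties return several modes)
    names(which(counts == max(counts)))
  }
  my_mode(dat$english_grade)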
~ Median
- when data is sorted from small to large, it is the middle value
- median(dat$english_grade)
~ Mean
- arithmetical average
- mean(dat$english_grade)
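
A small worked example may help to compare the three measures. The `grades` vector is hypothetical, not course data.

```r
# Hypothetical grades (not from the course data set)
grades <- c(6, 7, 7, 8, 5, 7, 9, 6)

# Mode: most frequent value (7 occurs three times)
counts <- table(grades)
mode_value <- names(counts)[counts == max(counts)]
mode_value  # "7" (a character string, because it comes from the table names)

# Median: sorted data are 5 6 6 7 7 7 8 9; with an even number of values
# the median is the average of the two middle values (7 and 7)
median(grades)  # 7

# Mean: sum divided by number of values, 55 / 8
mean(grades)  # 6.875
```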



Measures of variation
~ Quantiles: cutpoints to divide the sorted data in subsets of equal size
- Quartiles: three cutpoints to divide the data in four equal-sized sets
o q1 (1st quartile): cutpoint between 1st and 2nd group
o q2 (2nd quartile): cutpoint between 2nd and 3rd group (= median!)
o q3 (3rd quartile): cutpoint between 3rd and 4th group
- Percentiles: divide data in hundred equal-sized subsets
o q1 = 25th percentile
o q2 (= median) = 50th percentile
o Score at nth percentile is better than n% of scores
- quantile(dat$english_grade)
~ Minimum, maximum: lowest and highest value
- min(dat$english_grade); max(dat$english_grade)
~ Range: difference between minimum and maximum
- range(dat$english_grade)
- diff(range(dat$english_grade))
~ Interquartile range (IQR): q3 - q1
- IQR(dat$english_grade)
~ box plot is used to visualize variation of a variable
- Box (IQR): q1 (bottom), median (thickest line), q3 (top)
o (In example below, q1 and median have the same value)
- Whiskers: maximum (top) and minimum (bottom) non-outlier value
- Circle(s): outliers (> 1.5 IQR distance from box)
- boxplot(dat$english_grade, col = "red")
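
The quartiles, the IQR and the 1.5 IQR outlier rule can be traced by hand, as in the rough sketch below. The `scores` vector is hypothetical, and the variable names `lower` and `upper` are only illustrative.

```r
# Hypothetical scores with one extreme value (not from the course data set)
scores <- c(1, 2, 3, 4, 5, 6, 7, 8, 9, 30)

q <- quantile(scores)  # named vector: 0%, 25%, 50%, 75%, 100%
q1 <- q[["25%"]]
q3 <- q[["75%"]]
iqr <- q3 - q1         # same value as IQR(scores)

diff(range(scores))    # range as a single number: maximum minus minimum

# Outliers: values more than 1.5 IQR below q1 or above q3
lower <- q1 - 1.5 * iqr
upper <- q3 + 1.5 * iqr
outliers <- scores[scores < lower | scores > upper]
outliers               # 30
```

Note that `boxplot()` computes the box edges as "hinges", which can differ slightly from the values returned by `quantile()`; for this example both approaches flag the same outlier.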
~ Deviation: difference between mean and individual value
~ Variance: average squared deviation
- Squared in order to make negative differences positive
- Population variance (with μ = population mean): $\sigma^2 = \frac{1}{N}\sum_{i=1}^{N}(x_i - \mu)^2$
- As sample mean (x̄) is approximation of population mean (μ), sample variance formula contains division by n−1 (results in slightly higher variance): $s^2 = \frac{1}{n-1}\sum_{i=1}^{n}(x_i - \bar{x})^2$
- var(dat$english_grade)
~ Standard deviation: square root of variance
- $s = \sqrt{s^2} = \sqrt{\frac{1}{n-1}\sum_{i=1}^{n}(x_i - \bar{x})^2}$
- sd(dat$english_grade)
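
To see how deviation, variance and standard deviation relate, they can be computed step by step and compared with `var()` and `sd()`. The vector `x` is hypothetical; `pop_var` and `samp_var` are illustrative names.

```r
# Hypothetical values (not from the course data set); their mean is 5
x <- c(2, 4, 4, 4, 5, 5, 7, 9)
n <- length(x)

dev <- x - mean(x)  # deviations from the mean: -3 -1 -1 -1 0 0 2 4
sum(dev)            # deviations sum to 0, which is why they are squared

pop_var <- sum(dev^2) / n         # division by n: 32 / 8 = 4
samp_var <- sum(dev^2) / (n - 1)  # division by n - 1: 32 / 7, slightly higher

all.equal(samp_var, var(x))       # var() uses the n - 1 formula
all.equal(sqrt(samp_var), sd(x))  # sd() is the square root of var()
```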

Standardized scores
~ Standardization helps facilitate interpretation
- E.g., how to interpret: "Emma's score is 112" and "Tom's score is 105"
~ Interpretation should be done with respect to mean μ and standard deviation σ
- Raw scores can be transformed to standardized scores (z-scores or z-values): $z = \frac{x - \mu}{\sigma}$
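
As an illustration of why standardization helps, suppose Emma and Tom took different tests. The means and standard deviations below are invented for this sketch (the notes do not give them), and `z_score` is a hypothetical helper.

```r
# Hypothetical helper: distance from the mean, expressed in standard deviations
z_score <- function(x, mu, sigma) (x - mu) / sigma

# Assumed test characteristics (invented for illustration only)
z_emma <- z_score(112, mu = 100, sigma = 15)  # (112 - 100) / 15 = 0.8
z_tom  <- z_score(105, mu = 90,  sigma = 10)  # (105 - 90) / 10 = 1.5

# Standardizing a whole (hypothetical) sample with its own mean and sd
scores <- c(95, 100, 105, 112, 88)
z <- (scores - mean(scores)) / sd(scores)
mean(z)  # approximately 0
sd(z)    # 1
all.equal(z, as.vector(scale(scores)))  # scale() performs the same transformation
```

Under these assumed values, Tom's raw score is lower than Emma's, but his z-score is higher: he lies further above the mean of his own test.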
