Summary Statistics I Notes (CIS)
Notes from all the lectures in English, plus all the R code mentioned in the lectures that might be needed for the exam.

Summary · 30 October 2020 · 21 pages · 2020/2021
Week 2: Descriptive Statistics
Descriptive statistics vs. inferential statistics
~ Descriptive statistics:
- Statistics used to describe (sample) data without drawing further conclusions
- Measures of central tendency: mean, median, mode
- Measures of variation (or spread): range, IQR, variance, standard deviation
~ Inferential statistics:
- Use the data of a sample to infer patterns in the population
- Statistical tests: t-test, χ²-test, etc.
o Sample vs. population
▪ Studying the whole population is (almost always) practically impossible
▪ A sample is a (selected) subset of the population and thus more accessible
▪ Selecting a representative sample is very important

(Types of) variables
~ Tabular representation of data:
- Each case is shown in a row
- Each variable is in a column
~ Nominal (categorical) scale: unordered categories
- Gender (frequently binary: two categories), native language, etc.
~ Ordinal scale: ordered (ranked) categories, but the size of the differences is unclear
- Rank of English proficiency (in class), Likert scale ("Rate on a scale from 1 to 5...")
~ Interval scale: numerical with meaningful differences, but no true zero
- Year of birth, temperature in Celsius
~ Ratio scale: numerical with meaningful differences and a true zero
- Number of questions correct, age
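A minimal sketch of the tabular layout in R, assuming a small made-up data set (the data frame `students`, its column names and values are hypothetical, not from the lectures). Each row is one case and each column one variable; the nominal variable is stored as an unordered factor and the ordinal one as an ordered factor.

```r
# Hypothetical data set: each row is one case (a student),
# each column one variable. All names and values are made up.
students <- data.frame(
  native_language = factor(c("Dutch", "German", "Dutch", "English")),  # nominal
  proficiency     = factor(c("low", "high", "medium", "high"),
                           levels = c("low", "medium", "high"),
                           ordered = TRUE),                             # ordinal
  birth_year      = c(1999, 2000, 1998, 2001),                          # interval
  n_correct       = c(12, 18, 15, 20)                                   # ratio
)

str(students)

# Ordered factors can be compared; this is TRUE ("high" > "low")
students$proficiency[2] > students$proficiency[1]
```

Comparing two values of `native_language` with `>` would not work, which mirrors the fact that nominal categories have no order.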

Distribution of a variable
~ Normal distribution
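- Has convenient mathematical properties
- Completely symmetric around the mean
- About 68% of the values lie within 1 standard deviation of the mean
- About 95% of the values lie within 2 standard deviations of the mean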
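As a minimal sketch in base R, areas under a normal curve can be computed with `pnorm()` (cumulative probability) and `qnorm()` (its inverse). The example works on the standard normal distribution and gives the share of values within 1 and 2 standard deviations of the mean, as well as the interval that holds the central 80%. The helper `area_within()` is a hypothetical name introduced here.

```r
# Hypothetical helper: area under the standard normal curve
# between -k and +k standard deviations from the mean.
area_within <- function(k) pnorm(k) - pnorm(-k)

area_within(1)   # about 0.683
area_within(2)   # about 0.954

# qnorm() gives cutpoints: the central 80% of a normal distribution
# lies within about 1.28 standard deviations of the mean
qnorm(0.90)                 # about 1.28
area_within(qnorm(0.90))    # 0.8
```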
~ Frequency of values (the distribution of a variable shows its variability)
- table(dat$english_grade)
~ Histogram (shows the frequency of the values, grouped into bins)
- hist(dat$english_grade, xlab = "English grade", main = "")
~ Density curve (the area under the curve over an interval is proportional to the relative frequency of the values in that interval)
- plot(density(dat$english_grade), main = "", xlab = "English grade")
- The total area under a density curve is equal to 1
- A density curve does not provide information about the frequency of a single value
o E.g., there might be no one who scored a grade of exactly 6.1
- It only provides information about intervals
o E.g., more than 50% of the grades lie between 5.5 and 7.5
~ A distribution can also be characterized by measures of center and variation
- (Skewness measures the asymmetry of the distribution; not covered in this course)
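The commands in these notes assume a data frame `dat` with a column `english_grade`, which is not included here. The sketch below simulates a hypothetical stand-in (100 normally distributed grades; the mean 6.5 and standard deviation 1 are assumptions) so the commands can be run, then checks numerically that the area under the density curve is about 1 and computes the share of grades in an interval.

```r
# Hypothetical stand-in for the course data (the real `dat` is not available)
set.seed(1)
grades <- rnorm(100, mean = 6.5, sd = 1)
grades <- pmin(pmax(grades, 1), 10)            # keep grades on the 1-10 scale
dat <- data.frame(english_grade = round(grades, 1))

# Frequency table, histogram and density curve, as in the notes
freq <- table(dat$english_grade)
hist(dat$english_grade, xlab = "English grade", main = "")
d <- density(dat$english_grade)
plot(d, main = "", xlab = "English grade")

# Area under the density curve: sum of heights times the constant
# spacing of the evaluation grid (approximately 1)
area <- sum(d$y) * diff(d$x[1:2])
area

# Density curves describe intervals: share of grades between 5.5 and 7.5
p_interval <- mean(dat$english_grade >= 5.5 & dat$english_grade <= 7.5)
p_interval
```

In a non-interactive session the two plots are written to a file (by default `Rplots.pdf`) instead of a window.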

Measures of central tendency
~ Mode
- Most frequent value (the only meaningful measure of central tendency for nominal data)
- my_mode <- function(x) {
    counts <- table(x)                    # frequency of each value
    names(which(counts == max(counts)))   # value(s) with the highest frequency, as text
  }
  my_mode(dat$english_grade)
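~ Median
- The middle value when the data are sorted from small to large (with an even number of values: the mean of the two middle values)
- median(dat$english_grade)
~ Mean
- Arithmetic average: the sum of all values divided by the number of values
- mean(dat$english_grade)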
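A small worked example of the three measures on a made-up vector with a few high values. It reuses `my_mode()` from the notes, shows the even-n rule for the median, and illustrates that the mean is pulled towards the high values while the median and mode are not. It also shows that `my_mode()` returns all values in case of a tie.

```r
# Mode function from the notes: value(s) with the highest frequency
my_mode <- function(x) {
  counts <- table(x)
  names(which(counts == max(counts)))
}

# Made-up example with a few high values
x <- c(5, 6, 6, 6, 7, 8, 10, 12)

my_mode(x)   # "6" (returned as text)
median(x)    # 6.5: even n, so the mean of the two middle values 6 and 7
mean(x)      # 7.5: sum 60 divided by n = 8

# In case of a tie, all modes are returned
my_mode(c(1, 1, 2, 2, 3))   # "1" "2"
```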



Measures of variation
~ Quantiles: cutpoints that divide the sorted data into subsets of equal size
- Quartiles: three cutpoints that divide the data into four equal-sized sets
o q1 (1st quartile): cutpoint between the 1st and 2nd group
o q2 (2nd quartile): cutpoint between the 2nd and 3rd group (= median!)
o q3 (3rd quartile): cutpoint between the 3rd and 4th group
- Percentiles: divide the data into 100 equal-sized subsets
o q1 = 25th percentile
o q2 (= median) = 50th percentile
o A score at the nth percentile is higher than (about) n% of the scores
- quantile(dat$english_grade)
~ Minimum, maximum: lowest and highest value
- min(dat$english_grade), max(dat$english_grade)

~ Range: difference between maximum and minimum
- range(dat$english_grade) returns the minimum and the maximum
- diff(range(dat$english_grade)) returns their difference
~ Interquartile range (IQR): q3 - q1
- IQR(dat$english_grade)
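A short sketch on a made-up sample of nine grades, showing what each command returns. Note that `range()` gives two numbers and only `diff(range())` gives the range as one number. R's `quantile()` uses its default definition (`type = 7`); other software or textbooks may compute quartiles slightly differently.

```r
# Made-up sample of nine grades (sorted for readability)
x <- c(3, 5, 5, 6, 6, 7, 7, 8, 9)

quantile(x)         # 0%: 3, 25%: 5, 50%: 6, 75%: 7, 100%: 9
min(x); max(x)      # 3 and 9
range(x)            # both values: 3 9 (not their difference)
diff(range(x))      # 6, the range as a single number
IQR(x)              # 2, i.e. q3 - q1 = 7 - 5
quantile(x, 0.90)   # 90th percentile: 8.2
```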
~ Box plot: visualizes the variation of a variable
- Box (IQR): q1 (bottom), median (thickest line), q3 (top)
o (In the lecture's example plot of dat$english_grade, q1 and the median have the same value)
- Whiskers: maximum (top) and minimum (bottom) non-outlier values
- Circle(s): outliers (more than 1.5 × IQR beyond the box)
- boxplot(dat$english_grade, col = "red")
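A minimal sketch of the outlier rule, using made-up data with one extreme value. It applies the 1.5 × IQR rule by hand with `quantile()` and compares the result with `boxplot.stats()`, which returns the points that `boxplot()` draws as circles. One caveat: `boxplot()` bases its box on Tukey's hinges (see `fivenum()`), which for some data differ slightly from the quartiles returned by `quantile()`.

```r
# Made-up data with one extreme value
x <- c(1:9, 30)

# Rule from the notes: more than 1.5 * IQR below q1 or above q3
q <- quantile(x, c(0.25, 0.75))
lower <- q[1] - 1.5 * IQR(x)
upper <- q[2] + 1.5 * IQR(x)
outliers_by_hand <- x[x < lower | x > upper]
outliers_by_hand                 # 30

# The points boxplot() marks as outliers
boxplot.stats(x)$out             # 30
boxplot(x, col = "red")
```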
~ Deviation: difference between an individual value and the mean
~ Variance: average squared deviation
- Deviations are squared so that negative deviations become positive and do not cancel out the positive ones
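- Population variance (with μ = population mean and N = population size):
  $\sigma^2 = \frac{1}{N}\sum_{i=1}^{N}(x_i - \mu)^2$
- Because the sample mean (x̄) is only an approximation of the population mean (μ), the sample variance formula divides by n − 1 instead of n. This results in a slightly higher variance, which compensates for the sample variance otherwise underestimating the population variance:
  $s^2 = \frac{1}{n-1}\sum_{i=1}^{n}(x_i - \bar{x})^2$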
- var(dat$english_grade)
~ Standard deviation: square root of the variance
  $s = \sqrt{s^2} = \sqrt{\frac{1}{n-1}\sum_{i=1}^{n}(x_i - \bar{x})^2}$
- sd(dat$english_grade)
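A worked example on a made-up sample of five values (mean 7), computing the sample variance by hand and comparing it with `var()` and `sd()`, which both divide by n − 1. It also shows why deviations are squared: the raw deviations always sum to 0.

```r
x <- c(4, 6, 7, 8, 10)   # made-up sample, mean 7
n <- length(x)
dev <- x - mean(x)       # deviations: -3 -1 0 1 3

sum(dev)                 # 0: raw deviations cancel out
sum(dev^2) / (n - 1)     # sample variance: 20 / 4 = 5
var(x)                   # 5, var() also divides by n - 1
sum(dev^2) / n           # 4: same sum divided by n, as in the population formula
sqrt(var(x))             # standard deviation, about 2.236
sd(x)                    # same value
```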

Standardized scores
~ Standardization helps facilitate interpretation
- E.g., how to interpret: "Emma's score is 112" and "Tom's score is 105"
~ Interpretation should be done with respect to mean μ and standard deviation σ
- Raw scores can be transformed into standardized scores (z-scores or z-values):
  $z = \frac{x - \mu}{\sigma}$
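To make the Emma/Tom example concrete, the sketch below assumes hypothetical test norms that are not given in the notes: Emma's test with μ = 100 and σ = 15, Tom's test with μ = 90 and σ = 5. Under these assumptions Tom's lower raw score is the more exceptional one. The helper `z_score()` is a hypothetical name; `scale()` is base R and standardizes a vector with its own sample mean and standard deviation.

```r
# Hypothetical test norms (assumed for illustration, not from the notes)
z_score <- function(x, mu, sigma) (x - mu) / sigma

z_emma <- z_score(112, mu = 100, sigma = 15)   # 0.8
z_tom  <- z_score(105, mu = 90,  sigma = 5)    # 3

# If scores are normally distributed, pnorm() turns a z-score
# into the proportion of scores below it
pnorm(z_emma)   # about 0.79
pnorm(z_tom)    # about 0.999

# scale() standardizes a whole vector using its sample mean and sd
v <- c(4, 6, 7, 8, 10)
as.vector(scale(v))
```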

Seller: lamotte01