This summary, based on the classic statistics textbook 'Introduction to the Practice of Statistics', helps students correctly produce and interpret data found in a real-world context. The summary can be seen as a guide through the different types of data gathering and analysis. U...
CHAPTER 1 - Looking at Data Distributions
Terms:
- cases: objects described by a set of data → usually people in global health, could also
be villages, tractors etc.
- variable: a characteristic of a case → e.g. height
- value: different cases have different values of a variable → the height in cm
- label (unique ID): used to distinguish or uniquely identify cases within the dataset →
e.g. a patient ID number
- the key characteristics of a data set answer the questions: who, what and why?
Examining distributions:
- overall pattern:
● shape: e.g. normally distributed
● center
● spread
- deviations
- symmetry - skewed to the left / skewed to the right
● A negatively skewed (also known as left-skewed) distribution is one in which
most values are concentrated on the right side of the distribution graph, while
the left tail is longer.
● A positively skewed (right-skewed) distribution is the mirror image: most values
are concentrated on the left side and the right tail is longer.
Measuring center:
1. The mean
- symbolized by x̄
- sensitive to outliers and skew
2. The median
- represented by M
- midpoint of a distribution
● half of the observations are smaller, the other half larger
- resistant to outliers and skew
- even number of observations: two numbers are in the middle → take their average: e.g. 3,4 → M = 3.5
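The difference between a sensitive and a resistant measure of center can be checked with a few lines of Python. This is only a minimal sketch: the heights below are made-up illustrative values, not data from the textbook.

```python
from statistics import mean, median

# Hypothetical heights in cm (illustrative values only)
heights = [160, 165, 170, 175, 180]
with_outlier = heights + [250]  # same data plus one extreme value

mean_before, median_before = mean(heights), median(heights)
mean_after, median_after = mean(with_outlier), median(with_outlier)

print(mean_before, median_before)  # 170 170
print(mean_after, median_after)    # mean jumps to about 183.3, median only to 172.5
```

The single extreme value pulls the mean up by more than 13 cm, while the median moves by only 2.5 cm.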
Measuring spread: the quartiles
● works with the median (not the mean)
● splitting data into quartiles means splitting into 4 parts
● the median splits the data into 2
● IQR (interquartile range)= Q3-Q1
● 1.5 x IQR rule for identifying outliers → any value greater than Q3 + (1.5 x IQR) or
smaller than Q1 - (1.5 x IQR) is a suspected outlier
- Multiplying the interquartile range (IQR) by 1.5 gives a margin for deciding
whether a certain value is an outlier. Data values below Q1 - 1.5 x IQR or
above Q3 + 1.5 x IQR are considered outliers.
● Order: minimum - quartile 1 - median/quartile 2 - quartile 3 - maximum
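The quartiles and the 1.5 x IQR rule can be sketched in Python as follows. Two assumptions: the quartiles are computed as the medians of the lower and upper halves of the ordered data (other software may use slightly different quartile rules), and the scores are made-up values. The function names are hypothetical helpers, not part of any library.

```python
from statistics import median

def five_number_summary(data):
    """Return (minimum, Q1, median, Q3, maximum).

    Assumption: Q1 and Q3 are the medians of the observations below and
    above the position of the overall median.
    """
    xs = sorted(data)
    n = len(xs)
    half = n // 2
    lower = xs[:half]          # observations below the median's position
    upper = xs[half + n % 2:]  # observations above the median's position
    return xs[0], median(lower), median(xs), median(upper), xs[-1]

def iqr_outliers(data):
    """Return the values flagged by the 1.5 x IQR rule."""
    _, q1, _, q3, _ = five_number_summary(data)
    iqr = q3 - q1
    low_fence = q1 - 1.5 * iqr
    high_fence = q3 + 1.5 * iqr
    return [x for x in data if x < low_fence or x > high_fence]

scores = [2, 4, 5, 5, 6, 7, 8, 9, 30]  # hypothetical data
summary = five_number_summary(scores)
outliers = iqr_outliers(scores)
print(summary)   # (2, 4.5, 6, 8.5, 30)
print(outliers)  # [30]
```

Here IQR = 8.5 - 4.5 = 4, so the fences lie at 4.5 - 6 = -1.5 and 8.5 + 6 = 14.5, and only the value 30 falls outside them.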
Boxplots
Measuring spread: the standard deviation
- works with the mean (not the median)
- symbolized by Sx
- roughly the average distance of the observations from the mean (the square root of
the sum of squared deviations from the mean divided by n - 1)
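As a small worked sketch, the sample standard deviation can be computed step by step and compared with Python's built-in `statistics.stdev`, which also divides by n - 1. The data values are made up for illustration.

```python
from math import sqrt
from statistics import mean, stdev

data = [2, 4, 4, 4, 5, 5, 7, 9]  # hypothetical observations

x_bar = mean(data)                               # the mean, here 5
squared_devs = [(x - x_bar) ** 2 for x in data]  # squared distances from the mean
s = sqrt(sum(squared_devs) / (len(data) - 1))    # divide by n - 1, then take the root

print(round(s, 3))            # 2.138
print(round(stdev(data), 3))  # same value from the standard library
```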
Choosing measures of center and spread:
NOTE: The median and IQR are usually better than the mean and standard deviation for
describing a skewed distribution or a distribution with outliers.
→ use mean and standard deviation only for reasonably symmetric distributions that
do not have outliers
Models
A model: a simplified representation of something more complex that helps us to understand
something
1. density curve:
- smooth curve drawn over the distribution
- it is a model of the distribution
- it is a model of what value the variable takes and how often
- if a smooth curve is always on or above the x-axis and the total area
under the curve is exactly 1, it is a density curve
Area under the curve:
● total area under a density curve is 1
● EXAMPLE: in a model of the vocabulary scores of 947 seventh graders, the shaded
proportion of the density curve (scores below 6) is equal to 0.293 → how to
interpret? According to the model, about 29.3% of the vocabulary scores of the
947 seventh graders are below 6.
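The idea that areas under a density curve are proportions can be illustrated numerically. The sketch below uses a Normal curve with made-up parameters (mean 7, standard deviation 1.5); these are not the parameters of the vocabulary example, which are not given here. `area_under` is a hypothetical helper that approximates the area with the midpoint rule.

```python
from statistics import NormalDist

# Hypothetical model: the parameters are illustrative assumptions
model = NormalDist(mu=7.0, sigma=1.5)

def area_under(pdf, a, b, steps=10_000):
    """Approximate the area under pdf between a and b (midpoint rule)."""
    width = (b - a) / steps
    return sum(pdf(a + (i + 0.5) * width) for i in range(steps)) * width

# The range -10 to 25 covers more than 11 standard deviations on each side
total_area = area_under(model.pdf, -10, 25)
area_below_6 = area_under(model.pdf, -10, 6)

print(round(total_area, 4))    # total area under the density curve
print(round(area_below_6, 4))  # proportion of values below 6 under this model
print(round(model.cdf(6), 4))  # the same proportion from the exact formula
```

The total area comes out as 1, and the area to the left of 6 is read as the proportion of observations below 6 according to the model.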
Greek letters
● When mean and standard deviation come from a model of the data, Greek letters are
used: μ (mu) for the mean and 𝜎 (sigma) for the standard deviation
Normal density curve:
- mathematical model for normally distributed data
- symmetric, single-peaked, and bell-shaped
- completely described by two numbers: μ (mean) and 𝜎 (standard deviation)
- N (μ,𝜎)
The 68-95-99.7 rule
In the Normal distribution with mean μ and standard deviation 𝜎:
- approximately 68% of the observations fall within 1𝜎 of μ
- approximately 95% of the observations fall within 2𝜎 of μ
- approximately 99.7% of the observations fall within 3𝜎 of μ
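The three percentages can be checked with `statistics.NormalDist` from the Python standard library (Python 3.8 or later). This is only a quick verification sketch using the standard Normal curve; the same shares hold for any μ and 𝜎.

```python
from statistics import NormalDist

z = NormalDist()  # standard Normal: mean 0, standard deviation 1

# Share of the distribution within k standard deviations of the mean
shares = {k: z.cdf(k) - z.cdf(-k) for k in (1, 2, 3)}

for k, share in shares.items():
    print(f"within {k} SD: {share:.4f}")
# within 1 SD: 0.6827
# within 2 SD: 0.9545
# within 3 SD: 0.9973
```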
Standard normal distribution
● N (0,1)
● Simply easier to work with
● All normal distributions N (μ,𝜎) can be transformed (standardized) to N (0,1) (mean 0, SD 1)
--> the standardized value of x is called the z-score: z = (x - μ) / 𝜎; standard
normal probabilities are then read from N (0,1)
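A minimal sketch of standardizing, assuming the usual formula z = (x - μ) / 𝜎. The mean of 170 cm and standard deviation of 10 cm are made-up values, and `z_score` is a hypothetical helper name.

```python
from statistics import NormalDist

def z_score(x, mu, sigma):
    """Standardize x: how many standard deviations x lies from the mean."""
    return (x - mu) / sigma

mu, sigma = 170.0, 10.0  # hypothetical model for heights in cm
z = z_score(185.0, mu, sigma)

# The proportion below x in N(mu, sigma) equals the proportion below z in N(0, 1)
p_original = NormalDist(mu, sigma).cdf(185.0)
p_standard = NormalDist().cdf(z)

print(z)  # 1.5
print(round(p_original, 4), round(p_standard, 4))  # the two proportions agree
```

Because the two proportions agree, a single table (or function) for N (0,1) is enough to answer questions about any Normal distribution.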
Stuvia is a marketplace, so this document is not bought from Stuvia itself but from the seller Myrtevdbergh. Stuvia facilitates the payment to the seller.