inferences based on comparison Answer: comparing groups X and Y.
1. decide on a numerical summary (e.g., survival rate)
2. is there a difference between the numerical summaries?
3. is this difference due to chance variation or is there a cause?
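The three steps above can be sketched in code. The notes do not name a method for step 3; the permutation test used here is one common choice and is an assumption, as are the made-up survival data and the function names.

```python
import random

def survival_rate(outcomes):
    """Step 1: numerical summary, the fraction of survivors (1s)."""
    return sum(outcomes) / len(outcomes)

def permutation_p_value(x, y, n_shuffles=5000, seed=0):
    """Step 3: how often does random relabelling of the pooled data
    give a difference at least as large as the observed one?"""
    rng = random.Random(seed)
    observed = abs(survival_rate(x) - survival_rate(y))
    pooled = list(x) + list(y)
    hits = 0
    for _ in range(n_shuffles):
        rng.shuffle(pooled)
        diff = abs(survival_rate(pooled[:len(x)]) - survival_rate(pooled[len(x):]))
        if diff >= observed - 1e-12:  # small tolerance for float rounding
            hits += 1
    return hits / n_shuffles

# Hypothetical data: 1 = survived, 0 = died
group_x = [1] * 45 + [0] * 5    # 90% survival
group_y = [1] * 30 + [0] * 20   # 60% survival

difference = survival_rate(group_x) - survival_rate(group_y)  # Step 2
p_value = permutation_p_value(group_x, group_y)
print(f"difference = {difference:.2f}, p-value = {p_value:.4f}")
```

A small p-value only suggests the difference is unlikely to be chance variation; it does not say what the cause is.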
fallacy of the false cause Answer: Attributing an outcome to the wrong cause. Even if A implies B (bad
medical care implies bad outcomes), observing B does not mean that B was caused by A (bad
outcomes were not necessarily caused by bad medical care). There may be another cause of B.
types of data Answer: 1. (categorical) ordinal -- ordered
2. (categorical) nominal -- unordered
3. (numerical) continuous -- data that can take on any value (measured)
4. (numerical) discrete -- data that can only take certain values (counted)
time series Answer: measures a variable over a period of time; the rows identify the times, the
columns denote what was measured (cross-sectional data, by contrast, measure attributes of different
objects at the same time)
describing categorical data Answer: frequency table (lists the categories and the number of cases
in each), bar chart (one bar per category, with height equal to the count), Pareto chart (a bar chart
sorted by frequency, but if the categorical variable is ordinal, you must preserve the ordering),
pie chart (represents the proportions)
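A frequency table, the Pareto ordering, and the pie-chart proportions can all be derived from the same counts. This is a minimal sketch using made-up nominal data (blood types); the variable names are illustrative only.

```python
from collections import Counter

# Hypothetical nominal data: blood types of 10 patients
blood_types = ["O", "A", "O", "B", "A", "O", "AB", "O", "A", "O"]

freq_table = Counter(blood_types)         # frequency table: category -> count
pareto_order = freq_table.most_common()   # Pareto ordering: highest count first
proportions = {k: v / len(blood_types) for k, v in freq_table.items()}  # pie slices

# Text bar chart: bar length equals the count
for category, count in pareto_order:
    print(f"{category:>2} {'#' * count} {count}")
```

Sorting by frequency is only appropriate here because blood type is nominal; for an ordinal variable the natural category order would be kept instead.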
area principle Answer: the area occupied by a part of the graph should correspond to the amount of
data it represents
describing numerical data graphically Answer: histograms (create intervals and count the entries
per bin, the bars are adjacent), boxplot (shows the median, the 25% and 75% quartiles; the length of the
box is the IQR and each whisker extends to the most extreme data point within 1.5*IQR of the box)
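The numbers behind a boxplot can be computed directly. This sketch assumes the common convention that whiskers stop at the most extreme data points within 1.5*IQR of the box and that anything beyond is drawn as an outlier. Quartile conventions differ between textbooks and software; `method="inclusive"` is one choice, and the data are made up.

```python
import statistics

data = [2, 4, 4, 5, 6, 7, 8, 9, 10, 30]  # hypothetical; 30 is unusually large

# Quartiles (the "inclusive" method is an assumption, other conventions exist)
q1, median, q3 = statistics.quantiles(data, n=4, method="inclusive")
iqr = q3 - q1                      # length of the box

lower_fence = q1 - 1.5 * iqr       # whiskers may not extend past the fences
upper_fence = q3 + 1.5 * iqr

# Whiskers end at the most extreme data points inside the fences
whisker_low = min(x for x in data if x >= lower_fence)
whisker_high = max(x for x in data if x <= upper_fence)
outliers = [x for x in data if x < lower_fence or x > upper_fence]

print(q1, median, q3, iqr, whisker_low, whisker_high, outliers)
```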
mean and median on histogram Answer: mean is the balancing point of the histogram and the
median is the point at which 1/2 of the area is on the left and 1/2 is on the right
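Both properties can be checked numerically. In this sketch, on made-up right-skewed data, the deviations from the mean cancel out (the balancing point) and the median leaves equally many values on each side.

```python
import statistics

data = [1, 2, 3, 4, 5, 6, 20]  # hypothetical right-skewed data

mean = statistics.mean(data)
median = statistics.median(data)

# Balancing point: deviations to the left and right of the mean cancel
balance = sum(x - mean for x in data)

# Median: as many values below as above
below = sum(1 for x in data if x < median)
above = sum(1 for x in data if x > median)

print(f"mean = {mean:.2f}, median = {median}, balance = {balance:.1e}")
print(f"below median: {below}, above median: {above}")
```

The single large value pulls the mean above the median, which is typical for right-skewed data.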
describing numerical data numerically: variance and standard deviation Answer: variance = sum of
all squared deviations from the mean/(n-1)
standard deviation = sqrt(variance)
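The two formulas can be worked through step by step. The data set is made up; `statistics.variance` and `statistics.stdev` are Python's built-in sample versions (dividing by n-1) and are used only as a cross-check.

```python
import math
import statistics

data = [2, 4, 4, 4, 5, 5, 7, 9]  # hypothetical data
n = len(data)

mean = sum(data) / n
squared_deviations = [(x - mean) ** 2 for x in data]

variance = sum(squared_deviations) / (n - 1)  # sample variance
std_dev = math.sqrt(variance)                 # sample standard deviation

print(f"variance = {variance:.4f}, standard deviation = {std_dev:.4f}")
print(statistics.variance(data), statistics.stdev(data))  # cross-check
```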
describing numerical data numerically: inter-quartile range Answer: IQR = Q3 - Q1 (75% quantile minus
25% quantile); measures the spread of the middle half of the data
empirical rule Answer: for approximately normal (bell-shaped) data:
1. about 68% of the data lie within one standard deviation of the mean (μ-σ, μ+σ)
2. about 95% of the data lie within two standard deviations (μ-2σ, μ+2σ)
3. about 99.7% of the data lie within three standard deviations (μ-3σ, μ+3σ)
*The same percentages apply to the area under the histogram because of the area principle.
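The rule can be checked on simulated data. This sketch draws a made-up normal sample (mean 100, standard deviation 15, both arbitrary) and counts the share of values within one, two, and three standard deviations of the mean.

```python
import random
import statistics

rng = random.Random(42)  # fixed seed so the run is repeatable
sample = [rng.gauss(100, 15) for _ in range(20_000)]  # hypothetical normal data

mu = statistics.fmean(sample)
sigma = statistics.pstdev(sample)

def share_within(k):
    """Fraction of the sample within k standard deviations of the mean."""
    return sum(1 for x in sample if abs(x - mu) <= k * sigma) / len(sample)

for k in (1, 2, 3):
    print(f"within {k} sd: {share_within(k):.3f}")
```

For strongly skewed or heavy-tailed data the shares can differ noticeably from 68%, 95%, and 99.7%.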
z-score Answer: measures the distance from the mean in standard units:
(x-μ)/σ
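As a small worked example of the formula, assuming a hypothetical exam with mean 70 and standard deviation 8:

```python
def z_score(x, mu, sigma):
    """Distance of x from the mean mu, in units of the standard deviation sigma."""
    return (x - mu) / sigma

# Hypothetical exam scores: mean 70, standard deviation 8
high = z_score(86, 70, 8)  # 2 standard deviations above the mean
low = z_score(66, 70, 8)   # half a standard deviation below the mean
print(high, low)
```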
relationship between quantiles and z-scores Answer: Used to tell whether a distribution is normal. In a
normal distribution:
1. the 84% quantile has a z-score of 1
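The link between the 84% quantile and a z-score of 1 can be verified with Python's `statistics.NormalDist`. This is a sketch for the standard normal distribution only; the 84% figure is a rounding of the exact value.

```python
from statistics import NormalDist

std_normal = NormalDist()  # standard normal: mean 0, standard deviation 1

# Share of a normal distribution lying below z = 1
share_below_one = std_normal.cdf(1)

# z-score of the 84% quantile
z_at_84 = std_normal.inv_cdf(0.84)

print(f"P(Z <= 1) = {share_below_one:.4f}")
print(f"z-score of the 84% quantile = {z_at_84:.4f}")
```

If the 84% quantile of a data set, converted to a z-score, is far from 1, that is a hint the data are not normally distributed.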