Summary: Comprehensive final exam review: EVERYTHING you need to know from a student who got 96% in Stats 2244. Includes notes from all Prep 101 sessions.
Course
Stats 2244
Institution
University of Western Ontario (UWO)
Summarizing and Exploring Data
Data Stage: collect, monitor the quality of, and conduct a preliminary exploration of the data
Does the data collection method need “tweaking” to ensure quality (monitoring)?
Are there patterns, trends, or associations apparent in the data?
Are there any outliers or missing values? If so, how will you handle them?
Selecting a Summary
How many variables do you have?
o Univariate: 1 variable
Will describe the distribution of this one variable
o Bivariate: 2 variables
o Multivariate: three or more variables
Can explore relationships between variables
What types of variables do you have?
o Explanatory / response
o Quantitative / categorical
What characteristic(s) or relationship do you want to emphasize?
o Parameter, Measures of Spread, Relationship
Measures of Spread
Measures of Spread: characterize the variability in a distribution
Range
Range = maximum – minimum
Inflated by outliers and skew
5-Number Summary
5-number summary splits a distribution into 4 quarters
Minimum, Q1, x̃, Q3, maximum
Q1 = 25th percentile
x̃ = median
o Centermost value: order the dataset smallest→largest then take the middle value
Q3 = 75th percentile
Interquartile Range (IQR)
IQR = Q3 – Q1
Q3 = third quartile = 75th percentile
Q1 = first quartile = 25th percentile
IQR contains the middle 50% of the data: the 25% just below the median and the 25% just above it
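As a rough illustration of the range, 5-number summary, and IQR, here is a minimal Python sketch. The data values are made up for the example, and `method="inclusive"` is just one common convention for locating quartiles; other textbooks and software may give slightly different Q1 and Q3 for the same data.

```python
import statistics

# Hypothetical example data (not from the course notes)
data = [2, 4, 4, 5, 7, 9, 10, 12, 30]

# Cut points that split the data into 4 quarters: Q1, median, Q3.
# "inclusive" is an assumed convention; others exist.
q1, median, q3 = statistics.quantiles(data, n=4, method="inclusive")

five_number_summary = (min(data), q1, median, q3, max(data))
data_range = max(data) - min(data)  # range = maximum - minimum
iqr = q3 - q1                       # IQR = Q3 - Q1

print(five_number_summary)  # (2, 4.0, 7.0, 10.0, 30)
print(data_range)           # 28
print(iqr)                  # 6.0
```

Notice that the single large value (30) inflates the range to 28, while the IQR stays at 6.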
Percentiles
Percentile: a value below which a particular percentage of the distribution lies
Quartiles are percentiles which divide the distribution into 4 equal size sections
o Q1 = first quartile = 25th percentile = 25% of distribution lies below this value
o Q2 = second quartile = 50th percentile = 50% of distribution lies below this value
o Q3 = third quartile = 75th percentile = 75% of distribution lies below this value
If a value is at the 90th percentile, 90% of the distribution lies below it, so it is in the top 10% of the distribution
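The definition of a percentile can be checked directly by counting how much of the data lies below a value. This is only a sketch: `percent_below` is a hypothetical helper name and the scores are invented.

```python
def percent_below(values, x):
    """Return the percentage of values that are strictly less than x."""
    below = sum(1 for v in values if v < x)
    return 100 * below / len(values)

# Hypothetical exam scores
scores = [55, 60, 62, 68, 70, 73, 78, 81, 90, 95]

# 8 of the 10 scores are below 90, so 90 sits at about the 80th percentile here
print(percent_below(scores, 90))  # 80.0
```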
Variance
Takes into account all the data we have
Sample variance
Sample variance (s²) is a statistic
s² = Σ(xᵢ – x̄)² / (n – 1)
The larger the s², the more variable the data (the wider the spread)
Roughly the average of the squared differences from the sample mean (x̄), dividing by n – 1 rather than n
R automatically uses this equation to calculate variance (it assumes we’re working with sample data, not population data)
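To see the sample variance calculation step by step, here is a small Python sketch on made-up data. Python's `statistics.variance` also divides by n - 1, so it should agree with the by-hand result.

```python
import math
import statistics

# Hypothetical sample data
sample = [2, 4, 4, 4, 5, 5, 7, 9]
n = len(sample)
x_bar = sum(sample) / n  # sample mean = 5.0

# Squared differences from the sample mean
squared_diffs = [(x - x_bar) ** 2 for x in sample]

# Divide by n - 1 (not n) because this is sample data
s2_by_hand = sum(squared_diffs) / (n - 1)  # 32 / 7

s2_library = statistics.variance(sample)
print(round(s2_by_hand, 3))                  # 4.571
print(math.isclose(s2_by_hand, s2_library))  # True
```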
Population variance
Population variance (σ²) is a parameter
σ² = Σ(xᵢ – µ)² / N
The larger the σ², the more variable the data (the wider the spread)
Calculates the average of the squared differences from the population mean (µ)
o Takes every value in the distribution and subtracts the population mean from it
o Squares the differences (between values and mean) to get rid of the negatives
o Adds up the squared differences and divides by the total number of values in the distribution (N)
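The steps above can be sketched in Python as follows. The data are invented and treated as a complete population for the sake of the example. `statistics.pvariance` divides by N, so it serves as a cross-check.

```python
import math
import statistics

# Hypothetical data, treated here as the entire population
population = [2, 4, 4, 4, 5, 5, 7, 9]
N = len(population)
mu = sum(population) / N  # population mean = 5.0

diffs = [x - mu for x in population]  # subtract the mean from every value
squared = [d ** 2 for d in diffs]     # square to remove the negatives
sigma2 = sum(squared) / N             # divide by N (not N - 1)

print(sigma2)  # 4.0
print(math.isclose(sigma2, statistics.pvariance(population)))  # True
```

On the same numbers, dividing by N - 1 instead would give about 4.571, which is why it matters whether the data are a sample or a population.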
Standard Deviation
Square root of the variance (s = √s² for a sample)
o Undoes the squaring, so the spread is back in the original units of the data
Suitable for distributions without extreme outliers and/or skew
o Extreme outliers can make the data seem to vary widely when the large value is really just due to the outliers
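A short Python sketch of both points: the standard deviation is the square root of the variance, and one extreme value can inflate it. The numbers are made up for illustration.

```python
import math
import statistics

# Hypothetical sample data
data = [10, 11, 12, 13, 14]

s2 = statistics.variance(data)  # sample variance = 2.5
s = statistics.stdev(data)      # sample standard deviation
print(round(s, 3))                    # 1.581
print(math.isclose(s, math.sqrt(s2)))  # True

# Add one extreme outlier and recompute
with_outlier = data + [100]
s_outlier = statistics.stdev(with_outlier)
print(s_outlier > s)  # True: the outlier makes the spread look much wider
```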
Measures of Center
Measures of center: tell us the “typical” value of a distribution
Mean
Mean (average): add up all the values and divide by the total number of values
Affected by outliers
Median
Median: arrange values smallest → largest and take centermost value
50th percentile: 50% of distribution below, 50% of the distribution above
Is not affected by outliers / extreme values
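To compare how the mean and median react to an outlier, here is a quick Python sketch with invented values. The largest value is replaced by an extreme one and both measures are recomputed.

```python
import statistics

# Hypothetical data without and with an extreme value
values = [1, 2, 3, 4, 5]
with_outlier = [1, 2, 3, 4, 100]

mean_before = statistics.mean(values)          # 3
median_before = statistics.median(values)      # 3
mean_after = statistics.mean(with_outlier)     # 110 / 5 = 22
median_after = statistics.median(with_outlier)  # still 3

print(mean_before, median_before)
print(mean_after, median_after)
```

The mean jumps from 3 to 22, while the median stays at 3.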
Describing Shape of a Distribution
Can describe the shape of a distribution when it is represented as a histogram
o Histogram: shows frequency distribution for univariate quantitative data
All values for variable on x-axis; frequency on y-axis
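As a rough sketch of what a histogram counts, the Python below groups made-up values into bins and prints one row per bin. The bin width of 2 is an arbitrary assumption, and this text version is drawn sideways (bins down the page, frequency as the bar length) rather than with values on the x-axis.

```python
from collections import Counter

# Hypothetical univariate quantitative data
data = [1, 2, 2, 3, 3, 3, 4, 4, 5, 9]
bin_width = 2  # assumed bin width

# Label each value by the lower edge of its bin: [0, 2), [2, 4), ...
bins = Counter((x // bin_width) * bin_width for x in data)

for lower in range(0, max(data) + 1, bin_width):
    count = bins[lower]  # a Counter returns 0 for empty bins
    print(f"[{lower}, {lower + bin_width}): {'#' * count}")
```

Most of the values pile up in the low bins, with a single value far to the right.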
Symmetry
Symmetry: the degree to which the distribution looks like a mirror image when split down the center
Opposite of symmetric is skewed