This document is an extensive summary of all the videos that students have to watch in order to understand the statistical concepts. It is divided into Week 1, Week 2, Week 3 and Week 4, and gives a thorough explanation of the content of each video, including some examples.
Week 1: Video Notes:
1. Types of Variables:
➔ Categorical: Qualitative Variable: Place people into groups or categories. Get
summarized using proportions or percentages.
- Nominal: no ordering based on magnitude or size.
- Ex: Having a disease or not (the answer is either yes or no), hair color.
- Ordinal: there is an ordering to the categories. The spacing between the categories does
not have a meaning.
- Ex: size of coffee you get. Place that one finishes in the race.
➔ Numeric: Quantitative variables: Recorded numerical quantities
- Discrete: Whole-number counts only (0, 1, 2, 3, ...), with no values in between.
- Ex: number of people in the ER.
- Continuous: measured on a continuous scale, so any value within a range is possible.
- Ex: Age, weight, income.
- Discrete or continuous variables can be further subdivided by scale of measurement: the ratio scale or the interval scale.
- Ratio scale: age, weight and income have a meaningful zero (zero means none of the quantity), so ratios are meaningful: a weight of 20 kg is twice a weight of 10 kg.
- Interval scale: temperature in degrees Celsius or Fahrenheit has a non-meaningful (arbitrary) zero: 0 degrees does not mean there is no temperature, so ratios are not meaningful.
- Exceptions: Categorical variables are sometimes recorded using numbers, but that does not make them numeric (such as females being coded as 1 and males being coded as 2). Another case is the Likert scale, where the numbers stand for ordered categories.
➔ Extra Notes:
- Identifiers are used to identify an individual, so numbers have no meaning here.
- Numerical values can be converted into categorical variables (for example, grouping ages into age bands).
- When categorical variables are recorded using numbers, the numbers are only labels and have no numerical meaning.
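A minimal Python sketch of the two ideas above: numeric codes that only label categories, and a numeric variable converted into a categorical one. The records, the function name `age_group` and the age cut-offs are made up for illustration; only the 1 = female, 2 = male coding comes from the notes.

```python
# Numeric codes used purely as labels (coding taken from the notes).
SEX_LABELS = {1: "female", 2: "male"}

def age_group(age):
    """Turn a numeric age into an ordinal category (cut-offs are assumptions)."""
    if age < 18:
        return "child"
    if age < 65:
        return "adult"
    return "senior"

# Hypothetical records: (sex code, age in years)
records = [(1, 34), (2, 70), (1, 12)]

# Replace each code by its label and each age by its group.
labelled = [(SEX_LABELS[code], age_group(age)) for code, age in records]
print(labelled)
```

Averaging the sex codes would give a number, but that number would not mean anything, which is why coded categories are summarized with counts and percentages instead.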
2. Summarizing + Displaying a Categorical variable:
➔ Best way to summarize a categorical variable is to count how many people fall into each
category and then summarize that by using frequency or relative frequency.
➔ Frequency Table: might contain:
- Frequency
- Proportion: Divide the frequency by the total
- Percentages: Multiply the proportion by 100 (e.g. a proportion of 0.25 is 25%).
The percentages show us the distribution.
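The frequency table can be sketched in a few lines of Python. The coffee-size data here are invented for illustration, and the variable names are not from the videos.

```python
from collections import Counter

# Hypothetical sample of coffee sizes (a categorical variable).
sizes = ["small", "medium", "large", "medium",
         "small", "medium", "large", "medium"]

counts = Counter(sizes)   # frequency of each category
total = len(sizes)

table = {}
for category, freq in counts.items():
    proportion = freq / total        # frequency divided by the total
    percentage = proportion * 100    # proportion expressed as a percentage
    table[category] = (freq, proportion, percentage)

for category, (freq, proportion, percentage) in table.items():
    print(f"{category:<8} {freq:>3} {proportion:.2f} {percentage:.0f}%")
```

The proportions always add up to 1 and the percentages to 100, which is a quick check that no observation was missed.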
➔ Bar Chart: The categories of the variable go along the x-axis, and the y-axis shows either the frequency, the proportion or the percentage.
➔ Pie Chart: each category gets a slice of the pie whose size matches its proportion.
➔ Histograms (used for quantitative rather than categorical data):
Histogram Properties:
1. Quantitative Data
2. No Gaps between the bars
3. Bar width is constant (every bar covers an interval of the same size)
4. Y-axis corresponds to the frequency
Steps of building a histogram:
- Find the lowest and highest values, then break the range of values into intervals called “classes”.
- Decide the size of the bin (bar width). 10 is suggested as a good bin size.
- In interval notation, a square bracket means the endpoint is included and a parenthesis means it is not included, e.g. [10, 20) includes 10 but not 20.
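The histogram steps can be sketched as follows. This is only an illustration: the ages are made up, and the bin width of 10 is the value suggested in the notes. Each class is treated as [left, right), so the left endpoint is included and the right one is not.

```python
# Hypothetical ages; bin width of 10 as suggested in the notes.
ages = [21, 25, 30, 34, 38, 41, 45, 47, 52, 59]
bin_width = 10

# Round the lowest value down and the highest value up to a bin edge.
lowest = (min(ages) // bin_width) * bin_width
highest = (max(ages) // bin_width + 1) * bin_width

classes = {}
for left in range(lowest, highest, bin_width):
    right = left + bin_width
    # [left, right): left endpoint included, right endpoint excluded
    classes[(left, right)] = sum(1 for a in ages if left <= a < right)

# Text version of the histogram: one '#' per observation in the class.
for (left, right), freq in classes.items():
    print(f"[{left}, {right}): {'#' * freq}")
```

Note that the value 30 is counted in [30, 40) and not in [20, 30), which is exactly what the bracket and parenthesis indicate.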
3. Measures of Central Tendency (Mean, mode, median)
➔ Sample mean: average of the numbers (add all the numbers and divide by the number of values).
- The sample mean is sensitive to outliers. When we have a huge value, the mean is pulled towards this huge value. Sample mean is a parametric measure.
- The mean is the balance point of all the observations we have.
- Population mean: the mean of the entire population rather than the sample. We denote it by μ (mu).
- Trimmed mean: the mean calculated after removing the lowest α% and the highest α% of the data. For example, a 5% trimmed mean is calculated after cutting the lowest 5% and the highest 5% of the data.
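A minimal sketch of the trimmed mean, assuming the trimming fraction is given as a decimal between 0 and 0.5 (so 0.10 means 10% from each end). The function name `trimmed_mean` and the data are hypothetical.

```python
def trimmed_mean(values, alpha):
    """Mean after dropping the lowest and highest `alpha` fraction of values.

    Assumes 0 <= alpha < 0.5 so that some values remain after trimming.
    """
    ordered = sorted(values)
    k = int(len(ordered) * alpha)         # number of values cut from each end
    kept = ordered[k:len(ordered) - k]
    return sum(kept) / len(kept)

# Hypothetical data with one huge value (an outlier).
data = [2, 3, 4, 5, 5, 6, 6, 7, 8, 100]

ordinary = sum(data) / len(data)      # pulled up by the value 100
trimmed = trimmed_mean(data, 0.10)    # drops the 2 and the 100
print(ordinary, trimmed)
```

With only 10 values, a 5% trim would remove nothing (5% of 10 is less than one value), which is why 10% is used in this small example.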
➔ Median: Middle value. Cuts the data in half. Order the data from small to big and find the middle value. With an even number of values, add the 2 middle values and divide by 2.
- Not sensitive to outliers
- The median is a nonparametric measure.
➔ Comparing mean and median:
- When the distribution is symmetric, the mean is equal to the median.
- When the distribution is skewed, the mean is pulled towards the tail (the direction of the skew), while the median stays near the bulk of the data.
➔ Mode: the most frequently occurring value.
- Less commonly used.
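To see how the three measures react to an outlier, here is a small sketch using Python's standard `statistics` module. The data are invented: a symmetric set, and the same set with one huge value added.

```python
import statistics

symmetric = [1, 2, 3, 3, 3, 4, 5]    # hypothetical symmetric data
with_outlier = symmetric + [100]     # same data plus one huge value

for values in (symmetric, with_outlier):
    mean = statistics.mean(values)
    median = statistics.median(values)   # averages the 2 middle values if needed
    mode = statistics.mode(values)       # most frequently occurring value
    print(f"mean={mean} median={median} mode={mode}")
```

In the symmetric set the mean and the median agree. After adding 100, the mean jumps while the median and the mode stay where they were.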
4. Measures of Variability (Variance, SD, IQR)
➔ Measures of dispersion: how far the data are spread from the center (the mean).
➔ Range: the spread between the largest and the smallest number (Biggest value - Smallest value). A larger range means a more dispersed set. (Not used a lot)
➔ Variance: symbol σ². The average of the squared differences between each data point and the mean. Small variance means a less dispersed data set: all numbers are close to each other.
To calculate the variance: subtract the mean from EACH value and square the answer. Add up the answers. Then divide by the number of values N (population variance) or by n - 1 (sample variance).
- Sample Variance: roughly the average squared deviation (dividing by n - 1 instead of n)
- Sample Variance is sensitive to outliers
- s² (s squared) is for sample variance
- σ² (sigma squared) is for population variance
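The variance calculation can be followed step by step in Python. The four values are made up, and the variable names are only for illustration. The sketch shows both divisors: the number of values for the population variance, and one less than that for the sample variance.

```python
import statistics

data = [4, 8, 6, 2]                  # hypothetical values
n = len(data)
mean = sum(data) / n                 # step 0: the mean

# Step 1: subtract the mean from each value and square the answer.
squared_deviations = [(x - mean) ** 2 for x in data]

# Step 2: add the answers.
total = sum(squared_deviations)

# Step 3: divide.
population_variance = total / n        # sigma squared: divide by N
sample_variance = total / (n - 1)      # s squared: divide by n - 1

print(population_variance, sample_variance)
# The standard library gives the same results:
print(statistics.pvariance(data), statistics.variance(data))
```

The sample variance is always a little larger than the population variance computed from the same numbers, because it divides by a smaller number.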
➔ Standard Deviation: σ is the square root of the variance.
Difference between the SD and the variance:
- SD: measures how far apart the numbers in a data set are. The higher the SD, the more spread out the data are.
- SD is roughly the average deviation (on average, how far a data point is from the mean), in the same units as the data.
- Population standard deviation: σ (sigma alone); the sample standard deviation is s.
- Variance: measures how much the numbers in a data set vary from the mean, but in squared units.
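A short sketch comparing two invented data sets with the same mean but different spread. It treats each set as a whole population, so it uses the population formulas from the standard `statistics` module.

```python
import math
import statistics

close_together = [9, 10, 10, 11]   # hypothetical, tightly clustered
spread_out = [0, 5, 15, 20]        # hypothetical, same mean, more spread

results = {}
for name, values in (("close", close_together), ("spread", spread_out)):
    variance = statistics.pvariance(values)   # population variance
    sd = math.sqrt(variance)                  # SD = square root of the variance
    results[name] = (variance, sd)
    print(name, variance, round(sd, 3))
```

Both sets have a mean of 10, but the second one has a much larger SD, which reflects that its values lie further from the mean.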
➔ IQR: Interquartile range: Range of the middle 50% of the ordered data. Found by taking the median of the first half and the median of the second half.
IQR is NOT sensitive to outliers.
- IQR = Q3 - Q1
- Q2 is the median
- If we have a data set and we are asked to find the IQR:
1. Order the data from small to big
2. Find the median of all data = Q2
3. Find the median of the first half = Q1
4. Find the median of the second half = Q3
5. Subtract: IQR = Q3 - Q1
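The IQR steps can be sketched in Python, taking Q1 as the median of the first half, Q3 as the median of the second half, and IQR = Q3 - Q1. The function name `iqr_by_halves` and the data are hypothetical. One assumption is labeled in the code: with an odd number of values, the overall median is left out of both halves (textbooks and software differ on this convention).

```python
import statistics

def iqr_by_halves(values):
    """Return (Q1, Q2, Q3, IQR) using the median-of-halves method.

    Assumption: with an odd number of values, the overall median
    is excluded from both halves.
    """
    ordered = sorted(values)               # order from small to big
    n = len(ordered)
    half = n // 2
    lower = ordered[:half]                 # first half
    upper = ordered[half + 1:] if n % 2 else ordered[half:]   # second half
    q1 = statistics.median(lower)
    q2 = statistics.median(ordered)        # median of all data
    q3 = statistics.median(upper)
    return q1, q2, q3, q3 - q1

data = [7, 1, 3, 9, 5, 11, 13, 15]         # hypothetical, unordered
q1, q2, q3, iqr = iqr_by_halves(data)
print(q1, q2, q3, iqr)
```

Because only the middle of the ordered data is used, replacing 15 with a very large value would leave the IQR almost unchanged.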