Bivariate => e.g. males and females differ in grades (gender → grade)
Multivariate => e.g. grade dependent on X, Y, Z (X, Y, Z → grade)
Statistics => “the study of how we describe and make inferences from data” (Sirkin)
Inference => a conclusion reached on the basis of evidence and reasoning
Descriptive statistics => describe (sample) data
Inferential statistics => make statements about population based on sample
Population (N) and sample (n)
Units of analysis => what or who being studied (rows in SPSS)
Variable => measured property of each of the units of analysis (columns in SPSS)
Measurement level (NOIR)
Qualitative variables:
Nominal => can't be ranked. E.g. hair colour.
Ordinal => can be rank-ordered (but no equal distances). E.g. Likert scale.
Quantitative variables:
Interval => ranked with equal distances. E.g. IQ.
Ratio => with a meaningful zero. E.g. age.
Continuous variable => measured along a continuum, can have decimals. E.g. height of
students in class.
CM1005 Introduction to Statistical Analysis
Discrete variable => measured in whole units or categories. E.g. number of students in class.
Measures of central tendency => used to (univariately) describe the central value of a variable's distribution; which measure is appropriate depends on the variable's level of measurement.
Mean => $M = \bar{x} = \frac{\sum x}{n}$ (used with interval/ratio) = the average (of the sample)
Changing any score will change the mean
Adding/removing a score will change the mean (unless the score is already equal to
the mean)
Sum of differences from the mean is zero:
$\sum (x - M) = 0$
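A minimal Python sketch of the mean and the properties listed above, using made-up scores (the `mean` helper and the data are hypothetical, for illustration only). `Fraction` keeps the arithmetic exact, so the sum of deviations comes out as exactly zero rather than a tiny rounding error.

```python
from fractions import Fraction

def mean(scores):
    """Sample mean M = (sum of x) / n, kept as an exact fraction."""
    return Fraction(sum(scores), len(scores))

scores = [4, 8, 6, 5, 7]  # hypothetical sample scores
M = mean(scores)
print(M)  # 6

# Deviations from the mean always sum to zero.
deviations = [x - M for x in scores]
print(sum(deviations))  # 0

# Adding a score equal to the mean leaves M unchanged; any other score moves it.
print(mean(scores + [6]), mean(scores + [12]))  # 6 7
```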
Sum of squares (SS) => the sum of squared differences from the mean is minimal (the lowest possible). Using any value other than the mean to calculate SS gives a higher outcome.
$SS = \sum (x - M)^2$
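The minimum property of SS can be checked numerically. This small sketch (hypothetical helper `sum_of_squares`, made-up scores) computes the sum of squared differences around the mean and around a few other values.

```python
def sum_of_squares(scores, centre):
    """Sum of squared differences between each score and `centre`."""
    return sum((x - centre) ** 2 for x in scores)

scores = [4, 8, 6, 5, 7]       # hypothetical sample, M = 6
M = sum(scores) / len(scores)

for centre in [5, 5.5, M, 6.5, 7]:
    print(centre, sum_of_squares(scores, centre))
# SS is smallest at the mean (10.0); every other centre gives a larger value.
```

In general, SS around any value c equals SS around M plus n(c − M)², which is why no value can beat the mean.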
$\sum x$ => sum of all x's
Population mean => $\mu = \frac{\sum x}{N}$
Median => (used with ordinal and interval/ratio) = 50th percentile = “middle case” when
written down in order.
Median in SPSS frequency table => first category that exceeds 50% in the ‘cumulative
percent’ column.
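A minimal sketch of both routes to the median, assuming an ordinal variable whose frequency table is given as (category, count) pairs in rank order, similar to the rows of an SPSS frequency table. The helper name `median_from_frequency_table` and the counts are hypothetical; `statistics.median` is the Python standard-library function.

```python
from statistics import median

def median_from_frequency_table(table):
    """Return the first category whose cumulative percent exceeds 50%.

    `table` holds (category, count) pairs in rank order. If a category
    reaches exactly 50%, the true median lies between it and the next
    category; this sketch simply applies the "first category that
    exceeds 50%" rule.
    """
    total = sum(count for _, count in table)
    if total == 0:
        raise ValueError("table has no cases")
    cumulative = 0
    for category, count in table:
        cumulative += count
        if 100 * cumulative / total > 50:
            return category

likert = [("Strongly disagree", 3), ("Disagree", 5), ("Neutral", 4),
          ("Agree", 6), ("Strongly agree", 2)]  # hypothetical counts, n = 20
print(median_from_frequency_table(likert))  # Neutral (cumulative 15%, 40%, 60%, ...)

# Same data as raw scores coded 1 to 5: the two middle cases are both coded 3.
codes = [1] * 3 + [2] * 5 + [3] * 4 + [4] * 6 + [5] * 2
print(median(codes))
```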
Outliers => values that stick out from the rest (much lower or higher than the other scores).
Mode => (used with nominal, ordinal and interval/ratio) the category with the largest number of cases.
Mode in SPSS frequency table => category with the highest percentage.
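A short sketch of finding the mode of a nominal variable, both directly and through a frequency-table view with percentages. The hair-colour data are made up; `statistics.multimode` (Python 3.8+) returns every tied mode, which matters when two categories share the highest count.

```python
from collections import Counter
from statistics import multimode

hair = ["brown", "blonde", "brown", "black", "brown", "blonde"]  # hypothetical nominal data
print(multimode(hair))  # ['brown']

# Frequency-table view: the mode is the category with the highest percentage.
counts = Counter(hair)
total = sum(counts.values())
for colour, count in counts.most_common():
    print(f"{colour:<8}{count:>3}{100 * count / total:>8.1f}%")
```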
Normal distributions => symmetric. Mean, median and mode are equal.
Week 2
Dispersion/variability => the spread of the scores; two distributions can have the same mean but a different spread. E.g., a first group (10 × 20 and 10 × 60) has the same mean (40) as a second group (10 × 39 and 10 × 41).
Range => (ordinal, interval/ratio) distance between the highest and lowest score. Always
report with the maximum and minimum scores. Sensitive to outliers.
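The two groups from the dispersion example can be checked in a few lines: same mean, very different range. The helper `score_range` is hypothetical and returns the minimum and maximum along with the range, so the range is always reported with its endpoints.

```python
def score_range(scores):
    """Return (minimum, maximum, range) so the range is reported with its endpoints."""
    low, high = min(scores), max(scores)
    return low, high, high - low

group1 = [20] * 10 + [60] * 10  # ten scores of 20 and ten of 60
group2 = [39] * 10 + [41] * 10  # ten scores of 39 and ten of 41

for name, g in [("group 1", group1), ("group 2", group2)]:
    low, high, r = score_range(g)
    print(f"{name}: mean = {sum(g) / len(g)}, min = {low}, max = {high}, range = {r}")

# A single extreme score is enough to change the range a lot (sensitivity to outliers).
print(score_range(group2 + [100]))  # (39, 100, 61)
```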
Interquartile range (IQR) => (ordinal, interval/ratio) based on “quartiles” that split the data into four equal groups of cases: Q1 (lower quartile), Q2 (the median) and Q3 (upper quartile).
$IQR = Q_3 - Q_1$
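A sketch of the IQR using Python's `statistics.quantiles` (Python 3.8+) with its default "exclusive" method. Textbooks and software use several slightly different quartile definitions, so SPSS or a hand calculation may give somewhat different quartiles on small data sets; the helper `iqr` and the scores are hypothetical.

```python
from statistics import quantiles

def iqr(scores):
    """IQR = Q3 - Q1, with quartiles from statistics.quantiles (default method)."""
    q1, _q2, q3 = quantiles(scores, n=4)
    return q3 - q1

scores = list(range(1, 12))                # hypothetical scores 1, 2, ..., 11
with_outlier = list(range(1, 11)) + [100]  # same, but the top score is extreme

print(quantiles(scores, n=4))              # Q1 = 3, Q2 = 6, Q3 = 9
print(iqr(scores), iqr(with_outlier))      # IQR = 6 in both cases
print(max(scores) - min(scores), max(with_outlier) - min(with_outlier))  # range 10 vs 99
```

Unlike the range, the IQR is not affected by the extreme score, because it only looks at the middle 50% of cases.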
Variance => (interval/ratio) based on Sum of Squares. Different for sample and population
data (sample is more common).
$s^2 = \frac{\sum (x - M)^2}{n - 1}$ (sample)
$\sigma^2 = \frac{\sum (x - \mu)^2}{N}$ (population)
Higher variance => scores are more spread out around the mean.
n − 1 => dividing by n − 1 instead of n makes the sample variance an unbiased estimator of the population variance.
Definitional variance => $s^2 = \frac{SS}{n - 1}$, where $SS = \sum (x - M)^2$
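A minimal sketch of the definitional variance for a sample and a population, plus an exact check of why the sample version divides by n − 1. The helper `variance`, the sample scores and the tiny population [2, 4, 6] are hypothetical. Enumerating every sample of size 2 (drawn with replacement) shows that the average n − 1 variance equals σ², while dividing by n underestimates it.

```python
from fractions import Fraction
from itertools import product

def variance(scores, sample=True):
    """Definitional variance: SS / (n - 1) for a sample, SS / N for a population."""
    n = len(scores)
    mean = Fraction(sum(scores), n)
    ss = sum((x - mean) ** 2 for x in scores)  # SS = sum of (x - M)^2
    return ss / (n - 1) if sample else ss / n

scores = [4, 8, 6, 5, 7]  # hypothetical sample, SS = 10
print(variance(scores))                # 5/2
print(variance(scores, sample=False))  # 2

# Why n - 1? Average the variance over every possible sample of size 2
# (with replacement) from a tiny hypothetical population.
population = [2, 4, 6]
sigma2 = variance(population, sample=False)
samples = list(product(population, repeat=2))
avg_unbiased = sum(variance(s) for s in samples) / len(samples)
avg_biased = sum(variance(s, sample=False) for s in samples) / len(samples)
print(sigma2, avg_unbiased, avg_biased)  # 8/3 8/3 4/3
```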
Computational variance => $s^2 = \frac{SS}{n - 1}$, where $SS = \sum x^2 - \frac{(\sum x)^2}{n}$. No need to calculate individual distances from the mean.
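The definitional and computational formulas for SS always agree; this sketch (hypothetical helper names, made-up scores) computes both exactly with `Fraction` and then the sample variance.

```python
from fractions import Fraction

def ss_definitional(scores):
    """SS = sum of (x - M)^2, using the distance of every score from the mean."""
    m = Fraction(sum(scores), len(scores))
    return sum((x - m) ** 2 for x in scores)

def ss_computational(scores):
    """SS = sum of x^2 - (sum of x)^2 / n, with no individual distances needed."""
    n = len(scores)
    return sum(x * x for x in scores) - Fraction(sum(scores) ** 2, n)

scores = [4, 8, 6, 5, 7]  # hypothetical sample: sum x = 30, sum x^2 = 190
print(ss_definitional(scores), ss_computational(scores))  # 10 10
print(ss_computational(scores) / (len(scores) - 1))       # 5/2
```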
Standard deviation (SD) => (interval/ratio) approximate measure of the average distance to
the mean. It is the square root of the variance.
$s = \sqrt{\frac{\sum (x - M)^2}{n - 1}}$ (sample)
$\sigma = \sqrt{\frac{\sum (x - \mu)^2}{N}}$ (population)
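Applying the standard deviation to the two groups from the dispersion example makes the difference in spread concrete. A minimal sketch: the helper `standard_deviation` is hypothetical and is cross-checked against the standard-library `statistics.stdev`.

```python
import math
import statistics

def standard_deviation(scores, sample=True):
    """Square root of the variance: SS / (n - 1) for a sample, SS / N for a population."""
    n = len(scores)
    mean = sum(scores) / n
    ss = sum((x - mean) ** 2 for x in scores)
    return math.sqrt(ss / (n - 1) if sample else ss / n)

group1 = [20] * 10 + [60] * 10  # every score is 20 away from the mean of 40
group2 = [39] * 10 + [41] * 10  # every score is 1 away from the mean of 40

for name, g in [("group 1", group1), ("group 2", group2)]:
    print(name, standard_deviation(g, sample=False), standard_deviation(g))
# Population SDs are 20.0 and 1.0: the typical distance from the mean.

# Cross-check the sample SD against the standard library.
print(math.isclose(standard_deviation(group1), statistics.stdev(group1)))  # True
```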
Independent variable (x) => variable with values that are taken as simply given.
Dependent variable (y) => variable assumed to depend on, or be caused by, another (the independent) variable.
Normally distributed variables. E.g. mean = 12 and SD = 4.
3 preconditions for making causal claims
1. Empirical evidence → for a relationship between the variables.
2. Temporal sequence → x occurs before the change or effect of y occurs.
3. Causality claim should be supported by reason and theory.
Confound variable => an unanticipated variable, not accounted for in a study, that could be causing or associated with observed changes in one or more measured variables. E.g., in a relation between foot size and reading skills, the confound variable is age.
Reverse causality => a problem that arises when causality between two factors could run in either direction (y may cause x instead of x causing y).
Scatterplot => a graphical representation of the relationship between two (interval/ratio) variables. The independent variable is usually placed on the x-axis.