Methods,With Questions & Correct Answers | solutions A+.
ANOVA - CORRECT ANSWERS -extension of the independent t-test to more than 2
groups - are the means different between the groups?; null hypothesis: the populations
all have the same mean (equal variance is a separate assumption of the test)
F statistic - CORRECT ANSWERS -between group variance divided by the within group
variance (used in ANOVA); when the null hypothesis is true F is expected to be close to 1
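A minimal sketch of the ratio described above, using only the Python standard library. The function name `f_statistic` and the three small groups are made up for illustration; they do not come from the course material.

```python
from statistics import mean

def f_statistic(groups):
    """Return the one-way ANOVA F statistic for a list of groups."""
    k = len(groups)                          # number of groups
    n_total = sum(len(g) for g in groups)    # total number of observations
    grand_mean = mean([x for g in groups for x in g])

    # Between-group sum of squares: distance of each group mean from the grand mean
    ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    # Within-group sum of squares: spread of observations around their own group mean
    ss_within = sum((x - mean(g)) ** 2 for g in groups for x in g)

    ms_between = ss_between / (k - 1)        # between-group variance
    ms_within = ss_within / (n_total - k)    # within-group variance
    return ms_between / ms_within

# Hypothetical data: the third group sits clearly above the other two
groups = [[4.0, 5.0, 6.0], [5.0, 6.0, 7.0], [8.0, 9.0, 10.0]]
f_value = f_statistic(groups)
print(round(f_value, 2))
```

An F value well above 1 means the group means differ more than the spread within the groups would suggest; whether it is significant still depends on the F distribution with (k - 1, N - k) degrees of freedom.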
ANOVA assumptions - CORRECT ANSWERS -1. independence of observations
2. normality of each group (qqPlot & Shapiro-Wilk test)
3. homogeneity of variance (Levene's test; if variances are unequal, use the Welch one-way test)
** perform post hoc testing as well (Tukey)
post hoc tests - CORRECT ANSWERS -- ANOVA = Tukey's HSD - tells which group(s) are
different
- Kruskal-Wallis = Dunn's test - pairwise comparisons with p-values adjusted for multiple comparisons
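To make "adjusts p-values for multiple comparisons" concrete, here is a small sketch of the Bonferroni correction. This is an assumption for illustration: the card does not say which adjustment method is used, and Bonferroni is only one common choice. The p-values are invented.

```python
def bonferroni_adjust(p_values):
    """Multiply each p-value by the number of comparisons, capped at 1."""
    m = len(p_values)                      # number of pairwise comparisons
    return [min(1.0, p * m) for p in p_values]

# Hypothetical raw p-values from three pairwise comparisons
raw_p = [0.010, 0.020, 0.400]
adjusted = bonferroni_adjust(raw_p)
print(adjusted)
```

After adjustment, only comparisons whose adjusted p-value stays below the chosen alpha would be reported as different.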
dataframe - CORRECT ANSWERS -data structure that shapes data into a 2
dimensional table of rows and columns
variable - CORRECT ANSWERS -quality, quantity, or property that can be measured
(columns in dataframe)
observation - CORRECT ANSWERS -set of measurements made under similar
conditions (contains several values each associated with a variable)
value - CORRECT ANSWERS -the measurement that is recorded or a particular
observation of a variable
data wrangling - CORRECT ANSWERS -process of tidying data and pre-processing it
to make analysis easier
Numeric variables (2 types) - CORRECT ANSWERS -continuous: numerical variable
that can take any of an infinite number of real values within an interval (decimals)
discrete: numerical variable that can only take a finite number of values within an
interval (e.g. whole-number counts)
Categorical variables (2 types) - CORRECT ANSWERS -Nominal: describes a label or
category without a natural order
Ordinal: describes a label or category where there is a natural order between values
Atomic vector types - CORRECT ANSWERS -Logical: true, false
Character: most flexible type, alphanumeric
Numeric vector types - CORRECT ANSWERS -integer: whole numbers
double: decimal numbers
summary statistics - CORRECT ANSWERS -produce measures of central tendency &
measures of spread/dispersion; mean & stdev OR median and IQR if not normally
distributed
mean vs median (measures of central tendency) - CORRECT ANSWERS -median is
more robust to outliers and/or extreme values (particularly with small sample sizes)
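A quick illustration of that robustness, as a sketch with invented scores: adding one extreme value moves the mean a lot and leaves the median unchanged.

```python
from statistics import mean, median

scores = [10, 12, 11, 13, 12]        # hypothetical small sample
with_outlier = scores + [100]        # same sample plus one extreme value

mean_before, median_before = mean(scores), median(scores)
mean_after, median_after = mean(with_outlier), median(with_outlier)

print(mean_before, median_before)    # mean and median are close together
print(round(mean_after, 2), median_after)  # mean is pulled up, median stays put
```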
measures of dispersion - CORRECT ANSWERS -range, variance, standard deviation,
interquartile range, coefficient of variation (measure the spread of the data)
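The measures listed above can be computed with the Python standard library, as in this sketch. The data are invented, and `statistics.quantiles` uses its default quartile method here; other software may use a different quartile definition and give a slightly different IQR.

```python
from statistics import mean, stdev, variance, quantiles

data = [2, 4, 4, 4, 5, 5, 7, 9]          # hypothetical measurements

data_range = max(data) - min(data)       # range: largest minus smallest value
var = variance(data)                     # sample variance (divides by n - 1)
sd = stdev(data)                         # sample standard deviation
q1, q2, q3 = quantiles(data, n=4)        # quartile cut points
iqr = q3 - q1                            # interquartile range
cv = sd / mean(data)                     # coefficient of variation (sd relative to mean)

print(data_range, round(var, 3), round(sd, 3), iqr, round(cv, 3))
```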
data science pipeline - CORRECT ANSWERS -import - tidy - transform - visualize -
model - communicate
Why use a bar plot - CORRECT ANSWERS -for a single discrete variable (category on the
x axis, count on the y axis)
why use a histogram? - CORRECT ANSWERS -for a single continuous variable (binned
measurement variable on the x axis, count on the y axis)
why use scatterplots - CORRECT ANSWERS -two continuous variables
why box plots - CORRECT ANSWERS -one continuous (y axis) and one discrete
variable (x axis)
why violin plot? - CORRECT ANSWERS -like box plots, but they also show the density
distribution of the data
why raincloud plot - CORRECT ANSWERS -like violin plots, but they show the individual
data points as well
the normal curve - CORRECT ANSWERS -symmetrical distribution about the mean;
mean, median, and mode are the same; for the standard normal curve mean = 0 and
sd = 1; most scores near the middle
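Normal curve (percentage distributions) - CORRECT ANSWERS -mean to 1 sd = 34% on each
side (68% of data within 1 sd); 1 to 2 sd = 13.5% on each side (95% within 2 sd); 2 to 3 sd =
2.35% on each side (99.7% within 3 sd); beyond 3 sd = 0.15% on each side (approaching 100%)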
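The rounded percentages above can be checked against the standard normal distribution. This sketch uses `statistics.NormalDist` from the Python standard library; the helper name `share_within` is made up for illustration.

```python
from statistics import NormalDist

std_normal = NormalDist(mu=0, sigma=1)   # standard normal: mean 0, sd 1

def share_within(k):
    """Fraction of the distribution within k standard deviations of the mean."""
    return std_normal.cdf(k) - std_normal.cdf(-k)

for k in (1, 2, 3):
    print(k, round(100 * share_within(k), 2))
```

The exact values are about 68.27%, 95.45%, and 99.73%, which is where the rounded 68 / 95 / 99.7 rule comes from.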
z-score - CORRECT ANSWERS -quantifies standardized deviation from the mean for a
single individual; z = (x-mu)/sd; can be used to calculate the percentage of distribution
below and above a z-score or between 2 z-scores
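A worked sketch of the formula z = (x - mu)/sd and of the percentages it gives. The score of 85, mean of 70, and sd of 10 are hypothetical numbers chosen for illustration, and the percentages assume the scores are normally distributed.

```python
from statistics import NormalDist

def z_score(x, mu, sd):
    """Standardized distance of x from the mean, in units of sd."""
    return (x - mu) / sd

std_normal = NormalDist()                 # standard normal: mean 0, sd 1

z = z_score(85, mu=70, sd=10)             # hypothetical score of 85
pct_below = std_normal.cdf(z)             # share of the distribution below z
pct_above = 1 - pct_below                 # share of the distribution above z

# Share between two z-scores, here z = -0.5 and z = 1.5
pct_between = std_normal.cdf(1.5) - std_normal.cdf(-0.5)

print(z, round(pct_below, 4), round(pct_above, 4), round(pct_between, 4))
```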
population - CORRECT ANSWERS -entire group of people/subjects that a researcher
intends their results to apply to