Recap RBMS
We only retain H0; we do not accept that it is true. We can only speak of a non-significant result.
Research question: a good research question has PICOS (population, intervention, comparison,
outcome variables, study design).
Hypothesis: H0 = no effect, H1 = an effect. This is a two-sided hypothesis. When only one direction
is possible, use one-sided hypotheses: e.g. H0 = smaller than or equal to a value, H1 = larger than that value.
Research design: dependent & independent variables. The independent variable determines whether the
design is between subjects or within subjects. Designs:
Observational: cross-sectional, case-control, prospective.
Experimental: randomized controlled design, crossover design. > allow causal conclusions
Descriptive statistics: to summarize the observed data in the sample. Can be measures of central
tendency (mean, median, mode) and measures of variability. All with graphs and figures.
Inferential statistics: to draw conclusions about a population based on the data observed in a sample.
Resulting in a p value; p < 0.05 = unlikely enough under H0 to reject H0.
Test statistic = (point estimate − expected value) / SE
One-sample t-test: t = (x̄ − μ0) / SE, with SE = sd / √n.
Null hypothesis: μ = μ0 (a given value).
Two-sample independent t-test: t = ((x̄1 − x̄2) − (μ1 − μ2)) / SE, with SE = √(sd1²/n1 + sd2²/n2).
Null hypothesis: the means of the two groups are equal.
Two-sample dependent (paired) t-test: t = (x̄d − μd) / SE, with SE = sdd / √n (sd of the difference scores).
Null hypothesis: the means of the two conditions in one sample are equal.
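A minimal sketch of the three tests with SciPy, using made-up data (all numbers are illustrative):
```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
x = rng.normal(5.3, 1.0, 30)                   # one sample
g1 = rng.normal(5.0, 1.0, 30)                  # group 1
g2 = rng.normal(5.8, 1.0, 30)                  # group 2
pre, post = g1, g1 + rng.normal(0.4, 0.5, 30)  # paired measurements

t1, p1 = stats.ttest_1samp(x, popmean=5.0)     # H0: mu = 5.0
t2, p2 = stats.ttest_ind(g1, g2)               # H0: mu1 = mu2
t3, p3 = stats.ttest_rel(pre, post)            # H0: mean difference = 0
print(t1, p1, t2, p2, t3, p3)
```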
If the sample distribution resembles a normally distributed population distribution, the data can be
treated as normally distributed. Then do parametric statistics; if not, nonparametric statistics.
In a normal distribution, 95% of the values fall between −1.96 SD and +1.96 SD around the mean.
Compare two different normal distributions by calculating z-scores to standardize the distributions.
z = (x − mean) / sd. The z-score says how many SDs a value deviates from the mean.
The z-score table gives the probability of everything left of the value. So if you want the right
part: 1 − probability.
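A minimal sketch of the same lookup with scipy.stats.norm instead of a z-table (numbers made up):
```python
from scipy.stats import norm

x, mean, sd = 1.83, 1.75, 0.05
z = (x - mean) / sd       # how many SDs x lies from the mean
p_left = norm.cdf(z)      # probability of everything left of z
p_right = 1 - p_left      # right tail: 1 - left probability
print(z, p_left, p_right)
```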
A t-test has a t-distribution. Calculate the critical t value marking the central 95%. If the observed t
lies within that interval > retain H0. If the observed t is more extreme > reject H0.
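A minimal sketch of that decision rule (df and observed t are made up):
```python
from scipy.stats import t

df = 29
t_crit = t.ppf(0.975, df)     # marks the central 95% of the t-distribution
t_obs = 2.5
reject = abs(t_obs) > t_crit  # more extreme than the interval -> reject H0
print(t_crit, reject)
```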
A p value does not indicate the size of the effect; it only indicates a strong signal and/or little noise.
Errors:
Type 1 error = rejecting H0 when you should have retained it.
Type 2 error = retaining H0 when you should have rejected it.
Confidence intervals:
We are 95% confident that the population parameter (μ) lies within the CI: CI95% = x̄ ± t95% × SE,
with SE = sd / √n. If the mean under H0 does not lie within the CI around the sample mean, reject H0.
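A minimal sketch of this CI with made-up data:
```python
import numpy as np
from scipy.stats import t

x = np.array([4.8, 5.1, 5.6, 4.9, 5.3, 5.0])
se = x.std(ddof=1) / np.sqrt(len(x))       # SE = sd / sqrt(n)
t95 = t.ppf(0.975, df=len(x) - 1)
ci = (x.mean() - t95 * se, x.mean() + t95 * se)
print(ci)  # if the mean under H0 falls outside this interval, reject H0
```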
Correlation: r gives the association between two variables. The larger |r|, the stronger the relationship.
Parametric test: Pearson. Non-parametric test: Spearman's rho. Important: covariation is not
causality. The correlation must be linear. Correlation is sensitive to outliers. It gives no prediction of the effect.
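A minimal sketch of both tests with SciPy (made-up data):
```python
import numpy as np
from scipy.stats import pearsonr, spearmanr

rng = np.random.default_rng(1)
x = rng.normal(size=50)
y = 0.6 * x + rng.normal(scale=0.8, size=50)  # made-up linear association

r, p = pearsonr(x, y)          # parametric
rho, p_rho = spearmanr(x, y)   # non-parametric (rank-based)
print(r, rho)                  # note: neither implies causality
```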
Regression I
Regression uses predictors to fit a linear association. The type of analysis depends on the type of
variables: if the independent and dependent variable are both continuous, linear regression can be done.
Standard error = how much the mean of each sample deviates from the population mean. If the sample is
large enough, the sampling distribution can be treated as normal.
Basics of linear regression: define the line of best fit, through the middle of your data, by minimizing
the total squared error.
y = a + b·x + e, with a = intercept (predicted y when x = 0),
b = slope (effect of x on y),
e = residual, the difference between the true and predicted values of the DV.
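A minimal sketch with made-up data; in simple regression the least-squares solution has the closed
form b = cov(x, y) / var(x):
```python
import numpy as np

rng = np.random.default_rng(2)
x = rng.uniform(0, 10, 40)
y = 2.0 + 0.5 * x + rng.normal(scale=1.0, size=40)   # made-up data

b = np.cov(x, y, ddof=1)[0, 1] / np.var(x, ddof=1)   # slope
a = y.mean() - b * x.mean()                          # intercept (y at x = 0)
e = y - (a + b * x)                                  # residuals
print(a, b, (e ** 2).sum())                          # minimized total squared error
```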
By inserting x, we can predict the DV if the IV is known. And the slope describes the relationship
between the two variables. Linear transformations of the variables (e.g. from grams to kilograms) do
not change the result of the statistical test. This only holds for linear transformations:
non-linear transformations, such as a log transform, can change the statistical test result.
If the regression is standardized, the effect of x on y equals the effect of y on x (the standardized
slope equals r); in unstandardized regression this is usually not the case.
Correlation vs regression: regression's unstandardized effect size gives both the association and its direction (in the original units), and allows prediction.
Regression interpretations:
Variance (R²):
r gives strength and direction, from −1 to 1. R² shows the improvement of the regression line over a
flat line (the means model). R² quantifies the proportion of variance in y explained by the model.
R² = SSm / SSt, with SSm = SSt − SSr. R² lies between 0 and 1. (SSt = total sum of squares,
SSr = residual sum of squares, SSm = model sum of squares.)
The adjusted R² also corrects for inflation, so it generalizes to the population instead of just the sample.
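A minimal sketch of the sum-of-squares decomposition behind R² (made-up data, np.polyfit for the fit):
```python
import numpy as np

rng = np.random.default_rng(3)
x = rng.uniform(0, 10, 40)
y = 2.0 + 0.5 * x + rng.normal(size=40)

b, a = np.polyfit(x, y, deg=1)          # slope, intercept
y_hat = a + b * x
ss_t = ((y - y.mean()) ** 2).sum()      # SSt: total sum of squares
ss_r = ((y - y_hat) ** 2).sum()         # SSr: residual sum of squares
ss_m = ss_t - ss_r                      # SSm: model sum of squares
print(ss_m / ss_t)                      # R^2, between 0 and 1
```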
Hypothesis testing:
- Regression coefficient (t-test): is the IV a significant predictor of the DV?
Null hypothesis: b (slope) = 0. H1: b ≠ 0.
b gives a measure of effect size, not a measure of test significance. t = (b observed − b under H0) /
SE(b observed). The coefficient table in SPSS gives the t value & p value. df = N − p − 1. In simple
regression df = n − 1 − 1 = n − 2.
Can also calculate the CI95: estimate ± 1.96 × SE.
- Full model (F-test): is this model a good fit for my data?
Evaluates the model as a whole. The simplest model is preferred. The F-test gives the ratio of the
explained model variance to the unexplained residual variance. The larger F, the greater the
improvement over the means model. It resembles R², but F provides a ratio.
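A minimal sketch using SciPy's linregress (made-up data); it returns the slope, its SE, and the p value
of the coefficient t-test in one call:
```python
import numpy as np
from scipy.stats import linregress

rng = np.random.default_rng(4)
x = rng.uniform(0, 10, 40)
y = 2.0 + 0.5 * x + rng.normal(size=40)

res = linregress(x, y)
t_obs = res.slope / res.stderr   # t = (b observed - 0) / SE(b observed)
print(res.slope, res.stderr, res.pvalue, t_obs)
# In simple regression, F = t^2 for the slope, so the full-model F-test
# and the coefficient t-test agree.
```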
Regression assumptions:
What do we need to draw conclusions about the population, and not just about the data? Meet the
assumptions (quick visual checks are sketched after this list):
1. Linearity: check in a scatter plot that the IV/DV relationship is linear. If not met > use non-linear regression.
2. Normal residuals: check a histogram of the residuals; the mean ≈ 0. If not met > use non-parametric methods.
3. Independent observations: no repeated data, observations unrelated. Violations can show up in the
scatter plot as clusters or patterns. If not met > use multilevel modeling.
4. Homoscedasticity: the variation in the DV should be constant across the range. Plot the residuals
and check whether they are evenly spread. If not met > heteroscedasticity; can be corrected with WLS
regression or by adjusting the standard errors.
5. Outliers: one outlier can bias the estimated slope and SE. Check and potentially remove the
outlier. So always make a scatter plot to check the data.
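A minimal sketch of those visual checks with matplotlib (made-up data):
```python
import numpy as np
import matplotlib.pyplot as plt

rng = np.random.default_rng(5)
x = rng.uniform(0, 10, 60)
y = 2.0 + 0.5 * x + rng.normal(size=60)
b, a = np.polyfit(x, y, deg=1)
resid = y - (a + b * x)

fig, axes = plt.subplots(1, 3, figsize=(12, 3))
axes[0].scatter(x, y)               # linearity and outliers
axes[1].hist(resid)                 # residuals roughly normal, mean ~ 0
axes[2].scatter(a + b * x, resid)   # homoscedasticity: even spread
plt.show()
```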
Regression vs t-test:
In the means model, b1 = 0; in the regression model, b1 is estimated freely. With a binary group
predictor, the slope in the regression model is the same as the difference in means between the groups.
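A minimal sketch (made-up data) showing this equivalence with a dummy-coded group variable:
```python
import numpy as np
from scipy.stats import linregress, ttest_ind

rng = np.random.default_rng(6)
g1 = rng.normal(5.0, 1.0, 30)
g2 = rng.normal(5.8, 1.0, 30)

x = np.r_[np.zeros(30), np.ones(30)]  # dummy coding: 0 = group 1, 1 = group 2
y = np.r_[g1, g2]

res = linregress(x, y)
t, p = ttest_ind(g2, g1)
print(res.slope, g2.mean() - g1.mean())  # slope = difference in group means
print(res.slope / res.stderr, t)         # same t statistic as the t-test
```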