An overview of all exam materials you have to know before subtest 2. Made for several premaster programs at the UTwente. Based on all assignments and lectures by Henk van der Kolk. An addition to the R-Guide that is provided in the course. (Overview of key points and important information, made as ...
Inferential Statistics test 2 - Unit 550, 553, 554, 560, 561, 563,
510, 545, 548 & 590
Cicely Bullee
Libraries used in R: tidyverse, janitor, ggplot, ggplotExtra, broom, modelr, Rbase,
car, lmtest.
Unit 550
Key terms:
multiple regression
addition
(analysis of) residuals
Build up and assess a multiple regression model with the additive effect of
variables using R.
Micro lectures
Addition: both independent variables (x1, x2) independently affect your
dependent variable.
- Linear equation: Y = b0 + b2*x2 + b1*x1 (+ε)
- In the example below, x2 is type (dummy) and x1 is education (ratio).
- b2 is listed directly after b0 because it changes the intercept; in an
additive model the slope (b1) is the same for both groups.
- b2 is the difference between the intercepts of both lines.
- Residuals should be ‘normal’ (normally distributed) and ‘equal’ (the same
variance everywhere); the error term ε is assumed to be normally distributed.
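The additive model above can be sketched in R as follows. This is a minimal illustration on simulated data, not the course's own example: the data frame `dat`, the variable names `income`, `education` and `type`, and the true values used in the simulation are all assumptions made up for this sketch.

```r
# Hypothetical simulated data: 'education' (ratio) and 'type' (0/1 dummy).
set.seed(1)
n <- 200
education <- runif(n, 6, 16)
type <- rbinom(n, 1, 0.5)
income <- 10 + 5 * type + 2 * education + rnorm(n, sd = 1)
dat <- data.frame(income, education, type)

# Additive model: Y = b0 + b2*type + b1*education
model <- lm(income ~ as.factor(type) + education, data = dat)
b <- coef(model)

# Intercept of each line: b0 for type 0 (reference), b0 + b2 for type 1.
intercept_ref <- unname(b["(Intercept)"])
intercept_other <- unname(b["(Intercept)"] + b["as.factor(type)1"])

# In an additive model both lines share the same slope b1.
slope <- unname(b["education"])
```

The difference `intercept_other - intercept_ref` equals b2, the coefficient of the dummy.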
In multiple regression there are two types of expectations:
1. General expectation (R² and F-test)
2. Specific expectation (b-coefficients and t-test)
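Both kinds of expectation can be read from the summary of a fitted model. The sketch below uses simulated data with made-up variable names (`x1`, `x2`, `y`); it only shows where R² and the F-test (general) and the b-coefficients with their t-tests (specific) are stored.

```r
# Hypothetical simulated data with two independent variables.
set.seed(2)
n <- 150
x1 <- rnorm(n)
x2 <- rnorm(n)
y <- 1 + 0.8 * x1 - 0.5 * x2 + rnorm(n)

model <- lm(y ~ x1 + x2)
s <- summary(model)

# General expectation: R-squared and the overall F-test.
r_squared <- s$r.squared
f <- s$fstatistic  # named vector: value, numdf, dendf
f_p_value <- unname(pf(f["value"], f["numdf"], f["dendf"], lower.tail = FALSE))

# Specific expectation: b-coefficients with their t-tests.
# Columns: Estimate, Std. Error, t value, Pr(>|t|)
coef_table <- s$coefficients
```

Printing `s` shows the same numbers in one overview; extracting them as above is only useful when you want to reuse them.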
A relationship between a dependent scale variable and two independent ratio
variables looks like this:
The blue line is the reference category.
In the linear equation, x2 is in this case education and x1 is age.
Residuals (ei): in the sample.
Errors (εi): in the population.
We are interested in the errors in the population; small deviations in the
residuals of a sample are not necessarily problematic.
Residuals give an indication about how good the estimates (b-coefficients) are.
What residuals should look like:
1. The distribution of the residuals should be normal (all other factors
combined create mere noise). Check this with a histogram of the residuals.
2. The residuals should have the same variance everywhere in the model
(otherwise, we probably misspecified the model). Residuals should fall in a
‘box’. Check this with a scatterplot of the residuals (y) against the
predictions (x).
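The two residual checks can be sketched with base R graphics. The data are simulated and the names (`x1`, `x2`, `y`, `res`, `pred`) are assumptions for this illustration; the course may use ggplot-based plots instead.

```r
# Hypothetical simulated data that satisfies the assumptions.
set.seed(3)
n <- 200
x1 <- runif(n, 0, 10)
x2 <- runif(n, 0, 10)
y <- 2 + 1.5 * x1 + 0.5 * x2 + rnorm(n)

model <- lm(y ~ x1 + x2)
res <- residuals(model)  # residuals e_i of the sample
pred <- fitted(model)    # predicted values

# Check 1: histogram of the residuals (should look roughly normal).
hist(res, main = "Residuals", xlab = "Residual")

# Check 2: residuals (y) against predictions (x); the points should fall
# in a 'box' around zero, with the same spread everywhere.
plot(pred, res, xlab = "Predicted value", ylab = "Residual")
abline(h = 0, lty = 2)
```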
Causes of problematic residuals (these have a strong effect on the estimates):
1. Non-linearity
2. Other factors play a role too.
Solutions for problematic residuals:
1. Change/‘reconceptualize’ your variables
2. Change the model/include extra variables (parabolic or logarithmic)
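As an illustration of the second solution, the sketch below adds a parabolic (squared) term to a model. The data are simulated with a curved relationship and the names (`x`, `y`, `linear_model`, `parabolic_model`) are made up for this example.

```r
# Hypothetical simulated data with a curved (parabolic) relationship.
set.seed(4)
n <- 200
x <- runif(n, 0, 10)
y <- 3 + 2 * x - 0.3 * x^2 + rnorm(n)

# Linear model: misses the curve, so the residuals show a pattern.
linear_model <- lm(y ~ x)

# Parabolic model: I() makes R treat x^2 as an arithmetic square.
parabolic_model <- lm(y ~ x + I(x^2))

r2_linear <- summary(linear_model)$r.squared
r2_parabolic <- summary(parabolic_model)$r.squared

# A logarithmic alternative (only for x > 0) would be: lm(y ~ log(x))
```

Comparing the residual plots of both models shows whether the extra term removed the pattern.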
RStudio
When adding a dummy variable in lm(), wrap it in as.factor() (so R knows
it’s a dummy and not simply the numbers 1 or 0).
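A small sketch of what as.factor() does inside lm(), using simulated data with made-up names (`group`, `x`, `y`). With a variable coded only 0 and 1 the estimates turn out the same either way; the factor version labels the coefficient per category, and the distinction starts to matter once a variable has more than two codes.

```r
# Hypothetical simulated data with a 0/1 dummy called 'group'.
set.seed(5)
n <- 120
group <- sample(c(0, 1), n, replace = TRUE)
x <- rnorm(n)
y <- 1 + 2 * group + 0.5 * x + rnorm(n)
dat <- data.frame(y, x, group)

# Dummy treated as a category: the coefficient is labelled with the
# category (1) that is compared to the reference category (0).
model_factor <- lm(y ~ as.factor(group) + x, data = dat)
coef_names <- names(coef(model_factor))

# Dummy treated as plain numbers, for comparison.
model_numeric <- lm(y ~ group + x, data = dat)
```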
In summary(model): H0 is that these data come from a population in which there
is NO linear association between Y and the independent variables.
Check your data! Filter out any NA values and 999 codes.
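A minimal sketch of such a filter in base R. The data frame `dat` and its columns are invented for this example, and it assumes that 999 is used as a missing-value code in both columns; check the codebook first, because 999 can also be a real value.

```r
# Hypothetical data with NA values and 999 as a missing-value code.
dat <- data.frame(
  age = c(23, 999, 41, NA, 35),
  income = c(2100, 2500, NA, 1800, 999)
)

# Keep only the rows without NA and without the 999 code.
clean <- subset(
  dat,
  !is.na(age) & !is.na(income) & age != 999 & income != 999
)

# With the tidyverse loaded, the equivalent would be:
# dplyr::filter(dat, !is.na(age), !is.na(income), age != 999, income != 999)
```

Only the first row (age 23, income 2100) remains in `clean`.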