Simple linear regression = there is only ONE independent variable in the model.
The strength of the relationship is expressed by the correlation coefficient (r). It ranges
from -1 to 1, where 1 = a perfect positive relationship and -1 = a perfect negative relationship.
0 means there is no linear relationship between the variables; there can still be a non-linear
relationship.
It is a standardized measure, which makes it possible to compare the strengths of different
relationships with each other.
A correlation never says anything about causal effects: it does not show that A causes B,
only that the two are correlated. To establish causality you need an experiment.
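The point that a correlation of 0 only rules out a linear relationship can be illustrated with a small sketch. The data and the helper name `pearson_r` are made up for illustration; they do not come from the course material.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equally long lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    ss_x = sum((x - mean_x) ** 2 for x in xs)
    ss_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(ss_x * ss_y)

# Hypothetical data: X is symmetric around 0.
x = [-3, -2, -1, 0, 1, 2, 3]
y_linear = [2 * v + 1 for v in x]   # perfectly linear in X
y_curved = [v ** 2 for v in x]      # perfectly U-shaped in X

r_linear = pearson_r(x, y_linear)
r_curved = pearson_r(x, y_curved)
print(round(r_linear, 3), round(r_curved, 3))
```

Here `r_linear` is 1 and `r_curved` is 0, even though Y is completely determined by X in both cases.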
To check for a linear relationship, the variables should be measured at interval/ratio level.
You use linear regression analysis to make predictions about linear relationships, NOT about
non-linear relationships.
To make an equation you first must calculate the slope (a): if you increase X by 1, by how
much does the predicted Y change? After that you determine the intercept (b in the equation),
which is the predicted value of Y when X = 0:
Y^ = aX + b
- The intercept can tell you some information, but not always (for example when X = 0 is not
a meaningful value).
The least squares method: a method which helps you decide where to draw the line.
Residual / error = the distance between the true value Y and the predicted value Y^: Y - Y^.
Based on the residuals you look for the line with the smallest possible sum of squared
errors. This is the least squares method.
- The formula for the slope: a = r (correlation coefficient) * standard deviation of Y /
standard deviation of X.
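The slope formula above can be sketched as follows. The data are hypothetical, and the intercept formula b = mean(Y) - a * mean(X) is the standard least squares result, added here as an assumption because the notes do not state it.

```python
import math

def fit_line(xs, ys):
    """Return (a, b) for the line Y^ = a*X + b using a = r * sd_y / sd_x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    ss_x = sum((x - mean_x) ** 2 for x in xs)
    ss_y = sum((y - mean_y) ** 2 for y in ys)
    s_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    r = s_xy / math.sqrt(ss_x * ss_y)    # correlation coefficient
    sd_x = math.sqrt(ss_x / (n - 1))     # sample standard deviation of X
    sd_y = math.sqrt(ss_y / (n - 1))     # sample standard deviation of Y
    a = r * sd_y / sd_x                  # slope
    b = mean_y - a * mean_x              # intercept (assumed standard formula)
    return a, b

def sum_squared_errors(xs, ys, a, b):
    """Sum of squared residuals (Y - Y^)^2 for the line Y^ = a*X + b."""
    return sum((y - (a * x + b)) ** 2 for x, y in zip(xs, ys))

# Hypothetical data
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
a, b = fit_line(x, y)
sse = sum_squared_errors(x, y, a, b)
print(round(a, 3), round(b, 3), round(sse, 3))
```

Any other line through these points, for example one with a slightly different slope, gives a larger sum of squared errors than the fitted one.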
How well does the model fit? How well the prediction fits the data = goodness of fit.
- One example is the R2 (R-squared) = the proportion of the variance of the
response explained by the model / predictor variables.
o A very large R-squared doesn’t mean the model is a good predictor.
o A very small R-squared doesn’t mean that the relationship between the variables is
meaningless.
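As a rough sketch of what "proportion of explained variance" means, R-squared can be computed as 1 minus the residual sum of squares divided by the total sum of squares. The data and the line Y^ = 0.6X + 2.2 are hypothetical; the function name `r_squared` is only illustrative.

```python
def r_squared(ys, predicted):
    """R-squared = 1 - (sum of squared residuals / total sum of squares)."""
    mean_y = sum(ys) / len(ys)
    ss_residual = sum((y - p) ** 2 for y, p in zip(ys, predicted))
    ss_total = sum((y - mean_y) ** 2 for y in ys)
    return 1 - ss_residual / ss_total

# Hypothetical data with its least squares line Y^ = 0.6 * X + 2.2
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
y_hat = [0.6 * v + 2.2 for v in x]

r2 = r_squared(y, y_hat)
print(round(r2, 3))
```

For a simple linear regression this value equals the squared correlation coefficient between X and Y.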
Multiple Linear Regression (Week 1)
Assumptions (Initial):
1. The dependent variable is a continuous measure (interval/ratio). For example,
satisfaction counts as a continuous measure when it is measured with a composite scale,
because composite scale scores can be used as if they were continuous.
a. The independent variables must be continuous or dichotomous (nominal
with two categories).
2. There must be a linear relationship between the dependent variable and the
independent variables.
3. There must be an absence of outliers.
To check these assumptions, it is important to visualize your data in SPSS and see whether the
relationship is linear and whether there are outliers. The Anscombe Quartet shows
that violations of these assumptions can influence the statistical results.
To check the correlations in SPSS go to Analyze > Correlate > Bivariate and choose the
variables you want to analyse.
Assumptions (statistical):
Absence of outliers: you can determine this through a scatter plot or box plot. BUT you can
also look in the residual statistics at the minimum and maximum values of the standardised
residuals (Y-space), the Mahalanobis distance (X-space) and Cook’s distance (XY-space: an
extreme combination of X and Y scores).
Rule of thumb:
o For the standardised residuals: between -3.3 and +3.3.
o For the Mahalanobis distance: must be lower than 10 + 2 * number of independent
variables. When the number is higher, there is an outlier.
o For the Cook’s distance: must be lower than 1. A higher value indicates an influential
respondent.
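The three rules of thumb above can be collected in one small check. This is only a sketch: the function name, argument names and example values are hypothetical, and the inputs are assumed to be read from the SPSS residual statistics table.

```python
def outlier_flags(std_resid_min, std_resid_max, max_mahalanobis, max_cooks, n_predictors):
    """Apply the rules of thumb; True means the rule signals a problem."""
    mahalanobis_cutoff = 10 + 2 * n_predictors
    return {
        # standardised residuals should stay between -3.3 and +3.3
        "standardised_residuals": std_resid_min < -3.3 or std_resid_max > 3.3,
        # Mahalanobis distance must be lower than 10 + 2 * number of predictors
        "mahalanobis": max_mahalanobis >= mahalanobis_cutoff,
        # Cook's distance must be lower than 1
        "cooks": max_cooks >= 1,
    }

# Hypothetical residual statistics for a model with 3 independent variables
flags = outlier_flags(
    std_resid_min=-2.1,
    std_resid_max=3.8,
    max_mahalanobis=14.2,
    max_cooks=0.4,
    n_predictors=3,
)
print(flags)
```

In this made-up example only the standardised residuals exceed their limit; the Mahalanobis cut-off is 10 + 2 * 3 = 16, so 14.2 is still acceptable.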
Important when deciding whether to remove an outlier:
- Does the participant belong to the group about which you want to make inferences?
If not, don’t include them.
- Is the extreme value of the participant theoretically possible? If not, don’t include it. If
so, run the analysis twice, with and without the participant.
Absence of multicollinearity: you can find this information in the last columns of the
coefficients table (tolerance and VIF). A correlation above .8 between independent variables
means the relationship is too strong. What happens if you include such variables anyway?
o The regression coefficient (B) is unreliable.
o It limits the magnitude of R (the correlation between Y and Y^).
o The importance of the individual independent variables can hardly be
determined.
Rule of thumb:
o Values for the tolerance smaller than .2 indicate a potential problem.
o Values for the tolerance smaller than .1 indicate a problem.
o A variance inflation factor (VIF) larger than 10 indicates a problem.
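To make the tolerance and VIF rules of thumb concrete, here is a minimal sketch for the special case of exactly two independent variables. It relies on the standard definitions tolerance = 1 - R2 (of one predictor regressed on the other predictors) and VIF = 1 / tolerance, which are assumptions in the sense that the notes do not spell them out. With two predictors that R2 is simply their squared correlation. The function name and example correlations are hypothetical.

```python
def collinearity_two_predictors(r):
    """Tolerance, VIF and verdict for two predictors with correlation r (|r| < 1)."""
    tolerance = 1 - r ** 2   # share of variance NOT shared with the other predictor
    vif = 1 / tolerance      # variance inflation factor
    if tolerance < 0.1:      # same boundary as VIF > 10
        verdict = "problem"
    elif tolerance < 0.2:
        verdict = "potential problem"
    else:
        verdict = "ok"
    return tolerance, vif, verdict

for r in (0.50, 0.92, 0.96):
    tolerance, vif, verdict = collinearity_two_predictors(r)
    print(r, round(tolerance, 3), round(vif, 2), verdict)
```

Because VIF is the reciprocal of the tolerance, a tolerance below .1 and a VIF above 10 are the same boundary.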
Homoscedasticity: the spread of the residuals must be approximately the same for every
X value. Check this by plotting the standardised residuals against the standardised
predicted values.
Normally distributed residuals: check this by making a histogram of the residuals.
Performing and interpreting Multiple Linear Regression:
R2 = the proportion of explained variance in the sample / model. How much of the variance
is explained by the model?
Adjusted R2 = an estimate of the proportion of explained variance in the population.
- It adjusts the R2 on the basis of the sample size (N) and the number of predictors (K).
F-test = tells you whether the model as a whole is significant: can the
independent variables together explain a significant part of the variance?
If the p-value is below 0.05, the model is significant.
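The notes do not give the formulas behind the adjusted R2 and the F-test. The sketch below uses the standard textbook versions, which is an assumption about what SPSS reports: adjusted R2 = 1 - (1 - R2)(N - 1)/(N - K - 1) and F = (R2 / K) / ((1 - R2) / (N - K - 1)). The function names and example numbers are hypothetical.

```python
def adjusted_r_squared(r2, n, k):
    """Adjusted R2 for sample size n and k predictors."""
    return 1 - (1 - r2) * (n - 1) / (n - k - 1)

def f_statistic(r2, n, k):
    """F value of the whole model, with k and n - k - 1 degrees of freedom."""
    return (r2 / k) / ((1 - r2) / (n - k - 1))

# Hypothetical model: R2 = .50, N = 103 respondents, K = 2 predictors
r2, n, k = 0.50, 103, 2
adj_r2 = adjusted_r_squared(r2, n, k)
f_value = f_statistic(r2, n, k)
print(round(adj_r2, 3), round(f_value, 2))
```

The p-value that belongs to this F value comes from the F distribution; SPSS reports it in the ANOVA table, so it is not computed here.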
To see whether each independent variable is a significant predictor, look at the
coefficients table: the p-value of each predictor shows whether it is significant.
The standardised Beta value tells you which predictor has the strongest relative effect
(compare the absolute values).
You can perform a hierarchical multiple regression analysis to check whether adding more
independent variables provides a significantly better prediction than a model with fewer
independent variables. Based on the change statistics in the model summary you can
interpret your results.
- The R2 change of model 2 tells you how much more variance model 2 explains compared
to model 1.
In the ANOVA table you only look at whether each model on its own is significant.
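The test behind the R2 change can be sketched with the standard F-change formula, which is an assumption here because the notes only refer to the SPSS output: F change = ((R2 of model 2 - R2 of model 1) / number of added predictors) / ((1 - R2 of model 2) / (N - K2 - 1)). All names and numbers below are hypothetical.

```python
def f_change(r2_model1, r2_model2, k_model1, k_model2, n):
    """F value for the increase in R2 when going from model 1 to model 2."""
    added_predictors = k_model2 - k_model1
    r2_change = r2_model2 - r2_model1
    return (r2_change / added_predictors) / ((1 - r2_model2) / (n - k_model2 - 1))

# Hypothetical: model 1 has 2 predictors (R2 = .30), model 2 has 4 (R2 = .40), N = 105
f_value = f_change(r2_model1=0.30, r2_model2=0.40, k_model1=2, k_model2=4, n=105)
print(round(f_value, 2))
```

A significant F change means the added independent variables improve the prediction beyond what model 1 already explains.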
Multiple regression with dummy variables:
How do you include a categorical variable with more than two categories in a regression
analysis? You can do this by converting the variable into multiple dummy variables.
That takes 7 steps:
1. Count the number of groups that your variable has and subtract 1.
2. Create as many new variables as you calculated in step 1. These are the dummy
variables.
3. Choose which group will become your reference group. You compare all the other
groups to this group.
4. Give your reference group the value 0 for all dummy variables.
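A minimal sketch of the coding scheme that these first steps lead to. It assumes the usual dummy-coding convention that every non-reference group scores 1 on its own dummy variable and 0 on the others. The variable, group labels and function name are hypothetical.

```python
def dummy_code(values, reference):
    """Turn one categorical variable into (number of groups - 1) dummy variables."""
    groups = sorted(set(values))
    if reference not in groups:
        raise ValueError("reference group does not occur in the data")
    # one dummy per group, except for the reference group
    other_groups = [g for g in groups if g != reference]
    rows = []
    for value in values:
        # the reference group gets 0 on every dummy variable
        rows.append({f"dummy_{g}": int(value == g) for g in other_groups})
    return rows

# Hypothetical variable with three groups; "low" is chosen as reference group
education = ["low", "middle", "high", "low"]
coded = dummy_code(education, reference="low")
for original, row in zip(education, coded):
    print(original, row)
```

Three groups give two dummy variables, and a respondent in the reference group is recognisable by zeros on both.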