Simple linear regression = there is only ONE independent variable in the model.
The strength of the relationship is shown by the correlation coefficient (r). It ranges
from -1 to 1, where 1 = a perfect positive relationship and -1 = a perfect negative relationship.
0 means there is no linear relationship between the variables. There can still be a
non-linear relationship.
It is a standardized measure, which makes it possible to compare the strength of different
relationships with each other.
A correlation never says anything about causal effects: it does not tell you that A causes B,
only that they are correlated. To show causality you need an experiment.
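
A minimal sketch of the point that r = 0 only rules out a linear relationship. The data and the helper name pearson_r are made up for illustration; they are not from the course material.

```python
import math

def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Sum of cross-products and sums of squared deviations from the means
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

x = [-2, -1, 0, 1, 2]
linear = [2 * v + 1 for v in x]   # perfect positive linear relationship
curved = [v ** 2 for v in x]      # perfect but U-shaped (non-linear) relationship

r_linear = pearson_r(x, linear)   # 1.0
r_curved = pearson_r(x, curved)   # 0.0, although Y depends completely on X
```

The second result shows why a scatter plot is still needed: the correlation coefficient alone would suggest that X and Y are unrelated.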
To check for a linear relationship, the variables should be measured at interval/ratio level.
You use linear regression analysis to make predictions about linear relationships, NOT about
non-linear relationships.
To make the equation you first calculate the slope (a): if you increase X by 1, how
much does the predicted Y change? After that you determine the intercept (b): the predicted
value of Y when X = 0.
Y^ = aX + b
- The intercept can tell you some information, but not always.
The least squares method = a method which helps you decide where to draw the line.
Residual / error = the distance between the true value Y and the predicted value Y^: Y - Y^.
Based on the residuals you look for the line with the smallest possible sum of squared
errors. That is the least squares method.
- The formula for the slope: a = r (correlation coefficient) * standard deviation of Y /
standard deviation of X.
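
The slope formula and the residuals can be sketched as follows. The data are made up, and the intercept formula b = mean(Y) - a * mean(X) is not in these notes; it is the standard least-squares result and is assumed here.

```python
import math

def fit_line(x, y):
    """Return slope a and intercept b of the least-squares line Y^ = aX + b."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Sample standard deviations (divide by n - 1)
    sd_x = math.sqrt(sum((v - mean_x) ** 2 for v in x) / (n - 1))
    sd_y = math.sqrt(sum((v - mean_y) ** 2 for v in y) / (n - 1))
    # Pearson correlation coefficient
    r = sum((p - mean_x) * (q - mean_y) for p, q in zip(x, y)) / ((n - 1) * sd_x * sd_y)
    a = r * sd_y / sd_x          # slope formula from the notes
    b = mean_y - a * mean_x      # assumed standard intercept formula
    return a, b

x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
a, b = fit_line(x, y)                                        # a = 0.6, b = 2.2

predicted = [a * v + b for v in x]                           # Y^ for every X
residuals = [obs - pred for obs, pred in zip(y, predicted)]  # Y - Y^
sse = sum(e ** 2 for e in residuals)                         # sum of squared errors, 2.4
```

Any other line through these points gives a larger sum of squared errors than 2.4, which is what "least squares" means.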
How well does the model fit? How good the prediction is = goodness of fit.
- One example is R2 (R-squared) = the proportion of the variance of the
response that is explained by the model / the predictor variables.
o A very large R-squared doesn’t mean the model is a good predictor.
o A very small R-squared doesn’t mean that the relationship between the variables
is meaningless.
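
As a rough illustration, R2 can be computed as 1 minus the unexplained variation divided by the total variation. The numbers are hypothetical: the predicted values come from a least-squares line Y^ = 0.6X + 2.2 fitted to X = 1..5.

```python
def r_squared(observed, predicted):
    """Proportion of the variance in the observed values explained by the predictions."""
    mean_y = sum(observed) / len(observed)
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))  # unexplained variation
    ss_tot = sum((o - mean_y) ** 2 for o in observed)                # total variation
    return 1 - ss_res / ss_tot

observed = [2, 4, 5, 4, 5]
predicted = [2.8, 3.4, 4.0, 4.6, 5.2]   # hypothetical least-squares predictions

r2 = r_squared(observed, predicted)      # 0.6: the line explains 60% of the variance
```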
Multiple Linear Regression (Week 1)
Assumptions (Initial):
1. The dependent variable is a continuous measure (interval/ratio). For example,
satisfaction is treated as a continuous measure because it is made into a composite scale,
and composite scales can be used as if they were continuous.
a. The independent variables must be continuous or dichotomous (nominal
with two categories).
2. There must be a linear relationship between the dependent variable and the
independent variables.
3. There must be an absence of outliers.
To check these assumptions, it is important to visualize your data in SPSS and see whether the
relationship is linear and whether there are outliers. The Anscombe Quartet shows that
violations of these assumptions can influence the statistical results.
To check the correlations in SPSS go to Analyze > Correlate > Bivariate and choose the
variables you want to analyse.
Assumptions (statistical):
Absence of outliers: you can determine this through a scatter plot or box plot. BUT you can
also look at the residual statistics: the minimum and maximum values of the standardised
residuals (Y-space), the Mahalanobis distance (X-space) and Cook’s distance (XY-space: an
extreme combination of X and Y scores).
Rule of thumb:
- For the standardised residuals: between -3.3 and +3.3.
- For the Mahalanobis distance: must be lower than 10 + 2 * the number of independent
variables. When the value is higher, there is an outlier.
- For Cook’s distance: must be lower than 1. A higher value indicates an influential
respondent.
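
The three rules of thumb above can be sketched as one check. The function name and the input values are hypothetical; in practice the three statistics would be read from the SPSS residual statistics.

```python
def outlier_flags(std_residual, mahalanobis, cooks, n_predictors):
    """Apply the three rules of thumb to one respondent's statistics."""
    mahalanobis_cutoff = 10 + 2 * n_predictors
    return {
        "y_space": abs(std_residual) > 3.3,         # standardised residual outside -3.3..+3.3
        "x_space": mahalanobis > mahalanobis_cutoff,
        "xy_space": cooks > 1,                      # influential respondent
    }

# Hypothetical respondent in a model with 3 independent variables (cutoff = 10 + 2 * 3 = 16)
flags = outlier_flags(std_residual=-3.5, mahalanobis=12.0, cooks=0.4, n_predictors=3)
# flags == {"y_space": True, "x_space": False, "xy_space": False}
```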
Important when deciding whether to remove an outlier:
- Does the participant belong to the group about which you want to make inferences?
If not, don’t include them.
- Is the extreme value of the participant theoretically possible? If not, don’t include them.
If so, run the analysis twice, with and without the participant.
Absence of multicollinearity: you can find this information in the last columns of the
coefficients table (tolerance and VIF). A correlation above .8 between independent variables
means the relationship is too strong. What happens if you include them anyway?
o The regression coefficient (B) is unreliable.
o It limits the magnitude of R (the correlation between Y and Y^).
o The importance of the individual independent variables can hardly be
determined.
Rule of thumb:
o Values for the tolerance smaller than .2 indicate a potential problem.
o Values for the tolerance smaller than .1 indicate a problem.
o A variance inflation factor (VIF) bigger than 10 is a problem.
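
A small sketch of how tolerance and VIF relate. It assumes the standard definitions, which these notes do not spell out: tolerance = 1 - R2 of one independent variable predicted from the other independent variables, and VIF = 1 / tolerance. With only two independent variables that R2 is simply their squared correlation. The correlation of .95 is made up.

```python
def tolerance_and_vif(r_squared_j):
    """Tolerance and VIF of a predictor, given the R2 from regressing it on the other predictors."""
    tolerance = 1 - r_squared_j
    vif = 1 / tolerance
    return tolerance, vif

def judge_tolerance(tolerance):
    """Label a tolerance value using the rules of thumb from the notes."""
    if tolerance < 0.1:
        return "problem"
    if tolerance < 0.2:
        return "potential problem"
    return "ok"

# Hypothetical model with two independent variables that correlate .95
r_between_predictors = 0.95
tolerance, vif = tolerance_and_vif(r_between_predictors ** 2)
label = judge_tolerance(tolerance)   # tolerance is about .0975 and VIF about 10.3: "problem"
```

Because VIF is just 1 / tolerance, the tolerance rule (< .1) and the VIF rule (> 10) describe the same cutoff.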
Homoscedasticity: the spread of the residuals must be approximately the same across all
values of X. You check this by plotting the standardised residuals against the standardised
predicted values.
Normally distributed residuals: make a histogram of the residuals to check this.
Performing and interpreting Multiple Linear Regression:
R2 = the proportion of explained variance in the sample / model. How much of the variance
is explained by the model?
Adjusted R2 = an estimate of the proportion of explained variance in the population.
- It adjusts the R2 on the basis of the sample size (N) and the number of predictors (K).
F-test = tells you whether the model as a whole is significant: can the
independent variables together explain a significant part of the variance?
If the p-value is below 0.05, the model is significant.
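
A sketch of how the adjusted R2 and the F-value follow from R2, N and K. The formulas are the usual textbook ones and are an assumption here, since the notes only describe them in words; the input numbers are made up. The p-value that belongs to F is not computed in this sketch; SPSS reports it in the ANOVA table.

```python
def adjusted_r_squared(r2, n, k):
    """Adjusted R2 for sample size n and k predictors."""
    return 1 - (1 - r2) * (n - 1) / (n - k - 1)

def f_statistic(r2, n, k):
    """F-value for the model as a whole: explained variance per predictor
    divided by unexplained variance per residual degree of freedom."""
    return (r2 / k) / ((1 - r2) / (n - k - 1))

# Hypothetical model: R2 = .50, N = 101 respondents, K = 4 predictors
adj_r2 = adjusted_r_squared(0.5, n=101, k=4)   # about .479, a bit lower than R2
f_value = f_statistic(0.5, n=101, k=4)         # 24.0 with df = (4, 96)
```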
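To see whether each independent variable is a significant predictor you can look at the
coefficients table: the p-value of each variable shows whether it is significant.
The Beta value (standardised coefficient) tells you which predictor is the strongest: the
larger the absolute Beta, the stronger the predictor. Significance is read from the p-value,
not from Beta.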
You can perform a hierarchical multiple regression analysis to check whether adding more
independent variables provides a significantly better prediction than a model with fewer
independent variables. Based on the change statistics in the model summary you can
interpret your results.
- The R2 change of model 2 tells you how much more variance model 2 explains compared to
model 1.
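
The change statistics can be sketched as below. The F-change formula is the usual textbook one and is an assumption here, not something stated in these notes; the R2 values, N and the numbers of predictors are made up.

```python
def r2_change_test(r2_model1, r2_model2, k1, k2, n):
    """Return (R2 change, F change) when going from model 1 (k1 predictors)
    to model 2 (k2 predictors) on the same n respondents."""
    r2_change = r2_model2 - r2_model1
    df_added = k2 - k1            # number of predictors added in model 2
    df_residual = n - k2 - 1      # residual degrees of freedom of model 2
    f_change = (r2_change / df_added) / ((1 - r2_model2) / df_residual)
    return r2_change, f_change

# Hypothetical: model 1 has 2 predictors (R2 = .30), model 2 has 4 (R2 = .40), N = 105
r2_change, f_change = r2_change_test(0.30, 0.40, k1=2, k2=4, n=105)
# r2_change is about .10 and f_change about 8.33 with df = (2, 100)
```

SPSS reports the significance of this F change in the model summary, which is what tells you whether the added variables improve the prediction.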
In the ANOVA table you only look at whether each model by itself is significant.
Multiple regression with dummy variables:
How do you include a categorical variable with more than two categories in a regression
analysis? You can do this by converting the variable into multiple dummy variables.
That takes 7 steps:
1. Count the number of groups that your variable has and subtract 1.
2. Create as many new variables as the number you calculated in step 1. These are the dummy
variables.
3. Choose which group will become your reference group. You compare all the other groups
to this group.
4. Give your reference group the value 0 for all dummy variables.
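
A minimal sketch of dummy coding. Only steps 1 to 4 are shown above, so the rest of the coding is an assumption: the sketch uses the common convention that every non-reference group gets the value 1 on its own dummy variable and 0 on the others. The variable "education" and its groups are made up.

```python
def dummy_code(values, reference):
    """Convert a categorical variable into (number of groups - 1) dummy variables."""
    groups = sorted(set(values))
    # One dummy per group, except the reference group
    non_reference = [g for g in groups if g != reference]
    rows = []
    for v in values:
        # The reference group scores 0 on every dummy variable
        rows.append({f"dummy_{g}": int(v == g) for g in non_reference})
    return rows

education = ["low", "middle", "high", "middle"]   # 3 groups, so 2 dummy variables
coded = dummy_code(education, reference="low")
# coded[0] == {"dummy_high": 0, "dummy_middle": 0}   (reference group)
# coded[1] == {"dummy_high": 0, "dummy_middle": 1}
# coded[2] == {"dummy_high": 1, "dummy_middle": 0}
```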