Summary Grasple lessons
This document includes all five Grasple lessons with important information on how to perform analyses, as well as information about the analyses themselves.

March 16, 2021
Grasple lessons ARMS 2020-2021
Grasple lesson 1 – Introduction
Week 1

 Simple linear regression: there is only one independent (predictor) variable
in the model
 Correlation coefficient: standardized number that assesses the strength of
a linear relationship
o An absolute value of 1 indicates maximum strength of a relation
between two variables
o A value of 0 indicates no linear relationship between the two
variables
o It is a standardized measure
o The correlation does not mean that the movement in one variable
causes the other variable to move as well
o A high positive correlation means that when one variable increases,
the other one also increases
o A high negative correlation means that when one variable increases,
the other one decreases
 Pearson’s r: allows you to compare correlations, because it is always
between -1 and 1
 For a non-linear relation, Pearson’s r is not a meaningful measure
 A variable has to be measured at interval/ratio level to calculate
correlations
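The course computes correlations in SPSS, but the definition of Pearson’s r (covariance divided by the product of the standard deviations) can be sketched in a few lines of Python with hypothetical numbers:

```python
import math

def pearson_r(x, y):
    """Pearson's r: the covariance of x and y divided by the
    product of their standard deviations (standardized measure)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    sx = math.sqrt(sum((xi - mx) ** 2 for xi in x))
    sy = math.sqrt(sum((yi - my) ** 2 for yi in y))
    return cov / (sx * sy)

# A perfectly linear positive relation gives the maximum value, r = 1
print(pearson_r([1, 2, 3, 4], [2, 4, 6, 8]))
```

Because the measure is standardized, the result always falls between -1 and 1, whatever the units of the two variables.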
 First you draw a scatterplot, which provides valuable information about the
strength and the direction of the relationship
 An experiment is required to establish that there is a cause-effect
relationship, so that other explanations can be ruled out
 In essence, linear regression boils down to summarizing a bunch of data by
drawing a straight line through them
o We use linear regression to make predictions about linear relations
o The straight line is used to predict the value of one variable based
on the value of the other variable
 Slope: if X increases by one unit, how much does Y increase?
 Intercept: the point where the regression line crosses the y-axis
 Predicted Y value: ŷ = B0 + B1 × X
o The hat on the y is used to denote that this is not the observed y-
score but the predicted y-score
o B0 = intercept
o B1 = slope
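Making a prediction with the regression line is just this one formula; as an illustration with a hypothetical intercept and slope:

```python
def predict(b0, b1, x):
    """Predicted y-score (y-hat): intercept (B0) plus slope (B1) times x."""
    return b0 + b1 * x

# Hypothetical model: intercept 2.0, slope 0.5
print(predict(2.0, 0.5, 10))  # -> 7.0
```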
 In many cases, the intercept by itself is fairly meaningless and
only serves (mathematically) to support a correct prediction
 The distance between the true value y and the predicted value y is called
the error/residual

 Positive and negative errors cancel each other out; the sum of all errors is
always zero
o When we square the errors, they are always positive and no longer
cancel each other out —> this way we can look for the line that will
result in the smallest possible sum of squared errors —> the least
squares method
 With the least squares method we can find a linear regression
model which fits the data best
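The cancelling of positive and negative errors can be seen with a toy set of residuals (hypothetical numbers):

```python
# Residuals of a fitted line sum to zero, but their squares are all
# positive and do not cancel out -- which is why least squares
# minimizes the sum of SQUARED errors.
errors = [1.5, -0.5, -2.0, 1.0]    # hypothetical residuals
print(sum(errors))                  # -> 0.0
print(sum(e ** 2 for e in errors))  # sum of squared errors -> 7.5
```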




 The following formula determines the slope of the line with the smallest
sum of squared errors: B1 = r × (sy / sx)
 The slope equals the correlation coefficient (Pearson’s r) times the
standard deviation of y divided by the standard deviation of x —> you do
not need to compute the best-fitting linear regression model (B0 & B1) by
hand; SPSS does this for you
o In the output, the slope is the regression coefficient of the variable
o The intercept is what SPSS calls the constant
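SPSS does this computation for you, but as an illustration, the formula B1 = r × (sy / sx) together with B0 = mean(y) − B1 × mean(x) can be sketched directly (hypothetical data):

```python
import math

def least_squares(x, y):
    """Best-fitting line via B1 = r * (sd_y / sd_x) and
    B0 = mean_y - B1 * mean_x (the least squares solution)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sx = math.sqrt(sum((xi - mx) ** 2 for xi in x) / (n - 1))
    sy = math.sqrt(sum((yi - my) ** 2 for yi in y) / (n - 1))
    cov = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y)) / (n - 1)
    r = cov / (sx * sy)
    b1 = r * sy / sx          # slope (regression coefficient)
    b0 = my - b1 * mx         # intercept (the "constant" in SPSS)
    return b0, b1

# The data lie exactly on y = 1 + 2x, so B0 ~ 1 and B1 ~ 2
b0, b1 = least_squares([1, 2, 3, 4], [3, 5, 7, 9])
print(b0, b1)
```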
 Goodness of fit: assesses how well the model’s predictions fit the
observed data —> an example is the R-squared number
 R-squared: determines the proportion of the variance of the response
variable that is explained by the predictor variable(s)
o It is a proportion between 0 and 1
o If the R-squared is very small, this does NOT mean that there is NO
meaningful relationship between the two variables —> it could still
be practically relevant even though it does not explain a large
amount of variance
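The proportion of explained variance can be computed from observed and predicted values as 1 − SSresidual / SStotal; a small sketch with hypothetical numbers:

```python
def r_squared(y, y_hat):
    """Proportion of the variance of y explained by the predictions:
    R^2 = 1 - SS_residual / SS_total."""
    my = sum(y) / len(y)
    ss_tot = sum((yi - my) ** 2 for yi in y)            # total variation
    ss_res = sum((yi - yh) ** 2 for yi, yh in zip(y, y_hat))  # unexplained
    return 1 - ss_res / ss_tot

# Predictions close to the observed values give an R^2 near 1
print(r_squared([1, 2, 3, 4], [1.1, 1.9, 3.2, 3.8]))
```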

Grasple lesson 2 – Multiple linear regression
Week 1

Assumptions (Initial)

 Assumption 1: variables have to be continuous or dichotomous
 Assumption 2: relations have to be linear
 Assumption 3: there has to be an absence of outliers
 The influence of a violated model assumption on the results can be severe,
therefore it is important to visualize your data

Assumptions (statistical)

 Absence of outliers: click on Save in SPSS and check: standardized
residuals, Mahalanobis distance and Cook’s distance
 Absence of multicollinearity: click on Statistics and check: collinearity
diagnostics
 Homoscedasticity: click on Plots, place the variable *ZPRED (the
standardized predicted values) on the X-axis and the variable *ZRESID (the
standardized residuals) on the Y-axis
 Normally distributed residuals: click on Plots and check Histogram
 Absence of outliers: look at the residual statistics table and view the
minimum & maximum values of the standardized residuals/Mahalanobis
distance/Cook’s distance
o Standardized residuals: checks for outliers in the Y-space: values
must be between -3.3 and +3.3, otherwise they indicate outliers
o Mahalanobis distance: checks whether there are outliers in the X-
space —> an extreme score on a predictor or combination of
predictors. Must be lower than 10 + 2 × (number of independent
variables)
o Cook’s distance: checks whether there are outliers in the XY-space:
an extreme combination of X and Y scores —> indicates the overall
influence of a respondent on the model. Must be lower than 1
 Higher cases: indicate influential respondents (influential
cases)
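The three rules of thumb above can be applied to the diagnostic values saved from SPSS (or any regression software); as an illustration only, with hypothetical values and a hypothetical helper name:

```python
def flag_outliers(std_resid, mahal, cooks, n_predictors):
    """Apply the rules of thumb: |standardized residual| <= 3.3 (Y-space),
    Mahalanobis distance < 10 + 2*k (X-space), Cook's distance < 1 (XY-space).
    Returns (case index, list of violated rules) per flagged case."""
    mahal_cutoff = 10 + 2 * n_predictors
    flags = []
    for i in range(len(std_resid)):
        reasons = []
        if abs(std_resid[i]) > 3.3:
            reasons.append("standardized residual outside -3.3..+3.3 (Y-space)")
        if mahal[i] > mahal_cutoff:
            reasons.append("Mahalanobis distance too high (X-space)")
        if cooks[i] > 1:
            reasons.append("Cook's distance above 1 (XY-space)")
        if reasons:
            flags.append((i, reasons))
    return flags

# Hypothetical diagnostics for 4 cases and 3 predictors:
# case 2 violates the residual rule, case 3 the Cook's distance rule
print(flag_outliers([0.5, -1.2, 4.0, 0.3],
                    [1.0, 2.0, 3.0, 5.0],
                    [0.1, 0.2, 0.4, 1.5], n_predictors=3))
```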
 When you have to decide whether or not to remove an
outlier, a number of things are important:
o Does this participant belong to the group about which you want to
make inferences? If not, do not include the participant in the
analyses
o Is the extreme value of the participant theoretically possible? If not,
do not include the participant in the analysis. If so, run the analysis
with and without the participant, report the results of both analyses
and discuss any differences.
 The coefficients table contains information on multicollinearity in the last
columns: this indicates whether the relation between two or more
independent variables is too strong (r > 0.8).
o These two variables are most likely interrelated
 If you include overly related variables in your model, this has 3
consequences:
o The regression coefficients (B) are unreliable
o It limits the magnitude of R (correlation between Y and Y-hat)
o The importance of individual independent variables can hardly be
determined, if at all

,  SO: you DON’T want multicollinearity: perfect multicollinearity means that
your independent variables are perfectly correlated
 Rule of thumb: Tolerance values smaller than 0.2 indicate a
potential problem; values smaller than 0.1 indicate a problem
o The variance inflation factor (VIF) is equal to 1/Tolerance, so for the
VIF, values greater than 10 indicate a problem.
 You can find VIF and Tolerance in the last two columns in the coefficients
table
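Since VIF is just 1/Tolerance, the two rules of thumb are equivalent; a small sketch (hypothetical helper name and values):

```python
def check_collinearity(tolerance):
    """Translate a Tolerance value into the rules of thumb above.
    VIF = 1 / Tolerance, so Tolerance < 0.1 corresponds to VIF > 10."""
    vif = 1 / tolerance
    if tolerance < 0.1:       # VIF > 10
        return vif, "problem"
    if tolerance < 0.2:       # VIF > 5
        return vif, "potential problem"
    return vif, "ok"

# e.g. a Tolerance of 0.05 means VIF = 20: a clear problem
print(check_collinearity(0.05))
print(check_collinearity(0.5))
```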
 Homoscedasticity: means that the spread of the residuals must be
approximately the same across all values of X. We assess this by plotting the
standardized residuals against the standardized predicted values
o If for every predicted value (X-axis) there is approximately the same
amount of spread along the Y-axis, then the condition is met




 Normally distributed residuals: although the histogram does not exactly
follow the normal-distribution curve, the deviations are not great enough
to conclude that the condition of normally distributed residuals has been
violated

Performing and interpreting MLR

 If all the assumptions are met, the regression model can be interpreted
 Multiple correlation coefficient R: this value indicates the correlation
between the observed satisfaction scores (Y) and the predicted satisfaction
scores (Y-hat)
o It is used to say something about how good the model is at
predicting satisfaction (in this case!!)
 R-squared: normally assesses how much variance of the dependent
variable is explained by the model
o Refers to the proportion of explained variance in the sample
 Adjusted R-squared: is an estimate of the proportion of explained variance
in the population. It adjusts the value of R-squared on the basis of the
sample size n and the number of predictors in the model k
o The estimated proportion of explained variance in the population is
always somewhat lower than the proportion of explained variance in
the sample
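SPSS reports this value for you; the standard adjustment it uses (an assumption here, but the usual formula based on n and k) is 1 − (1 − R²)(n − 1)/(n − k − 1):

```python
def adjusted_r_squared(r2, n, k):
    """Adjusted R^2 = 1 - (1 - R^2) * (n - 1) / (n - k - 1),
    where n = sample size and k = number of predictors."""
    return 1 - (1 - r2) * (n - 1) / (n - k - 1)

# Hypothetical: R^2 = 0.30 with n = 100 cases and k = 3 predictors;
# the population estimate comes out a bit lower than 0.30
print(adjusted_r_squared(0.30, 100, 3))
```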
 F-test: considers whether the model as a whole is significant
o Here we look at whether the three independent variables together
can explain a significant part of the variance
 In the ANOVA table, we only look at whether the models themselves (as a
whole) are significant.
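SPSS computes the F statistic in the ANOVA table; as an illustration, the usual formula relates it directly to R² (an assumption here, with hypothetical numbers):

```python
def model_f(r2, n, k):
    """F statistic for the overall model test:
    F = (R^2 / k) / ((1 - R^2) / (n - k - 1)),
    with df1 = k and df2 = n - k - 1."""
    return (r2 / k) / ((1 - r2) / (n - k - 1))

# Hypothetical: R^2 = 0.30, n = 100 cases, k = 3 predictors
print(model_f(0.30, 100, 3))
```

A model that explains no variance at all (R² = 0) gives F = 0, while larger explained proportions give larger F values for the same n and k.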

Seller: hannahvanrhoon