This document contains all five Grasple lessons, with important information on how to perform the analyses as well as on the analyses themselves.

16 March 2021 · 33 pages · 2020/2021
By hannahvanrhoon
Grasple lessons ARMS 2020-2021
Grasple lesson 1 – Introduction
Week 1

• Simple linear regression: there is only one independent (predictor) variable in the model
• Correlation coefficient: a standardized number that assesses the strength of a linear relationship
  o An absolute value of 1 indicates maximum strength of the relation between two variables
  o A value of 0 indicates no linear relationship between the two variables
  o It is a standardized measure
  o A correlation does not mean that the movement in one variable causes the other variable to move as well
  o A high positive correlation means that when one variable increases, the other one also increases
  o A high negative correlation means that when one variable increases, the other one decreases
• Pearson’s r allows you to compare correlations, because it always lies between -1 and 1
• Pearson’s r is not meaningful for a non-linear relation
• A variable has to be measured at interval/ratio level to calculate correlations
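As a quick illustration (with made-up data, not taken from the lessons), Pearson's r can be computed as the standardized covariance, so it always lands between -1 and 1:

```python
import numpy as np

# Hypothetical example data: e.g. hours studied (x) and exam score (y)
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.0, 4.0, 5.0, 4.0, 6.0])

# Pearson's r: covariance divided by the product of the standard deviations,
# so it is standardized and always falls between -1 and 1
r = np.corrcoef(x, y)[0, 1]
print(round(r, 3))  # 0.853: a high positive correlation
```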
• First draw a scatterplot, which provides valuable information about the strength and the direction of the relationship
• An experiment is required to establish a cause-effect relationship, so that other explanations can be ruled out
• In essence, linear regression boils down to summarizing a bunch of data points by drawing a straight line through them
  o We use linear regression to make predictions about linear relations
  o The straight line is used to predict the value of one variable based on the value of the other variable
• Slope: if X increases by one unit, how much does Y increase?
• Intercept: the point where the regression line crosses the y-axis
• Regression equation: ŷ = b0 + b1 · x (predicted Y value = intercept + slope × X value)
  o The hat on the y denotes that this is not the observed y-score but the predicted y-score
  o b0 = intercept
  o b1 = slope
• In many cases, the intercept by itself is fairly meaningless and only serves (mathematically) to support a correct prediction
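The regression equation can be sketched directly; the intercept and slope below are hypothetical values chosen only for illustration:

```python
# Minimal sketch of the regression equation: y-hat = b0 + b1 * x.
# Both coefficients are hypothetical, not values from the lessons.
b0 = 2.0   # intercept: predicted y when x = 0
b1 = 0.5   # slope: change in predicted y per one-unit increase in x

def predict(x):
    # Returns the predicted (not observed) y-score for a given x
    return b0 + b1 * x

print(predict(10))  # 2.0 + 0.5 * 10 = 7.0
```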
• Linear regression is an analysis in which you attempt to summarize a bunch of data points by drawing a straight line through them
• The distance between the true value y and the predicted value ŷ is called the error/residual

• Positive and negative errors cancel each other out; the sum of all errors is always zero
  o When we square the errors, they are always positive and do not cancel each other out -> this way we can look for the line that results in the smallest possible sum of squared errors -> the least squares method
• With the least squares method we can find the linear regression model that fits the data best
• The following formula determines the slope of the line with the smallest sum of squared errors: b1 = r × (sy / sx)
• The slope equals the correlation coefficient (Pearson’s r) times the standard deviation of y, divided by the standard deviation of x -> you do not need to compute the best-fitting linear regression model (b0 & b1) by hand; SPSS does this for you
  o In the output, the slope is the regression coefficient of the variable
  o The intercept is what SPSS calls the Constant
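The identity b1 = r × (sy / sx) can be checked numerically; the data below are made up, and numpy's least-squares fit stands in for SPSS:

```python
import numpy as np

# Hypothetical data; the claim to verify: b1 = r * (sd of y / sd of x)
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.0, 4.0, 5.0, 4.0, 6.0])

r = np.corrcoef(x, y)[0, 1]
b1_from_r = r * y.std(ddof=1) / x.std(ddof=1)

# Least-squares fit for comparison (polyfit returns [slope, intercept])
b1_ls, b0_ls = np.polyfit(x, y, 1)
print(round(b1_from_r, 3), round(b1_ls, 3))  # both equal 0.8
```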
• Goodness of fit: assesses how well the prediction fits the data -> an example is the R-squared value
• R-squared: the proportion of the variance of the response variable that is explained by the predictor variable(s)
  o It is a proportion between 0 and 1
  o If R-squared is very small, this does NOT mean that there is NO meaningful relationship between the two variables -> the relation could still be practically relevant even though it does not explain a large amount of variance
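R-squared can be computed from the sums of squares, as 1 minus the residual sum of squares over the total sum of squares (again with made-up data):

```python
import numpy as np

# Hypothetical data; R^2 = 1 - SS_residual / SS_total
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.0, 4.0, 5.0, 4.0, 6.0])

b1, b0 = np.polyfit(x, y, 1)   # least-squares slope and intercept
y_hat = b0 + b1 * x            # predicted scores

ss_res = np.sum((y - y_hat) ** 2)       # unexplained variation
ss_tot = np.sum((y - y.mean()) ** 2)    # total variation in y
r_squared = 1 - ss_res / ss_tot
print(round(r_squared, 3))  # 0.727: proportion of variance explained
```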

Grasple lesson 2 - Multiple linear regression
Week 1

Assumptions (Initial)

 Assumption 1: variables have to be continuous or dichotomous
 Assumption 2: relations have to be linear
 Assumption 3: there has to be an absence of outliers
 The influence of a violated model assumption on the results can be severe,
therefore it is important to visualize your data

Assumptions (statistical)

• Absence of outliers: click Save in SPSS and check: standardized residuals, Mahalanobis distance and Cook’s distance
• Absence of multicollinearity: click Statistics and check: collinearity diagnostics
• Homoscedasticity: click Plots, place the variable *ZPRED (the standardized predicted values) on the X-axis and the variable *ZRESID (the standardized residuals) on the Y-axis
• Normally distributed residuals: click Plots and check Histogram
• Absence of outliers: look at the Residual Statistics table and view the minimum & maximum values of the standardized residuals / Mahalanobis distance / Cook’s distance
  o Standardized residuals: check for outliers in the Y-space; values must be between -3.3 and +3.3, otherwise they indicate outliers
  o Mahalanobis distance: checks whether there are outliers in the X-space -> an extreme score on a predictor or combination of predictors. Must be lower than 10 + 2 × (number of independent variables)
  o Cook’s distance: checks whether there are outliers in the XY-space: an extreme combination of X and Y scores -> indicates the overall influence of a respondent on the model. Must be lower than 1
    - Higher values indicate influential respondents (influential cases)
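The Mahalanobis check (outliers in the X-space) can be sketched outside SPSS; the predictor matrix below is made up, with one deliberately extreme respondent:

```python
import numpy as np

# Hypothetical matrix of two predictors for six respondents; the last
# row is deliberately extreme, to act as an outlier in the X-space
X = np.array([[1.0, 2.0],
              [2.0, 1.5],
              [3.0, 3.0],
              [4.0, 2.5],
              [5.0, 4.0],
              [12.0, 9.0]])

diff = X - X.mean(axis=0)
cov_inv = np.linalg.inv(np.cov(X, rowvar=False))  # sample covariance of predictors

# Squared Mahalanobis distance per respondent
md = np.einsum('ij,jk,ik->i', diff, cov_inv, diff)

# Rule of thumb from the notes: compare against 10 + 2 * (number of predictors)
cutoff = 10 + 2 * X.shape[1]
print(md.round(2), cutoff)
```

The extreme last row gets the largest distance; in a sample this tiny no value can exceed the cutoff, so the rule of thumb is only meaningful for realistic sample sizes.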
• When you have to decide whether or not to remove an outlier, a number of things are important:
  o Does this participant belong to the group about which you want to make inferences? If not, do not include the participant in the analysis
  o Is the extreme value of the participant theoretically possible? If not, do not include the participant in the analysis. If so, run the analysis with and without the participant, report the results of both analyses and discuss any differences.
• The last columns of the coefficients table contain information on multicollinearity: whether the relation between two or more independent variables is too strong (r > 0.8)
  o Such variables are most likely interrelated
• If you include overly related variables in your model, this has three consequences:
  o The regression coefficients (B) are unreliable
  o It limits the magnitude of R (the correlation between Y and ŷ)
  o The importance of individual independent variables can hardly be determined, if at all

• SO: you DON’T want multicollinearity; perfect multicollinearity means that your independent variables are perfectly correlated
• Rule of thumb: Tolerance values smaller than 0.2 indicate a potential problem; values smaller than 0.1 indicate a problem
  o The variance inflation factor (VIF) is equal to 1/Tolerance, so for the VIF, values greater than 10 indicate a problem
• You can find VIF and Tolerance in the last two columns of the coefficients table
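Tolerance and VIF can be sketched by regressing each predictor on the others; the data below are made up, with the first two columns nearly collinear on purpose:

```python
import numpy as np

# Hypothetical predictor matrix: column 1 is roughly 2x column 0,
# so those two predictors are nearly collinear by construction
X = np.array([[1.0,  2.1, 0.5],
              [2.0,  3.9, 1.0],
              [3.0,  6.2, 0.8],
              [4.0,  8.1, 1.7],
              [5.0,  9.8, 1.2],
              [6.0, 12.1, 2.0]])

def vif(X, j):
    # Regress predictor j on the remaining predictors (with intercept),
    # then Tolerance = 1 - R^2 and VIF = 1 / Tolerance
    y = X[:, j]
    others = np.delete(X, j, axis=1)
    A = np.column_stack([np.ones(len(X)), others])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    resid = y - A @ coef
    r2 = 1 - resid.var() / y.var()
    return 1 / (1 - r2)

for j in range(X.shape[1]):
    print(f"predictor {j}: VIF = {vif(X, j):.2f}")
```

The two collinear columns get VIF values far above the rule-of-thumb cutoff of 10, which is exactly the situation the Tolerance/VIF columns in the SPSS coefficients table flag.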
• Homoscedasticity means that the spread of the residuals must be approximately the same across all X values. We assess this by plotting the standardized residuals against the standardized predicted values
  o If for every predicted value (X-axis) there is approximately the same amount of spread along the Y-axis, the condition is met
• Normally distributed residuals: even if the histogram does not exactly follow the normal curve, the deviations may not be great enough to conclude that the condition of normally distributed residuals has been violated

Performing and interpreting MLR

• If all the assumptions are met, the regression model can be interpreted
• Multiple correlation coefficient R: indicates the correlation between the observed satisfaction scores (Y) and the predicted satisfaction scores (ŷ)
  o It is used to say something about how good the model is at predicting satisfaction (in this case!!)
• R-squared: assesses how much variance of the dependent variable is explained by the model
  o Refers to the proportion of explained variance in the sample
• Adjusted R-squared: an estimate of the proportion of explained variance in the population. It adjusts the value of R-squared on the basis of the sample size n and the number of predictors in the model k
  o The estimated proportion of explained variance in the population is always somewhat lower than the proportion of explained variance in the sample
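The adjustment can be written out directly; the R-squared, n and k below are hypothetical numbers, not values from the lessons:

```python
# Adjusted R-squared = 1 - (1 - R^2) * (n - 1) / (n - k - 1),
# where n = sample size and k = number of predictors
def adjusted_r_squared(r2, n, k):
    return 1 - (1 - r2) * (n - 1) / (n - k - 1)

# Hypothetical example: R^2 = 0.40 in a sample of n = 50 with k = 3 predictors
print(round(adjusted_r_squared(0.40, 50, 3), 3))  # 0.361, slightly below 0.40
```

As the note says, the population estimate is always somewhat below the sample R-squared, and the gap grows when n is small or k is large.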
• F-test: considers whether the model as a whole is significant
  o Here we look at whether the three independent variables together explain a significant part of the variance
• In the ANOVA table, we only look at whether each model as a whole is significant
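The overall F-test can be sketched from R-squared alone; the numbers are hypothetical, and scipy's F distribution supplies the p-value that SPSS would report in the ANOVA table:

```python
from scipy.stats import f

# Overall model test: F = (R^2 / k) / ((1 - R^2) / (n - k - 1)),
# with k and n - k - 1 degrees of freedom (hypothetical values below)
r2, n, k = 0.40, 50, 3
F = (r2 / k) / ((1 - r2) / (n - k - 1))
p_value = f.sf(F, k, n - k - 1)   # right-tail probability

print(round(F, 2), p_value < 0.05)  # F = 10.22, significant at alpha = .05
```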
