Quantitative Data Analysis 2 Final Exam Summary

Complete summary for the final exam of QDA 2, covering all the lectures in the course.

WEEK 5
- Multicollinearity
- Categorical PVs in regression (dummy variables)
- Moderation in regression
- Mediation in regression

Assumptions of regression
- If the assumptions are not met, the regression will still produce statistics and numbers, but they are not necessarily valid; we should therefore question whether managers should base decisions on the outcomes. (A syntax sketch for requesting the corresponding diagnostics in SPSS follows after this list.)
o Normality: residuals (differences between the predicted and observed values) are normally (bell-shaped) distributed. Possible remedy if they are not normal: variable transformation (e.g. log, square root).
o Independence of observations: independence of residuals for any two observations (uncorrelated residuals). Check by running a Durbin-Watson test in SPSS.
o Linear relationship between the PV and OV: check with a scatterplot in SPSS and inspect it visually.
o No influential outliers: plot standardized residuals (converted to z-scores) and check for outliers, e.g. cases with z-scores > 1.96. Cook's distance: check the effect of a single data point.
o Homoscedasticity: error variances are the same for all combinations of PVs and moderators (they are uncorrelated). Plot ZPRED vs ZRESID in SPSS. The opposite is heteroscedasticity.
o No substantial multicollinearity: the situation where two or more PVs correlate highly with each other.

Multicollinearity
- (Strong) presence of correlation among the PVs
o PVs not totally independent
o E.g. if one PV goes up, not only the OV goes up (or down), but also another PV goes up (or down).
- It is not a problem for model testing: it does not affect the predictive power or the reliability of the model as a whole.
- Major problem for individual coefficient testing.
- Affects calculations regarding individual PVs:
o Coefficient testing problematic
o Contribution of individual PVs in explaining the OV is difficult to disentangle.
• Which PVs have an effect and which are redundant?
- Remedies:
o Factor analysis: perhaps several correlated PVs reflect one underlying “latent” variable.
o Estimate model several times: each time excluding one or more of the problematic variables.
o For regression models that include interaction: mean-centering data to lower your measured/structural collinearity.
- Always check the collinearity diagnostics before proceeding to interpretation.

Detecting multicollinearity
- Per PV, determine the tolerance statistics and variance inflation factor (VIF).
o Indicators of multicollinearity.
o They will result in the same conclusion.
- How is it calculated? Each PV is in turn treated as if it were the OV and regressed on the remaining PVs (a sketch follows below the formulas).
- The R2 from this auxiliary regression is called Rj2: the "j" stands for the respective PVj that acts as the OV in the analysis.
- Here we want Rj2 to be small, because we do not want relationships between the PVs.
- We thus get an Rj2 for every PVj, and therefore a tolerance and a VIF for every PV.
- Collinearity needs to be checked for all the PVs.

Tolerance = 1 - Rj2

VIF = 1 / Tolerance
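
A sketch of this auxiliary regression in SPSS syntax, using the hypothetical variable names from the dummy example later in these notes, with age in the role of PVj:

* Auxiliary regression: treat the PV age as if it were the OV.
REGRESSION
  /STATISTICS R
  /DEPENDENT age
  /METHOD=ENTER gender nat_GER nat_ENG.
* The R Square of this model is Rj2 for age:
* Tolerance(age) = 1 - Rj2 and VIF(age) = 1 / Tolerance(age).

When collinearity diagnostics are requested (next section), SPSS computes these values automatically; the sketch only shows where the numbers come from.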

Multicollinearity test SPSS example
- Analyze > regression > linear > statistics > select collinearity diagnostics
- Adds a set of columns to the coefficients table.
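
The pasted syntax equivalent of this menu route looks roughly as follows (a sketch; income, age, gender, nat_GER and nat_ENG are the hypothetical variable names used in the dummy example later in these notes):

REGRESSION
  /STATISTICS COEFF R ANOVA COLLIN TOL
  /DEPENDENT income
  /METHOD=ENTER age gender nat_GER nat_ENG.

The TOL keyword adds the Tolerance and VIF columns to the coefficients table; COLLIN adds a separate collinearity diagnostics table (eigenvalues and condition indices).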




- No collinearity problems:
o All PVs have a tolerance > 0.2 and therefore automatically a VIF < 5.
- Implications:
o The regression model is okay and no adaptations are necessary.

Categorical PVs in regression
- What if there are categorical PVs in the data set, or a combination of categorical and quantitative predictor variables (e.g. gender, nationality, education)?
- If they are included in the regression model without modification, SPSS treats the group codes (e.g. nationality: 0 = NL, 1 = GER, 2 = EN) as if they were a quantitative scale.
o The regression line will shift depending on the way it is coded.
o But these values only distinguish subgroups, they have no further numerical meaning, especially for nominal data.
o If we want to include categorical PVs in our regression model, we need to modify them.
- We create dummy variables as separate PVs. They represent the subgroups of a categorical PV.
- Recode the original measure for the categorical variable into new sets of variables called dummies.
- They are called "dummies" because they artificially "stand in" for subgroups.
- Dummy variables are categorical variables with two categories, coded by [0] and [1].
- They act like switches that turn various values off (value =0) and on (value=1).
- Number of dummies we create = [# levels categorical PV] – 1
- The group for which no dummy is created is called the reference group.

Creating dummy variables
- Let’s create a dummy for categorical variable “nationality,” which can be used as a PV in subsequent analyses.
- 3 categories: Dutch (originally coded as “0”), German (coded as “1”), English (coded as “2”)
- There are three categories, so we need two dummies (3 categories – 1 = 2 dummies)
- First determine the reference group, usually depends on the context/hypotheses.
- We assign each group either [0] or [1] on each dummy.
o [1] means the case belongs to the group represented by that dummy (the group that "gets" the dummy).
o [0] means it does not; a case scoring [0] on all dummies belongs to the reference group (see the coding scheme below).
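
For the nationality example that is worked out below (Dutch as the reference group), the coding scheme looks like this:

Nationality          nat_GER   nat_ENG
Dutch (reference)       0         0
German                  1         0
English                 0         1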

Regression equation with dummy variables example
Research question: predict income from age and gender, and from whether someone is Dutch, German, or English.
- Choose Dutch and female as the reference groups.
- Variables:
o OV = Annual income (€)
o PV1 = Age (quantitative)
o PV2 = Gender (0 = female, 1 = male).
o PV3 = nat_GER (0 = Dutch or English, 1 = German)
o PV4 = nat_ENG (0 = Dutch or German, 1 = English)

Consider gender: there are two groups and therefore we need one dummy and one reference group.
Female is the reference group.
- For female participants (Gender = 0): Income = β0 + β1·Age + β3·nat_GER + β4·nat_ENG
- For male participants (Gender = 1): Income = (β0 + β2) + β1·Age + β3·nat_GER + β4·nat_ENG

Interpretation of the β coefficient of the gender dummy
- Using dummies, the only difference between the equations for the two groups is the β2 coefficient, which shifts the intercept.
- For the reference group (females), the intercept is β0.
- For the dummy group (males), the intercept is β0 + β2.
- Interpreting β coefficients:
o For quantitative variables: the change in the OV when the PV increases by 1
unit on the scale.
o For dummy variables: the change in OV when the dummy goes from 0 to 1. This is the same as the difference between
the mean OV for the dummy group and the mean OV for the reference group. *When all other PVs are held constant.
• E.g. difference in income between average scores of males and females.
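
A purely hypothetical numerical illustration (the numbers are made up, not taken from the course data): suppose a simplified model with only age and gender gave Income = 20,000 + 500·Age + 2,000·Gender.

Female aged 30 (Gender = 0): 20,000 + 500·30           = 35,000
Male aged 30   (Gender = 1): 20,000 + 500·30 + 2,000   = 37,000

The difference of 2,000 equals β2: the difference between the mean income of males and females at any given age.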

Consider nationality: there are three groups and therefore we need two dummy variables and one reference group.
Dutch is the reference group.
- For Dutch participants (nat_GER = 0, nat_ENG = 0): Income = β0 + β1·Age + β2·Gender
- For German participants (nat_GER = 1): Income = (β0 + β3) + β1·Age + β2·Gender
- For English participants (nat_ENG = 1): Income = (β0 + β4) + β1·Age + β2·Gender


Interpretation of the β coefficients of the nationality dummies
- All the coefficients are positive and thus the lines are upward sloping.
- No interaction yet: the slopes are the same for all nationalities.
- The difference between the nationalities lies in the dummy coefficients, which determine the y-intercept of each line.
- If the dummy coefficients were negative, the English and German lines would lie below the Dutch line (and a negative β1 would make the lines slope downward).

Creating dummy variables in SPSS
- Transform > Recode into different variables (*always keep the original data).
- OR File > New > Syntax:
o RECODE nationality (0=0) (1=1) (2=0) INTO nat_GER.
o RECODE nationality (0=0) (1=0) (2=1) INTO nat_ENG.
o EXECUTE.
- Creating dummies does not change the output; it adds new variables to the data set.
*If a dummy were also made for Dutch, it would be coded 1 for Dutch and 0 for German and English (1, 0, 0).

Testing β coefficient dummies
- Using regression with income as the outcome variable and age, gender, and the nationality dummies as the predictor variables.
- Consider the coefficients table:




- P-values:
o Age: Age has a significant impact on income. For every year one gets older, on average, income increases.
o Gender: No significant difference between men and women (marginally significant – so would depend on hypothesis)
o Nat_Eng: English participants have significantly higher average incomes than Dutch participants.
o Nat_Ger: The difference in income between Dutch and German participants is not significant.
- Using this dummy coding, no conclusion can be drawn about the difference between German and English participants, because that comparison is not tested directly (a sketch for testing it with a different reference group follows below).
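
If that German-English comparison is of interest, one possible sketch is to recode with German as the reference group and rerun the regression (hypothetical: nat_NL is a new dummy name, nat_ENG is reused from above):

* German becomes the reference group: create a dummy for Dutch instead.
RECODE nationality (0=1) (1=0) (2=0) INTO nat_NL.
EXECUTE.
REGRESSION
  /STATISTICS COEFF R ANOVA
  /DEPENDENT income
  /METHOD=ENTER age gender nat_NL nat_ENG.

In this model the coefficient of nat_ENG tests English vs German, and the coefficient of nat_NL tests Dutch vs German.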

Moderation in regression
- Conceptually, moderation is the same as interaction in N-way ANOVA: the effect of one PV on the OV depends on the level of another PV, but in regression one or both of the PVs are quantitative.




What if some of the beta coefficients were negative?
- β1 is positive: the lines have a positive slope.
- β2 is negative: the Germans are below the Dutch.
- β3 is positive: the slope for the Germans is even more positive than for the Dutch.
- If β3 is significant, then the β1 + β3 slope is significantly different from the β1 slope, and there is interaction.




- β1 is negative: the lines have a negative slope.
- β2 is negative: the Germans are below the Dutch.
- β3 is negative: the slope is even more negative for the Germans than for the Dutch.




Moderation in regression SPSS example
- Interaction in N-way ANOVA is automatically calculated (A*B) in SPSS.
- Interaction in regression:
o Manually:
• Create new interaction variables via “compute”
• Product of existing PVs: interaction term = PV1*PV2
• There is a risk of multicollinearity (a mean-centering sketch follows after this list)
o Via PROCESS command:
• Custom dialog box
• Menu developed by Hayes
• Also used for mediation
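
A sketch of the manual route for the age × nationality example below, including mean-centering (the value 35 is only a placeholder; replace it with the actual sample mean of age, and the variable names are the hypothetical ones used earlier):

* Mean-center the quantitative PV to reduce structural multicollinearity.
COMPUTE age_c = age - 35.
* Interaction terms = product of the (centered) PV and the moderator dummies.
COMPUTE age_x_ger = age_c * nat_GER.
COMPUTE age_x_eng = age_c * nat_ENG.
EXECUTE.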

Moderation in regression conceptual example
- With a categorical moderator and a quantitative core PV:
- E.g. explain income via age, with nationality as the moderator.
- The effect of one PV depends on the value of the other PV, e.g. the effect size (size of β) of age on income depends on nationality (see the syntax sketch below).
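
Continuing the sketch above, the moderated regression itself would then be run roughly as follows (again using the hypothetical variable names and the interaction terms computed earlier):

REGRESSION
  /STATISTICS COEFF R ANOVA COLLIN TOL
  /DEPENDENT income
  /METHOD=ENTER age_c nat_GER nat_ENG age_x_ger age_x_eng.

A significant coefficient for age_x_ger (or age_x_eng) means the age slope for Germans (or English) differs from the slope for the reference group (Dutch), i.e. nationality moderates the effect of age on income.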
