Written for
Universiteit van Amsterdam (UvA)
Business Administration
Quantitative Data Analysis 2
Seller: bastudent
WEEK 5
- Multicollinearity
- Categorical PVs in regression (dummy variables)
- Moderation in regression
- Mediation in regression
Assumptions of regression
- If the assumptions are not met, the analysis still produces statistics and numbers, but they are not necessarily valid, so we should consider whether managers should base decisions on the outcomes.
o Normality: residuals (difference between the predicted and observed values) are normally (bell-shaped) distributed. Possible remedy if they are not normal: variable transformation (e.g. log, square root).
o Independence of observations: the residuals of any two observations are independent (uncorrelated residuals). Check by running a Durbin-Watson test in SPSS.
o Linear relationship between the PV and OV: check with a scatterplot in SPSS and inspect it visually.
o No influential outliers: plot standardized residuals (converted to z-scores) and check for outliers, e.g. with |z| > 1.96. Cook’s distance: check the effect of a single data point.
o Homoscedasticity: error variances are the same for all combinations of PVs and moderators (the spread of the residuals does not depend on the predicted value). Plot ZPRED vs ZRESID in SPSS. Opposite: heteroscedasticity.
o No substantial multicollinearity: multicollinearity is the situation in which two or more PVs correlate highly with each other.
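Two of the checks above can be sketched by hand. The snippet below is a minimal illustration, not what SPSS does internally: it computes the Durbin-Watson statistic from a list of residuals and flags residuals whose z-score exceeds 1.96. The residual values and the function names are hypothetical, and the z-scores use a plain mean/standard-deviation standardization, which may differ from the standardized residuals SPSS reports.

```python
from statistics import mean, pstdev

def durbin_watson(residuals):
    """Sum of squared successive differences divided by the sum of squared residuals.
    Ranges from 0 to 4; values near 2 suggest uncorrelated residuals."""
    num = sum((residuals[i] - residuals[i - 1]) ** 2 for i in range(1, len(residuals)))
    den = sum(e ** 2 for e in residuals)
    return num / den

def flag_outliers(residuals, cutoff=1.96):
    """Return the positions of residuals whose absolute z-score exceeds the cutoff."""
    m, s = mean(residuals), pstdev(residuals)
    return [i for i, e in enumerate(residuals) if abs((e - m) / s) > cutoff]

# Hypothetical residuals from a fitted regression (made up for illustration)
residuals = [0.5, -0.3, 0.2, -0.4, 0.1, -0.2, 0.3, -0.1, 4.0, -0.3]

dw = durbin_watson(residuals)
outliers = flag_outliers(residuals)
print("Durbin-Watson:", round(dw, 2))
print("Possible outliers at positions:", outliers)
```

A flagged position is only a candidate: whether the point is actually influential would still need a check such as Cook’s distance.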
Multicollinearity
- (Strong) presence of correlation among the PVs
o PVs not totally independent
o E.g. if one PV goes up, not only the OV goes up (or down), but also another PV goes up (or down).
- It is not a problem for testing the model as a whole: the predictive power and reliability of the whole model are not affected.
- Major problem for individual coefficient testing.
- Affects calculations regarding individual PVs:
o Coefficient testing problematic
o Contribution of individual PVs in explaining the OV is difficult to disentangle.
• Which PVs have an effect and which are redundant?
- Remedies:
o Factor analysis: perhaps several correlated PVs reflect one underlying “latent” variable.
o Estimate model several times: each time excluding one or more of the problematic variables.
o For regression models that include an interaction: mean-center the PVs to lower the structural collinearity introduced by the interaction term.
- Check the collinearity diagnostics before proceeding to interpretation.
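The mean-centering remedy can be illustrated with a small sketch. It compares the correlation between a PV and the interaction term before and after centering. All data are hypothetical, and "experience" is an assumed second PV that does not appear in the notes.

```python
from statistics import mean

def pearson(x, y):
    """Pearson correlation between two equally long lists."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5

def center(values):
    """Subtract the mean from every value (mean-centering)."""
    m = mean(values)
    return [v - m for v in values]

# Hypothetical data for two quantitative PVs
age = [25, 30, 35, 40, 45, 50, 55, 60]
experience = [2, 8, 6, 15, 12, 22, 20, 30]

# Interaction term built from the raw PVs
raw_interaction = [a * e for a, e in zip(age, experience)]

# Interaction term built from the mean-centered PVs
age_c, exp_c = center(age), center(experience)
centered_interaction = [a * e for a, e in zip(age_c, exp_c)]

r_raw = pearson(age, raw_interaction)
r_centered = pearson(age_c, centered_interaction)
print("PV vs raw interaction term:     ", round(r_raw, 3))
print("PV vs centered interaction term:", round(r_centered, 3))
```

In this made-up data set the PV correlates far less with the interaction term after centering. The correlation between the two PVs themselves is not changed by centering.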
Detecting multicollinearity
- Per PV, determine the tolerance statistics and variance inflation factor (VIF).
o Indicators of multicollinearity.
o They will result in the same conclusion.
- How is it calculated? Each PV in turn is treated as the OV in a regression in which the other PVs remain PVs.
- The R² from this regression is called $R_j^2$: the “j” stands for the respective PV j that acts as the OV in the analysis.
- We want $R_j^2$ to be small, because we do not want relationships between the PVs.
- We thus get an $R_j^2$ for every PV j and, from it, a tolerance and a VIF for every PV.
- Need to check collinearity for all the PVs.
$$\text{Tolerance} = 1 - R_j^2$$
$$\text{VIF} = \frac{1}{\text{Tolerance}}$$
Multicollinearity test SPSS example
- Analyze > regression > linear > statistics > select collinearity diagnostics
- Adds a set of columns to the coefficients table.
- No collinearity problems:
o All PVs have a tolerance > 0.2 and all PVs also automatically have a VIF < 5
- Implications:
o The regression model is okay and no adaptations are necessary.
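The tolerance and VIF calculation can also be sketched outside SPSS. The code below is a minimal illustration under stated assumptions: the data and the PV names "experience" and "hours" are hypothetical, and the small least-squares helper is hand-written for the example rather than taken from any library. Each PV is regressed on the other PVs, and the resulting R² is turned into a tolerance and a VIF.

```python
def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(A)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        x[i] = (M[i][n] - sum(M[i][j] * x[j] for j in range(i + 1, n))) / M[i][i]
    return x

def r_squared(y, predictors):
    """R^2 of a least-squares regression of y on the predictor columns (with intercept)."""
    n = len(y)
    X = [[1.0] + [col[i] for col in predictors] for i in range(n)]
    k = len(X[0])
    XtX = [[sum(row[a] * row[b] for row in X) for b in range(k)] for a in range(k)]
    Xty = [sum(row[a] * yi for row, yi in zip(X, y)) for a in range(k)]
    beta = solve(XtX, Xty)
    fitted = [sum(bj * xj for bj, xj in zip(beta, row)) for row in X]
    y_mean = sum(y) / n
    ss_res = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))
    ss_tot = sum((yi - y_mean) ** 2 for yi in y)
    return 1 - ss_res / ss_tot

def collinearity_diagnostics(pvs):
    """For every PV j: regress it on the other PVs, return (tolerance, VIF)."""
    result = {}
    for name, values in pvs.items():
        others = [v for other, v in pvs.items() if other != name]
        tolerance = 1 - r_squared(values, others)
        result[name] = (tolerance, 1 / tolerance)
    return result

# Hypothetical PVs; age and experience are strongly related on purpose
pvs = {
    "age": [25, 30, 35, 40, 45, 50, 55, 60],
    "experience": [2, 8, 6, 15, 12, 22, 20, 30],
    "hours": [40, 36, 38, 42, 35, 40, 37, 39],
}
diag = collinearity_diagnostics(pvs)
for name, (tol, vif) in diag.items():
    flag = "check" if tol <= 0.2 or vif >= 5 else "ok"
    print(f"{name}: tolerance={tol:.3f}, VIF={vif:.2f} ({flag})")
```

Because VIF is the reciprocal of tolerance, the two cut-offs (tolerance > 0.2, VIF < 5) always flag the same PVs.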
Categorical PVs in regression
- What if there are categorical PVs in the data set, or a combination of categorical and quantitative PVs (e.g. gender, nationality, education)?
- If included in the regression model without modification, SPSS treats the group codes (e.g. nationality: 0=NL, 1=GER, 2=EN) as if they were a quantitative scale.
o The regression line will shift depending on the way it is coded.
o But these values only distinguish subgroups, they have no further numerical meaning, especially for nominal data.
o If we want to include categorical PVs in our regression model, we need to modify them.
- We create dummy variables as separate PVs. They represent the subgroups of a categorical PV.
- Recode the original measure for the categorical variable into new sets of variables called dummies.
- They are called “dummies”, because they artificially “stand in” for subgroups
- Dummy variables are categorical variables with two categories, coded by [0] and [1].
- They act like switches that turn various values off (value =0) and on (value=1).
- Number of dummies we create = [# levels categorical PV] – 1
- The group for which no dummy is created is called the reference group.
Creating dummy variables
- Let’s create a dummy for categorical variable “nationality,” which can be used as a PV in subsequent analyses.
- 3 categories: Dutch (originally coded as “0”), German (coded as “1”), English (coded as “2”)
- There are three categories, so we need two dummies (3 categories – 1 = 2 dummies)
- First determine the reference group; this usually depends on the context/hypotheses.
- For each dummy, we assign every participant either [0] or [1].
o [1] means the participant belongs to the group represented by the dummy (the group that gets the dummy), so the dummy's coefficient is included in the calculation.
o [0] means the participant does not belong to that group. The reference group scores [0] on all dummies.
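The recoding logic can be sketched in a few lines. This is a minimal illustration, assuming the coding 0=NL, 1=GER, 2=EN from the notes. The participant data and the helper name `make_dummies` are hypothetical.

```python
def make_dummies(codes, labels, reference):
    """Return one 0/1 column per category, skipping the reference category."""
    return {
        f"nat_{label}": [1 if code == value else 0 for code in codes]
        for value, label in labels.items()
        if label != reference
    }

# Hypothetical participants, coded 0 = Dutch, 1 = German, 2 = English
nationality = [0, 1, 2, 1, 0, 2]
labels = {0: "NL", 1: "GER", 2: "ENG"}

# Three categories with Dutch as the reference group give two dummies
dummies = make_dummies(nationality, labels, reference="NL")
for name, column in dummies.items():
    print(name, column)
```

Dutch participants score 0 on both columns, which is what makes them the reference group.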
Regression equation with dummy variables example
Research question: Predict Income via Age & Gender, and whether someone is German or Dutch
- Choose Dutch and female as the reference group.
- Variables:
o OV = Annual income (€)
o PV1 = Age (quantitative)
o PV2 = Gender (0 = female, 1 = male).
o PV3 = nat_GER (0 = Dutch or English, 1 = German)
o PV4 = nat_ENG (0 = Dutch or German, 1 = English)
Consider gender: there are two groups and therefore we need one dummy and one reference group.
Female is the reference group.
- For female participants:
- For male participants:
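The equations for the two groups can be written out as follows. This is a sketch under an assumed numbering: β0 is the constant and β1 to β4 belong to PV1 to PV4 in the order listed above. The numbering is not confirmed by the notes.

```latex
\begin{align*}
\text{Full model:}\quad \text{Income} &= \beta_0 + \beta_1\,\text{Age} + \beta_2\,\text{Gender} + \beta_3\,\text{nat\_GER} + \beta_4\,\text{nat\_ENG} \\
\text{Female (Gender = 0):}\quad \text{Income} &= \beta_0 + \beta_1\,\text{Age} + \beta_3\,\text{nat\_GER} + \beta_4\,\text{nat\_ENG} \\
\text{Male (Gender = 1):}\quad \text{Income} &= (\beta_0 + \beta_2) + \beta_1\,\text{Age} + \beta_3\,\text{nat\_GER} + \beta_4\,\text{nat\_ENG}
\end{align*}
```

The two lines differ only in the constant term, so the gender dummy shifts the line up or down without changing its slope.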
Interpretation β coefficient dummies for gender
- With a dummy, the only difference between the regression lines of the two groups is the β coefficient of the dummy, which shows up as a shift in the intercept.
- For the reference group, the intercept is β0.
- For the dummy group, the intercept is β0 + β2, where β2 is the coefficient of the gender dummy (PV2).
- Interpreting β coefficients:
o For quantitative variables: the change in the OV when the PV increases by 1
unit on the scale.
o For dummy variables: the change in OV when the dummy goes from 0 to 1. This is the same as the difference between
the mean OV for the dummy group and the mean OV for the reference group. *When all other PVs are held constant.
• E.g. difference in income between average scores of males and females.
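The link between a dummy coefficient and a difference in group means can be checked numerically. The sketch below uses hypothetical incomes and a regression with the gender dummy as the only PV, so there are no other PVs to hold constant. With more PVs in the model, the coefficient would be the difference adjusted for those PVs.

```python
from statistics import mean

def simple_ols(x, y):
    """Least-squares intercept and slope for a regression with a single PV."""
    mx, my = mean(x), mean(y)
    slope = sum((a - mx) * (b - my) for a, b in zip(x, y)) / sum((a - mx) ** 2 for a in x)
    return my - slope * mx, slope

# Hypothetical data: 0 = female (reference group), 1 = male
gender = [0, 0, 0, 1, 1, 1]
income = [30000, 34000, 32000, 35000, 39000, 37000]

intercept, beta_gender = simple_ols(gender, income)

mean_female = mean(i for g, i in zip(gender, income) if g == 0)
mean_male = mean(i for g, i in zip(gender, income) if g == 1)

print("Intercept (mean of reference group):", intercept)
print("Dummy coefficient:", beta_gender)
print("Difference in group means:", mean_male - mean_female)
```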
Consider nationality: there are three groups and therefore we need two dummy variables and one reference group.
Dutch is the reference group.
- For Dutch participants:
- For German participants:
- For English participants:
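The equations for the three nationalities can be written out as follows. This is a sketch under the same assumed numbering (β0 for the constant, β1 to β4 for PV1 to PV4), which is not confirmed by the notes.

```latex
\begin{align*}
\text{Dutch (nat\_GER = 0, nat\_ENG = 0):}\quad \text{Income} &= \beta_0 + \beta_1\,\text{Age} + \beta_2\,\text{Gender} \\
\text{German (nat\_GER = 1, nat\_ENG = 0):}\quad \text{Income} &= (\beta_0 + \beta_3) + \beta_1\,\text{Age} + \beta_2\,\text{Gender} \\
\text{English (nat\_GER = 0, nat\_ENG = 1):}\quad \text{Income} &= (\beta_0 + \beta_4) + \beta_1\,\text{Age} + \beta_2\,\text{Gender}
\end{align*}
```

Each dummy coefficient is the vertical distance between that group's line and the Dutch line.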
Interpretation β coefficient dummies for nationality
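- In the example, all the coefficients are positive: the positive age coefficient makes the lines slope upward, and the positive dummy coefficients place the German and English lines above the Dutch line.
- No interaction yet: the slopes are the same for all nationalities.
- The difference between the nationalities lies in the dummy coefficients, which determine the y-intercept of each line.
- If the age coefficient were negative, the lines would slope downward; if the dummy coefficients were negative, the English and German lines would lie below the Dutch line.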
Creating dummy variables in SPSS
- Transform > Recode into different variables (always keep the original data)
- OR File > New > Syntax:
o RECODE nationality (0=0) (1=1) (2=0) INTO nat_GER.
o RECODE nationality (0=0) (1=0) (2=1) INTO nat_ENG.
o EXECUTE.
- Creating dummies does not create changes in the output but rather in the data set.
*If Dutch were the dummy group instead, the recode would be (0=1) (1=0) (2=0).
Testing β coefficient dummies
- Use regression with income as the outcome variable and age, gender, and nationality as the predictor variables.
- Consider the coefficients table:
- P-values:
o Age: Age has a significant impact on income. For every year one gets older, on average, income increases.
o Gender: No significant difference between men and women (marginally significant – so would depend on hypothesis)
o Nat_Eng: English participants have significantly higher average incomes than Dutch participants.
o Nat_Ger: The difference in income between Dutch and German participants is not significant.
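- Using this dummy model, no conclusions can be drawn about the difference between the non-reference groups (German vs. English), because each dummy is tested only against the reference group (Dutch), so that difference cannot be tested statistically.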
Moderation in regression
- Conceptually moderation is the same as N-way ANOVA: the effect of one PV on OV
depends on level of another PV, but in this case, one or both of the PVs are
quantitative.
What if some of the beta coefficients were negative?
- β1 is positive: the lines have a positive slope.
- β2 is negative: Germans below the Dutch.
- β3 is positive: the slope for the Germans is even more positive than for the Dutch.
- If β3 is significant, then the β1 + β3 slope is significantly different from the β1 slope and there is interaction.
- β1 is negative: the lines have a negative slope.
- β2 is negative: Germans below the Dutch.
- β3 is negative: the slope is even more negative for the Germans than for the Dutch.
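The roles of β1, β2 and β3 in the bullets above follow from a model of this form. It is a sketch that assumes age as the core PV and a single German dummy as the moderator, with Dutch as the reference group.

```latex
\begin{align*}
\text{Full model:}\quad \text{Income} &= \beta_0 + \beta_1\,\text{Age} + \beta_2\,\text{nat\_GER} + \beta_3\,(\text{Age} \times \text{nat\_GER}) \\
\text{Dutch (nat\_GER = 0):}\quad \text{Income} &= \beta_0 + \beta_1\,\text{Age} \\
\text{German (nat\_GER = 1):}\quad \text{Income} &= (\beta_0 + \beta_2) + (\beta_1 + \beta_3)\,\text{Age}
\end{align*}
```

β2 shifts the intercept for the Germans, while β3 changes their slope, which is why a significant β3 indicates interaction.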
Moderation in regression SPSS example
- Interaction in N-way ANOVA is automatically calculated (A*B) in SPSS.
- Interaction in regression:
o Manually:
• Create new interaction variables via “compute”
• Product of existing PVs: interaction term = PV1*PV2
• There is a risk of multicollinearity
o Via PROCESS command:
• Custom dialog box
• Menu developed by Hayes
• Also used for mediation
Moderation in regression conceptual example
- With a categorical moderator and a quantitative core PV:
- E.g. explain income via age, with nationality as the moderator.
- Effect of one PV depends on the value of the other PV e.g. effect size (size of β) of age on income depends on nationality.
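The manual route (creating the interaction term as a product of existing PVs) can be sketched as below. It is an illustration only: the data are hypothetical and generated without noise, so the fitted coefficients simply recover the values used to build them. The least-squares helper is hand-written for the example and produces no p-values, so it cannot test whether the interaction is significant.

```python
def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(A)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        x[i] = (M[i][n] - sum(M[i][j] * x[j] for j in range(i + 1, n))) / M[i][i]
    return x

def ols(X, y):
    """Least-squares coefficients via the normal equations (X'X) b = X'y."""
    k = len(X[0])
    XtX = [[sum(row[a] * row[b] for row in X) for b in range(k)] for a in range(k)]
    Xty = [sum(row[a] * yi for row, yi in zip(X, y)) for a in range(k)]
    return solve(XtX, Xty)

# Hypothetical data: four Dutch (nat_ger = 0) and four German (nat_ger = 1) participants
age = [25, 35, 45, 55, 30, 40, 50, 60]
nat_ger = [0, 0, 0, 0, 1, 1, 1, 1]

# Noise-free incomes: Dutch = 20000 + 500*age, German = 18000 + 800*age
income = [20000 + 500 * a if g == 0 else 18000 + 800 * a for a, g in zip(age, nat_ger)]

# Interaction term = product of the two PVs
interaction = [a * g for a, g in zip(age, nat_ger)]

# Columns: constant, age, German dummy, interaction term
X = [[1.0, a, g, ag] for a, g, ag in zip(age, nat_ger, interaction)]
b0, b1, b2, b3 = ols(X, income)

slope_dutch = b1        # effect of age for the reference group
slope_german = b1 + b3  # effect of age for the dummy group
print("Slope for Dutch participants: ", round(slope_dutch, 2))
print("Slope for German participants:", round(slope_german, 2))
```

The effect of age on income differs by nationality exactly by the interaction coefficient, which is the sense in which nationality moderates the effect of age.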