The "Research Methods" summary is a comprehensive guide covering six weeks of theoretical concepts and practical instructions for using R. From regression basics to advanced statistical techniques like MANOVA, this condensed resource is an essential tool for exam preparation, offering everything yo...
Week 1 Regression Introduction and Multiple Regression
Linear regression: Y = a + b * x
Residuals = the difference between each observed value and the value predicted by
the fitted model (for all observations)
Least squares method = a method to find the best-fitting line by minimizing the sum
of the squared residuals: the line with the smallest possible (squared) residuals.
R-squared = the proportion of variance in the dependent variable that can be
explained by the independent variables. Shows how well the data fit the regression
model. A number between 0 and 1; 1 is the best possible model, in which all variance
in the dependent variable is explained.
F-test = tests whether the independent variables together explain a significant part of
the variance of Y.
T-test = tests whether a predictor variable makes a significant contribution to the
regression model. Provides some idea of how well a predictor predicts the outcome
variable.
t = b / SE; compare |t| to the critical value (1.96 two-sided, 1.645 one-sided, at
α = .05). If |t| is smaller, the predictor is not significant.
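A minimal sketch of this in R (mydata, y and x are hypothetical names): lm() fits the
least-squares line, and summary() reports the coefficients with their t-tests, the
R-squared, and the overall F-test.
# Fit a simple linear regression of y on x (hypothetical variables)
model <- lm(y ~ x, data = mydata)
summary(model)   # coefficients with t-values/p-values, R-squared, F-test
resid(model)     # residuals: observed y minus the value fitted by the model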
When we compare the effects of different variables in the same regression, we use
the standardized coefficient (beta). Beta is independent of the measurement units.
With standardized X and Y the mean is 0 and the SD is 1, and the constant (intercept)
is always 0.
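A sketch of how to obtain betas in R by standardizing the variables with scale() first
(mydata, y, x1 and x2 are hypothetical names):
# Standardize outcome and predictors (mean 0, SD 1), then refit
model_std <- lm(scale(y) ~ scale(x1) + scale(x2), data = mydata)
summary(model_std)   # the slopes are now betas; the intercept is (essentially) 0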
A multiple linear regression always has exactly one outcome variable and two or
more predictors.
Assumptions:
- The sample is representative of its population and the observations are
independent
- There exists a linear relationship between the independent variables and the
dependent variable
- The residuals follow the normal distribution
Multicollinearity: when there are strong correlations between two or more predictors
in a regression model.
Variance Inflation Factor (VIF): a measure of the amount of multicollinearity in a set
of multiple regression variables. Rule of thumb:
Tolerance < 0.1 or, equivalently, VIF > 10 (tolerance = 1/VIF): there is multicollinearity.
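In R, VIF can be computed with vif() from the car package (a sketch, assuming a
fitted model named model with at least two predictors):
library(car)       # install.packages("car") if needed
vif(model)         # VIF per predictor; > 10 suggests multicollinearity
1 / vif(model)     # tolerance per predictor; < 0.1 suggests multicollinearity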
Regression in R
Use a chi-squared test to check whether there is a significant relationship between
two categorical variables.
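A sketch in R (var1 and var2 are hypothetical categorical variables in a data frame
mydata):
# Chi-squared test of independence on a cross table
chisq.test(table(mydata$var1, mydata$var2))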
F = mean square regression / mean square residual (get these numbers from the
output of ANOVA)
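In R, these mean squares come from anova() on the fitted model (a sketch; with a
single predictor, F = Mean Sq of the predictor / Mean Sq of the residuals, and the
same overall F-test appears at the bottom of summary(model)):
anova(model)   # ANOVA table: one row per predictor plus a Residuals row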
Week 2 Regression with categorical predictors and Regression assumptions
Dummy variable: a variable that can take two values, 1 (presence of an attribute) and
0 (absence of an attribute)
For a variable with k categories, we make k-1 dummy variables.
To test the difference between two categories, use the t-test of the dummy
coefficient; its p-value is (almost) the same as that of a direct t-test on the group
difference.
The reference category is coded 0 in all dummies. Choose the category that makes
sense as a reference, for example the control condition.
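In R, lm() creates the k-1 dummies automatically for a factor, and relevel() sets the
reference category (a sketch; group and the level "control" are hypothetical names):
mydata$group <- factor(mydata$group)
mydata$group <- relevel(mydata$group, ref = "control")   # set the reference category
model <- lm(y ~ group, data = mydata)
summary(model)   # one dummy coefficient (with t-test) per non-reference category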
Steps to check if a categorical variable is important:
- Run two models (one without the variable, one with)
- Look at the R-squared in both models
- Compare the R-squared values with an F-test (see the sketch after this list)
- Check whether this F-test is significant. If it is not, the variable does not
increase the proportion of variance of Y explained by the regression model
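A sketch of this comparison in R with anova() on two nested models (all names
hypothetical); the same call tests each step of a hierarchical regression:
m0 <- lm(y ~ x1, data = mydata)           # model without the categorical variable
m1 <- lm(y ~ x1 + group, data = mydata)   # model with it
anova(m0, m1)   # F-test for the increase in explained variance (R-squared)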
Regression with categorical variables (dummies) is the same as one-way ANOVA.
Hierarchical regression = in each step the researcher adds one (or more) variables: a
series of multiple regressions in which something new is added each time. Each step
should make theoretical sense; does the R-squared increase in each step?
Regression assumptions: when can you generalize the results?
- Linearity, the relationship between each X and Y must be linear for each value
of the other X's.
Check with a scatterplot (bivariate regression) or residuals plot (multiple
regression); the red line should lie approximately on the 0-line (see the
diagnostics sketch after this list).
- Normality of residuals, the distribution of the residuals is normal.
Central limit theorem = parameter estimates are approximately normally
distributed if you have a large enough sample.
Check with histogram.
- Homoscedasticity, the spread of the residuals stays the same across all
(predicted) values of Y.
Check with residuals plot/graph, randomly scattered in the plot.
- Independence of residuals/errors, residuals from the various observations
are not correlated.
Check with Durbin-Watson test. Test values <1 or >3 indicate autocorrelation.
Equal to 2 = no autocorrelation.
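A sketch of these four checks in R for a fitted model named model (the Durbin-Watson
test assumes the car package is installed):
plot(model, which = 1)    # residuals vs fitted values: linearity and homoscedasticity
hist(resid(model))        # histogram of the residuals: normality
library(car)
durbinWatsonTest(model)   # independence: a value near 2 means no autocorrelation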
If linearity is violated, homoscedasticity is (usually) also violated.
Assumptions violated:
- Linearity: when there is a pattern of deviation from the 0-line in the residuals
plot, the regression coefficients cannot be trusted.
- Normality: check with a histogram. Non-normality is not a problem if the
sample is large enough (n > 30).