Summary Quantitative Methods 2021
Lecture 1: Introduction, data collection, variable types and methods of
analysis, OLS (conditions, pragmatism and justification)
Block 1: General considerations of quantitative methods
Variable types and methods of analysis
- Response variable (dependent variable) vs explanatory variable (independent variable)
- Manifest variable (directly observable variables for which we collect data) vs latent variable
(not directly observable, e.g. globalization)
- Nominal = categorical, qualitative -> no sense of order, no mean -> sex, color
- Ordinal = rank, satisfaction, fanciness -> ordered, but the distances between categories are not equal
- Interval/ratio = things that can be measured -> weight, age
- Levels of measurement:
o Nominal = frequencies and proportions
o Ordinal = frequencies and proportions, sometimes mean
o Interval/ratio = mean, median, standard deviation
- Graphical representation:
o Nominal = pie chart, bar chart, column chart
o Ordinal = bar chart, column chart
o Interval/ratio = bar chart, histogram, boxplot, line chart
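The link between level of measurement and summary statistic can be sketched in a few lines of Python. This is only an illustration: the toy data and the helper name `proportions` are made up here and do not come from the lecture.

```python
from collections import Counter
from statistics import mean, median, stdev


def proportions(values):
    """Relative frequency of each category (suits nominal and ordinal data)."""
    counts = Counter(values)
    total = len(values)
    return {category: count / total for category, count in counts.items()}


# Hypothetical toy data, invented for illustration only.
colour = ["red", "blue", "red", "green"]   # nominal
satisfaction = [1, 2, 2, 3, 5]             # ordinal, coded 1 (low) to 5 (high)
age = [20, 30, 40, 50]                     # interval/ratio

colour_share = proportions(colour)              # frequencies as proportions
satisfaction_share = proportions(satisfaction)  # same summary works for ordinal data
age_mean, age_median, age_sd = mean(age), median(age), stdev(age)
```

Note that a mean of `colour` would be meaningless, which is why only frequencies and proportions are computed for it.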
Block 2: Recap linear regression analysis (if dependent variable is metric -> Interval+Ratio)
- The linear regression model (LRM) is additive: the effects of the explanatory variables add up on top of each other
- By holding the other factors constant, you can look at the effect of one variable in isolation
- The linear regression line is estimated with the least squares method: choose the line for
which the sum of squared residuals is as small as possible.
- Y-hat (fitted value): a prediction based on the estimated parameters
- The residual is the deviation between the observation and the prediction
- R-squared (goodness-of-fit) measures how well the model fits the observations: the share of
the variation of Y that is explained by the model
o Poor model = 0% explained -> the fitted line says nothing about where the observations lie
o Perfect model = 100% explained -> all observations lie on the fitted line
- Check model assumptions
o The sample consists of independent observations -> this is taken care of during
data collection
o A linear model is suitable, that is, the relationship between the dependent and the
independent variable is linear
(Figure: three residual plots. (1) Spread is increasing, but linear. (2) Negative residual ->
predictions too low or high. (3) Good range, equal quality predictions.)
o The variance of the residuals is equal for all possible values of the independent
variables (constant variance or homoscedasticity) -> the residuals need to scatter
evenly around the 0-line across the whole range, otherwise the tests are unreliable
for a certain range.
o Residuals are normally distributed -> bell shape, mean should be 0 (otherwise
systematic problem)
- Outlier = observation that is extremely different from the rest -> problematic because outliers
tend to pull the estimated regression line in the wrong direction
o Detect outliers: look at observations beyond 3 standard deviations of the mean and
visualize with boxplots, histograms, probability plots and scatter plots.
o Study impact of influential cases: Compare regression outcomes with and without
influential cases, find out how big the impact is on your overall model fit (DFBETA
and DFFIT) and check if Cook’s distance is > 1
- Multicollinearity = the correlation between two or more explanatory variables is too high (r > 0.8
or 0.9) -> in this case you can't identify the individual effects anymore.
o Problem: it increases the standard errors of the regression coefficients, it limits the overall
model fit (R), and interpreting the relevance of individual explanatory variables
becomes impossible.
o Rules of thumb for detection: VIF > 10 (or tolerance < 0.1) -> indicates serious
problems of multicollinearity. VIF substantially higher than 1 (or tolerance < 0.2) ->
multicollinearity may be a problem
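The recap above (least squares, residuals, R-squared, Cook's distance, VIF) can be tried out with a small standard-library sketch. It is a simplification, not the course's procedure: it covers only a model with one explanatory variable, the VIF helper handles only the two-predictor case (where the auxiliary R-squared comes from regressing one predictor on the other), and all data and function names are invented for illustration.

```python
from statistics import mean


def fit_line(x, y):
    """Least squares estimates (alpha, beta) for y = alpha + beta * x."""
    x_bar, y_bar = mean(x), mean(y)
    s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    s_xx = sum((xi - x_bar) ** 2 for xi in x)
    beta = s_xy / s_xx
    return y_bar - beta * x_bar, beta


def residuals(x, y):
    """Observed value minus predicted value for every observation."""
    alpha, beta = fit_line(x, y)
    return [yi - (alpha + beta * xi) for xi, yi in zip(x, y)]


def r_squared(x, y):
    """Share of the variation of y that the fitted line explains."""
    y_bar = mean(y)
    ss_res = sum(e ** 2 for e in residuals(x, y))
    ss_tot = sum((yi - y_bar) ** 2 for yi in y)
    return 1 - ss_res / ss_tot


def cooks_distances(x, y):
    """Cook's distance per observation for a one-predictor model."""
    n, p = len(x), 2  # p = number of estimated parameters (alpha and beta)
    x_bar = mean(x)
    s_xx = sum((xi - x_bar) ** 2 for xi in x)
    res = residuals(x, y)
    mse = sum(e ** 2 for e in res) / (n - p)
    distances = []
    for xi, e in zip(x, res):
        leverage = 1 / n + (xi - x_bar) ** 2 / s_xx
        distances.append(e ** 2 / (p * mse) * leverage / (1 - leverage) ** 2)
    return distances


def vif_two_predictors(x1, x2):
    """VIF = 1 / (1 - R^2), with R^2 from regressing x1 on x2."""
    return 1 / (1 - r_squared(x2, x1))


# Hypothetical data: the last observation lies far from the trend of the others.
x = [1, 2, 3, 4, 5, 10]
y = [1.1, 1.9, 3.2, 3.8, 5.1, 2.0]

r2_with = r_squared(x, y)             # fit including the suspicious case
r2_without = r_squared(x[:5], y[:5])  # fit without it
flagged = [i for i, d in enumerate(cooks_distances(x, y)) if d > 1]

# Two hypothetical, almost identical explanatory variables.
vif = vif_two_predictors([1, 2, 3, 4, 5], [1.1, 2.0, 2.9, 4.2, 5.0])
```

Comparing `r2_with` and `r2_without` mirrors the advice to compare regression outcomes with and without influential cases, and `flagged` lists the positions whose Cook's distance exceeds 1. For real analyses with several explanatory variables, a statistics package would be the more sensible choice.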
Block 3: Linear regression: Model extensions and alternative model specifications
1. Dummy variables -> are categorical with value 0 or 1 -> are used to include qualitative variables
in regression
- Produces two parallel regression lines that are allowed to have a constant difference in alpha (the intercept)
2. Interaction variables -> if the effect of an independent variable is influenced by another
independent variable. In the linear model an interaction term is added (multiplicative) -> if
the interaction term is significant, the regression lines will not be parallel.
3. What to do in case of non-linearity
- Add a non-linear term -> quadratic regression model
- Transform the variables -> logarithm, square root, reciprocal of number
- Other model specifications (second lecture)
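A short sketch may help to see what these extensions do to the regression lines. The coefficient values, the example curve and the names `predict`, `gamma` and `delta` are assumptions chosen for illustration; nothing is estimated from data here.

```python
import math


def predict(x, d, alpha, beta, gamma, delta=0.0):
    """y = alpha + beta*x + gamma*d + delta*(x*d), where d is a 0/1 dummy."""
    return alpha + beta * x + gamma * d + delta * x * d


# Hypothetical coefficients, not estimated from any data.
alpha, beta, gamma, delta = 1.0, 2.0, 3.0, 0.5
xs = (0, 5, 10)

# Dummy only: the gap between the two groups equals gamma at every x,
# so the two lines are parallel.
gap_dummy = [
    predict(x, 1, alpha, beta, gamma) - predict(x, 0, alpha, beta, gamma)
    for x in xs
]  # -> [3.0, 3.0, 3.0]

# Dummy plus interaction term: the gap grows with x, so the lines are not parallel.
gap_interaction = [
    predict(x, 1, alpha, beta, gamma, delta) - predict(x, 0, alpha, beta, gamma, delta)
    for x in xs
]  # -> [3.0, 5.5, 8.0]

# Transformation: an exponential curve y = 2 * exp(0.5 * x) becomes linear in
# log(y), which shows up as a constant step in log(y) per unit of x.
curve = [2.0 * math.exp(0.5 * x) for x in range(4)]
log_steps = [math.log(b) - math.log(a) for a, b in zip(curve, curve[1:])]
```

The constant entries in `log_steps` (about 0.5 each) are what makes a straight line suitable again after the logarithmic transformation.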