This is a summary of all the lectures except the article lectures. It also includes content from the reader and 3 pages of practice exam questions and possible exam questions. GRADE: 8,4
Lecture 1: Introduction, data collection, variable types and methods of
analysis, OLS (conditions, pragmatism and justification)
Block 1: General considerations of quantitative methods
Variable types and methods of analysis
- Response variable (dependent variable) vs explanatory variable (independent variable)
- Manifest variable (directly observable variable for which we collect data) vs latent variable
(not directly observable, e.g. globalization)
- Nominal = categorical, qualitative -> no sense of order, no mean -> sex, color
- Ordinal = ranked categories -> ordered, but the differences between categories are not equal -> satisfaction, fanciness
- Interval/ratio = things that can be measured -> weight, age
- Levels of measurement:
o Nominal = frequencies and proportions
o Ordinal = frequencies and proportions, sometimes mean
o Interval/ratio = mean, median, standard deviation
- Graphical representation:
o Nominal = pie chart, bar chart, column chart
o Ordinal = bar chart, column chart
o Interval/ratio = bar chart, histogram, boxplot, line chart
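The summary statistics per level of measurement can be sketched in Python as follows. This is a minimal illustration using only the standard library. The helper name proportions and the small data lists are made up for this example and are not from the lecture.

```python
from collections import Counter
import statistics


def proportions(values):
    """Return {category: share of observations} for a nominal or ordinal variable."""
    counts = Counter(values)
    total = len(values)
    return {category: count / total for category, count in counts.items()}


# Hypothetical data, made up for illustration only.
colour = ["red", "blue", "red", "green"]  # nominal: no order, no mean
age = [20, 30, 40, 50]                    # interval/ratio: measurable

# Nominal (and ordinal) variables: frequencies and proportions.
colour_shares = proportions(colour)

# Interval/ratio variables: mean, median and standard deviation.
age_mean = statistics.mean(age)
age_median = statistics.median(age)
age_sd = statistics.stdev(age)  # sample standard deviation (divides by n - 1)
```

Taking a mean of the colour list would be meaningless, which is why only frequencies and proportions are computed for it.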
Block 2: Recap linear regression analysis (if dependent variable is metric -> Interval+Ratio)
- The linear regression model (LRM) is additive: the effects of the explanatory variables add up on top of each other
- By holding the other factors constant, you can look at the effect of one variable
- The linear regression line is estimated with the least squares method: take the line
for which the sum of squared residuals is as small as possible.
- Predicted value (Y-hat): a prediction of Y based on the estimated parameters
- The residual is the deviation between the prediction and observation
- R-squared (goodness of fit) measures how well the model fits the observations: the share of
the variation of Y that is explained by the model
o Poor model = 0% of the variation explained -> the fitted line is horizontal and does not follow the observations
o Perfect model = 100% of the variation explained
- Check model assumptions
o The sample consists of independent observations -> this is taken care of during
data collection
o A linear model is suitable, that is, the relationship between the dependent and the
independent variable is linear
[Figure: three residual plots. Captions: spread is increasing, but linear; negative residual -> predictions too low or high; good range, equal quality predictions]
o The variance of the residuals is equal for all possible values of the independent
variables (constant variance or homoscedasticity) -> the residuals need to be spread
evenly around the 0-line across the whole range, otherwise the tests are unreliable
for a certain range.
o Residuals are normally distributed -> bell shape, mean should be 0 (otherwise
systematic problem)
- Outlier = observation that is extremely different from the rest -> problematic because outliers
tend to shift the estimated regression line in the wrong direction
o Detect outliers: look at observations more than 3 standard deviations from the mean and
visualize with boxplots, histograms, probability plots and scatter plots.
o Study the impact of influential cases: compare regression outcomes with and without
influential cases, find out how big their impact is on the estimated coefficients and
fitted values (DFBETA and DFFIT) and check if Cook’s distance is > 1
- Multicollinearity = the correlation between two or more explanatory variables is too high (r > 0.8
or 0.9) -> in this case you can’t identify the individual effects anymore.
o Problem: it increases the standard errors of the regression coefficients, it limits the overall
model fit (R) and the interpretation of the relevance of individual explanatory variables
becomes impossible.
o Rules of thumb for detection: VIF > 10 (or tolerance < 0.1) -> indicates serious
problems of multicollinearity. VIF substantially higher than 1 (or tolerance < 0.2) ->
multicollinearity may be a problem
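The least-squares fit, R-squared and the VIF rule of thumb from this block can be sketched in Python with NumPy. This is a minimal illustration, not the procedure used in the lecture. The helper names ols_fit and vif and the simulated data are assumptions made for this example.

```python
import numpy as np


def ols_fit(X, y):
    """Least-squares fit of y on the columns of X plus an intercept.

    Returns (coefficients, residuals, r_squared); coefficients[0] is the intercept.
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    if X.ndim == 1:
        X = X.reshape(-1, 1)
    design = np.column_stack([np.ones(len(y)), X])
    # lstsq picks the coefficients that minimise the sum of squared residuals.
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    residuals = y - design @ coef              # observation minus prediction
    ss_res = np.sum(residuals ** 2)            # unexplained variation
    ss_tot = np.sum((y - y.mean()) ** 2)       # total variation of y
    return coef, residuals, 1.0 - ss_res / ss_tot


def vif(X):
    """VIF_j = 1 / (1 - R^2_j), with R^2_j from regressing column j on the other columns."""
    X = np.asarray(X, dtype=float)
    factors = []
    for j in range(X.shape[1]):
        others = np.delete(X, j, axis=1)
        _, _, r2_j = ols_fit(others, X[:, j])
        factors.append(1.0 / (1.0 - r2_j))
    return np.array(factors)


# Simulated data, made up for illustration only.
rng = np.random.default_rng(0)
x1 = rng.normal(size=200)
x2 = rng.normal(size=200)
x3 = x1 + 0.05 * rng.normal(size=200)  # almost a copy of x1 -> multicollinearity
y = 1.0 + 2.0 * x1 - 1.0 * x2 + rng.normal(scale=0.5, size=200)

coef, residuals, r2 = ols_fit(np.column_stack([x1, x2]), y)
vifs_ok = vif(np.column_stack([x1, x2]))        # unrelated predictors
vifs_bad = vif(np.column_stack([x1, x2, x3]))   # x1 and x3 nearly identical
```

Tolerance is 1 / VIF, so the VIF > 10 rule is the same check as tolerance < 0.1. In this sketch the near-duplicate predictor x3 pushes the VIF of x1 and x3 far above 10, while the unrelated predictors stay close to 1.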
Block 3: Linear regression: Model extensions and alternative model specifications
1. Dummy variables -> categorical variables with value 0 or 1 -> used to include qualitative variables
in a regression
- Produces two parallel regression lines that are allowed to differ by a constant amount in the intercept (alpha)
2. Interaction variables -> used if the effect of an independent variable is influenced by another
independent variable. In the linear model an interaction term is added (multiplicative) -> if
the interaction term is significant, the regression lines will not be parallel.
3. What to do in case of non-linearity
- Add a non-linear term -> quadratic regression model
- Transform the variables -> logarithm, square root, reciprocal
- Other model specifications (second lecture)
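The difference between a dummy variable and an interaction term can be sketched with a small noise-free example in Python with NumPy. The data and variable names are made up for this illustration. Group 0 follows y = 1 + 2x and group 1 follows y = 4 + 3x, so both the intercept and the slope differ between the groups.

```python
import numpy as np

# Hypothetical, noise-free data for two groups (made up for illustration).
x = np.array([0.0, 1.0, 2.0, 3.0, 0.0, 1.0, 2.0, 3.0])
d = np.array([0.0, 0.0, 0.0, 0.0, 1.0, 1.0, 1.0, 1.0])  # dummy: 0 = group 0, 1 = group 1
y = np.where(d == 0, 1.0 + 2.0 * x, 4.0 + 3.0 * x)

# Columns: intercept, x, dummy, interaction (x multiplied by the dummy).
design = np.column_stack([np.ones_like(x), x, d, x * d])
alpha, beta_x, beta_d, beta_xd = np.linalg.lstsq(design, y, rcond=None)[0]

# Regression line for group 0: alpha + beta_x * x
# Regression line for group 1: (alpha + beta_d) + (beta_x + beta_xd) * x
slope_group0 = beta_x
slope_group1 = beta_x + beta_xd
```

The dummy coefficient beta_d shifts the intercept, which on its own gives two parallel lines. The interaction coefficient beta_xd changes the slope for group 1, so the lines are no longer parallel whenever it differs from zero.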