Comprehensive summary of all Quantitative Methods lectures, divided into the six themes that were also used during the lectures. Includes examples and screenshots from the lectures to illustrate the theory.
Summary Quantitative Methods:
Theme 1: Intro, Variables and techniques, OLS
Data: a matrix of different observations.
Observation: the unit of analysis (people, organisations) together with the data collected on it, the
variables (age, gender, etc.).
SPSS is a program in which you look for relationships between these variables, searching for the
‘ideal’ model.
Dependent variable (or response variable): a variable whose value depends on that of another. A
variable thought to be affected by changes in an independent variable. You can think of this variable
as an outcome.
Independent variable (or explanatory variable): a variable thought to be the cause of some effect,
whose variation does not depend on that of another variable. The term is usually used in
experimental research to describe a variable that the experimenter has manipulated.
Manifest variable: a directly observable variable for which we collect data (gender, income). You can
use it directly as a variable in the analysis.
Latent variable: latent means that something is not observable directly (for example globalisation).
Level of measurement:
Nominal: also known as categorical or qualitative (colour, type of chocolate). Things that are
equivalent in some sense are given the same name (or number), and there are more than two
possibilities.
Ordinal: the categories are ordered. Ordinal variables have a meaningful order, but the intervals
between the values on the scale may not be equal (rank, satisfaction, fanciness). Example: the
difference between ‘very satisfied’ and ‘satisfied’ may be smaller than the difference between
‘satisfied’ and ‘unsatisfied’.
Interval/ratio: this label covers things that can be measured rather than classified or ordered,
such as the number of customers. Interval/ratio data are also known as scale, quantitative or
parametric data. On a ratio scale the ratios of values along the scale should be meaningful. For
this to be true, the scale must have a true and meaningful zero point.
Linear regression analysis (OLS):
The dependent variable is measured at the interval/ratio level.
You are trying to explain a particular dependent variable (Y), such as housing prices.
For example: we want to know how the average income differs when we change gender by one
unit.
A linear regression is adaptive.
The dots in a model are the combinations of variables. For each observation you have a
unique combination of variable values (the dots).
There is a straight line in the model that approximates these observations as well as possible.
Variation: the spread among all the dots.
R-squared (goodness of fit): defines how well the model fits the observed data. R² represents the
amount of variance in the outcome explained by the model relative to how much variation there was
to explain in the first place. You want a model that is as close as possible to the observations,
one that predicts the variation as well as possible. The ideal is a model where the dots (the
observations) lie perfectly on the line. In that case you have an R-squared of 100%.
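As a minimal sketch of this definition, the Python snippet below fits a straight line by least squares and computes R² as one minus the ratio of residual variation to total variation. The function names and the data are hypothetical and chosen only for illustration; the lectures themselves use SPSS.

```python
def fit_line(x, y):
    """Return (intercept, slope) of the least-squares line y = a + b*x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope


def r_squared(x, y):
    """Share of the variation in y that the fitted line explains."""
    intercept, slope = fit_line(x, y)
    mean_y = sum(y) / len(y)
    # Variation there was to explain in the first place
    ss_total = sum((yi - mean_y) ** 2 for yi in y)
    # Variation the line leaves unexplained
    ss_residual = sum((yi - (intercept + slope * xi)) ** 2 for xi, yi in zip(x, y))
    return 1 - ss_residual / ss_total


# Hypothetical data: every dot lies exactly on the line y = 1 + 2x
x_values = [1, 2, 3, 4, 5]
perfect_y = [3, 5, 7, 9, 11]
# Same x values, but the dots are scattered around a line
noisy_y = [2, 6, 6, 10, 11]

r2_perfect = r_squared(x_values, perfect_y)  # dots on the line: R-squared of 100%
r2_noisy = r_squared(x_values, noisy_y)      # scattered dots: R-squared below 100%
print(r2_perfect, r2_noisy)
```

The first data set gives an R-squared of 1 (100%) because no variation is left unexplained. The second gives a lower value because the dots do not lie exactly on the line.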
Check model assumptions: to check whether the model is ‘good’
1. The sample consists of independent observations: you need to ensure that the observations
are independent of each other (for example, respondents did not collaborate on the survey).
2. A linear model is suitable, that is, the relationship between the dependent variable and the
independent variable is linear: we need to check the linearity assumption.
Models A and C are suitable for a linear model. In model A the relationship is linear, although
there is some spread (in the answers). C is suitable because the relationship is linear and the
quality of the predictions is equal along the line.
3. The variance of the residuals is equal for all possible values of the independent variables
(constant variance or homoscedasticity): the residuals should be spread evenly around the
‘zero line’. This is important because if the spread grows as the dependent variable becomes
higher, the predictions become less reliable for those values.
4. The residuals are normally distributed: about 2/3 of the residuals should lie within one
standard deviation of the mean. This is a very important condition if we want to draw
conclusions.
Residuals: the differences between the observed values and the values predicted by the linear
model obtained from the regression analysis (the same as deviations). The residuals represent the
error in the model's predictions. To assess the error in a linear model we use the sum of squared
residuals. The residual sum of squares is a measure of how well a linear model fits the data. If
the squared differences are large, the model is not representative of the data (there is a lot of
error in prediction); if the squared differences are small, the line is representative.
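To make the sum of squared residuals concrete, the sketch below compares two candidate lines on the same hypothetical data (study hours and grades). The data, the coefficients of both lines and the function names are assumptions made for this example only.

```python
def residuals(x, y, intercept, slope):
    """Observed value minus the value the line predicts, for each observation."""
    return [yi - (intercept + slope * xi) for xi, yi in zip(x, y)]


def sum_of_squares(values):
    """Square every residual and add them up."""
    return sum(v ** 2 for v in values)


# Hypothetical observations
hours = [1, 2, 3, 4, 5]
grade = [2, 6, 6, 10, 11]

# Candidate line 1: runs close to the dots
ssr_good = sum_of_squares(residuals(hours, grade, intercept=0.4, slope=2.2))
# Candidate line 2: much too flat, so it misses most dots by a wide margin
ssr_poor = sum_of_squares(residuals(hours, grade, intercept=5.0, slope=0.5))

print(ssr_good, ssr_poor)
```

The line with the smaller sum of squared residuals is the more representative one. OLS chooses the line for which this sum is as small as possible.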
Outliers: extreme observations that are very different from the rest. They are problematic because
you try to identify the relation between the dependent and the independent variable, and outliers
can give a different (unrealistic) view of it.
Detect outliers:
- Look at the observations more than three standard deviations from the mean
- Boxplots, histograms, probability plots, scatter plots
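The three-standard-deviation rule can be sketched in a few lines of Python. The income figures and the function name `flag_outliers` are hypothetical. Note that a single extreme value can only exceed three standard deviations when the sample is reasonably large, which is why the example uses twenty observations.

```python
import statistics


def flag_outliers(values, threshold=3.0):
    """Return the values that lie more than `threshold` standard deviations from the mean."""
    mean = statistics.mean(values)
    sd = statistics.stdev(values)  # sample standard deviation
    return [v for v in values if abs(v - mean) > threshold * sd]


# Hypothetical incomes (in thousands): nineteen ordinary values and one extreme one
incomes = [30, 31, 32, 33, 34, 35, 36, 37, 38, 39,
           30, 31, 32, 33, 34, 35, 36, 37, 38, 150]

flagged = flag_outliers(incomes)
print(flagged)
```

A flagged value is only a candidate outlier. The plots listed above help to judge whether it is a data error or a genuine extreme observation.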
Study impact of influential cases:
- The idea is to compare regression outcomes with and without influential cases
- SPSS: influence of a case on individual coefficients (DFBETA) and on the model fit (DFFIT)
- Cases with Cook’s distance > 1 are considered influential
Multicollinearity: the problem where the correlation between two (or more) explanatory
(independent) variables is too high (R > 0.8 or 0.9). When the correlation is this high you cannot
separate their individual effects from each other.
Problems:
- Standard errors of regression coefficients increase, which makes the coefficients untrustworthy
- Limits size of R
- Interpretation of relevance of individual explanatory variables becomes impossible
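A simple first check for multicollinearity is to compute the correlation between each pair of explanatory variables and compare it with the 0.8 threshold mentioned above. The sketch below does this with a hand-written Pearson correlation. The variables (age, work experience, years of education) and their values are hypothetical.

```python
import math


def pearson_r(a, b):
    """Pearson correlation coefficient between two equally long lists of numbers."""
    n = len(a)
    mean_a = sum(a) / n
    mean_b = sum(b) / n
    cov = sum((ai - mean_a) * (bi - mean_b) for ai, bi in zip(a, b))
    var_a = sum((ai - mean_a) ** 2 for ai in a)
    var_b = sum((bi - mean_b) ** 2 for bi in b)
    return cov / math.sqrt(var_a * var_b)


# Hypothetical explanatory variables for six respondents
age = [25, 32, 41, 48, 55, 63]
experience = [2, 9, 17, 25, 31, 40]          # rises almost in step with age
education_years = [16, 12, 18, 12, 16, 14]   # unrelated to age

THRESHOLD = 0.8  # rule of thumb from the notes

r_age_experience = pearson_r(age, experience)
r_age_education = pearson_r(age, education_years)

too_high_experience = abs(r_age_experience) > THRESHOLD
too_high_education = abs(r_age_education) > THRESHOLD
print(r_age_experience, too_high_experience)
print(r_age_education, too_high_education)
```

In this example age and experience are almost perfectly correlated, so including both in one model would make it hard to tell their effects apart, while age and education could be used together.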
Dummy variables: categorical variables that have two values (men and women). A dummy variable
takes the value 0 or 1.
When do you need a dummy variable?
- Continuous: not necessary
- Ordinal: not necessary if a linear trend exists, otherwise yes
- Dichotomous (men/women): yes
- Nominal (more than 2 categories): create help variables using dummies (number of dummies =
number of categories minus 1)
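The rule "number of dummies = number of categories minus 1" can be illustrated with a small sketch. One category is chosen as the reference and gets no dummy of its own; an observation in the reference category scores 0 on every dummy. The function name `make_dummies`, the `is_` prefix and the chocolate data are hypothetical.

```python
def make_dummies(values, reference):
    """Create one 0/1 dummy per category, except for the reference category."""
    categories = sorted(set(values))
    others = [c for c in categories if c != reference]
    return [{f"is_{c}": int(v == c) for c in others} for v in values]


# Hypothetical nominal variable with three categories
chocolate = ["milk", "dark", "white", "dark"]

rows = make_dummies(chocolate, reference="milk")
for value, dummies in zip(chocolate, rows):
    print(value, dummies)
```

Three categories give two dummies (`is_dark` and `is_white`). The coefficient of each dummy is then interpreted as the difference from the reference category.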
Interaction variable: we speak of an interaction if the effect of an independent variable is
influenced by a second independent variable.
Example: the effect of study hours on grade is different for students with a high level of prior
education than for students with a low level of prior education (dummy: high = 0, low = 1). Each
extra hour of study has a different effect in the two groups, so the regression lines for the two
groups are no longer parallel. That is where interaction variables come in.
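The study-hours example can be written as a regression equation with an interaction term: grade = b0 + b1·hours + b2·low + b3·(hours × low). The sketch below uses made-up coefficients purely to show how the interaction term changes the slope; none of the numbers come from the lectures.

```python
def predicted_grade(hours, low_education, b0=4.0, b1=0.5, b2=-1.0, b3=-0.2):
    """Predicted grade from a model with an interaction term (hypothetical coefficients)."""
    interaction = hours * low_education  # the interaction variable: product of the two
    return b0 + b1 * hours + b2 * low_education + b3 * interaction


# Effect of one extra study hour for each group
effect_high = predicted_grade(1, 0) - predicted_grade(0, 0)  # slope is b1
effect_low = predicted_grade(1, 1) - predicted_grade(0, 1)   # slope is b1 + b3

print(effect_high, effect_low)
```

Without the interaction term (b3 = 0) both groups would have the same slope and the lines would be parallel. With it, the slope for the low-education group differs by b3.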
Overview conditions of linear regression with OLS:
1. The sample consists of independent observations
2. A linear model is suitable, that is, the relationship between the dependent variable and the
independent variable is linear.
3. The variance of the residuals is equal for all possible values of the independent variables
(constant variance or homoscedasticity)
4. The residuals are normally distributed
Linear regression models that predict non-metric dependent variables fail to meet these
conditions. Therefore we use non-linear regression models: the discrete choice model, covered in
the next theme.
Theme 2: Discrete Choice Model
Basic terms:
Hypothesis testing: we do this all the time. In this course we usually use the t-test, which tests
the hypothesis that the coefficient is 0. We hope to reject this hypothesis, because we want an
explanatory variable that has an effect on the dependent variable.
Model building is simply the process where we add variables and hope that these variables explain
the dependent variable.
In social science we use the 5% significance level. You always check whether the p-value is below
5%. Then you can reject the hypothesis.
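As a rough sketch of the t-test on a regression coefficient, the snippet below computes the t-statistic of the slope in a simple regression (slope divided by its standard error) and compares it with the two-sided 5% critical value. Checking |t| against the critical value is equivalent to checking whether the p-value is below 5%. The data are hypothetical, and the critical value 3.182 applies only to this example's 3 degrees of freedom (5 observations minus 2 estimated coefficients); SPSS reports the p-value directly.

```python
import math


def slope_t_statistic(x, y):
    """t-statistic for the null hypothesis that the slope of y = a + b*x is 0."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    slope = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y)) / sxx
    intercept = mean_y - slope * mean_x
    ssr = sum((yi - (intercept + slope * xi)) ** 2 for xi, yi in zip(x, y))
    # Standard error of the slope: residual variance (n - 2 degrees of freedom) over Sxx
    standard_error = math.sqrt(ssr / (n - 2) / sxx)
    return slope / standard_error


# Hypothetical observations
hours = [1, 2, 3, 4, 5]
grade = [2, 6, 6, 10, 11]

t_value = slope_t_statistic(hours, grade)

# Two-sided 5% critical value of the t-distribution with 3 degrees of freedom
CRITICAL_T_DF3 = 3.182
reject_null = abs(t_value) > CRITICAL_T_DF3
print(t_value, reject_null)
```

Here the t-statistic is well above the critical value, so the hypothesis that the coefficient is 0 is rejected at the 5% level.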
F-test: a test for the overall model significance. If the overall model is not significant the
model is useless: there is no significant evidence that the model helps to predict the dependent
variable at all.
We hope that at least one coefficient is not equal to 0, because then that variable explains the
dependent variable.
Normal distribution:
It is a requirement for hypothesis testing.