Summary of the videos explaining the content of Applied Microeconometrics. Please note that the Stata application is summarized in a separate document (which is also available).
Summary lectures Applied Microeconometrics
Week 1: introduction
Introduction session
There are three assignments in total, which together account for 45% of your final grade. For every
assignment you will be placed in a different group. Make sure you have watched the videos before going to that week's question hour.
Try to complete the sample assignment before exercise lecture 2.
The deadline for assignment 1 is on Wednesday of the third week!
Videos: introduction to empirical methods – Linear regression models
Introduction
Empirical analysis is a methodology that uses data to test a theory and to estimate a relationship
between variables. For example, if there is a theory about an increase in food consumption, we can
use data to test whether this theory holds. We can also test whether policies work.
The first step is to define your research question clearly. The scientific method is not about the
outcome: a negative result can still be useful if your method is sound. Your research
question can be based on an economic model or on your intuition, for example what
you observe in the real world. You can support your observation with existing scientific evidence. Once
you have defined your research question, you can set up a simple regression model, which
consists of two variables, X and Y. You explain Y in terms of X: if we change the value of X,
what effect does that have on Y?
Example: house prices and average income in a neighbourhood
We will answer the following research question: how do house prices change when the average
income in the neighbourhood changes? The plot below shows the house prices in all the
neighbourhoods in the Netherlands (on the Y-axis) against the average income in those
neighbourhoods (on the X-axis). Each dot represents a neighbourhood.
From this scatterplot we can see that there is a correlation between the average income and the house
prices: the higher the average income, the higher the house prices. This is a positive association. The
aim of a regression model is to find a line that summarizes the information shown in the scatterplot.
β0 is the constant (the intercept of the line) and β1 is the slope, which tells us how the house price
changes when the average income changes.
The formula of the simple regression is Y = β0 + β1X + u. The error term u is the effect on Y that is
not observed by the researcher. The simple linear regression is interpreted ceteris paribus, meaning that we see
how Y changes when X changes while all other factors are held fixed. The ‘other factors’ are
represented in u, so holding them fixed means Δu = 0. It follows that ΔY = β1ΔX, which is exactly
what the slope of the regression model represents.
Holding u fixed in this way requires the zero conditional mean assumption: E(u|X) = E(u) = 0, meaning
that the average of u is the same at every value of X. This
assumption gives another useful interpretation: E(Y|X) = β0 + β1X. Can we draw ceteris paribus
conclusions about how X affects Y in our example? We can, provided the zero
conditional mean assumption holds. Suppose that u in our example
represents amenities (voorzieningen). The assumption then says that amenities should on average be the same regardless of
average income, so that E(amenities | income=10,000) = E(amenities |
income=100,000). Could this be the case in our example? We do not have any data about
amenities, so we have to use our intuition to judge whether this is plausible. If we think that the
amount and quality of amenities differ between richer and poorer neighbourhoods, then the
assumption does not hold. We cannot observe u, so we have no way of knowing from our data whether
amenities are the same for all levels of X. You should argue on the basis of economic theory
whether the assumption that amenities are the same could hold.
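The reasoning about amenities can be made concrete with a small simulation. The sketch below is only an illustration under invented assumptions: the data are made up, incomes are in thousands of euros, and amenities are assumed to rise with income. In real data u is unobserved, so this check cannot actually be carried out; the point is only to show what a violation of E(u|X) = E(u) looks like.

```python
import random

random.seed(0)  # fixed seed so the illustration is reproducible

# Hypothetical data: 1,000 neighbourhoods, average income in thousands of euros.
incomes = [random.uniform(10, 100) for _ in range(1000)]

# Assumption for illustration only: amenities rise with income, plus random noise.
amenities = [0.02 * inc + random.gauss(0, 0.5) for inc in incomes]


def mean(values):
    """Arithmetic mean of a list of numbers."""
    return sum(values) / len(values)


# Split the neighbourhoods into a poorer and a richer group.
low = [a for inc, a in zip(incomes, amenities) if inc < 55]
high = [a for inc, a in zip(incomes, amenities) if inc >= 55]

gap = mean(high) - mean(low)
print(f"mean amenities, poorer group: {mean(low):.2f}")
print(f"mean amenities, richer group: {mean(high):.2f}")
print(f"gap: {gap:.2f}")  # a gap far from 0 means E(u|X) is not constant
```

If the zero conditional mean assumption held, the two group means would be roughly equal and the gap would be close to zero.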
Estimation and interpretation
We continue with the previous example. We want to know whether our two variables of interest are
related in a linear way. Therefore, we need to estimate β0 and β1. Please note that we will never know
the values of these parameters for sure; we are going to estimate them. Whether our estimate is right
depends on the underlying assumptions of the model and the availability of good-quality data.
First, we take a random sample of neighbourhoods. Then we plot the X and Y value of
each sampled neighbourhood (the dots). Next, we draw a line through the dots that
represents their average.
So the fitted value is the Y value that belongs to a given X value according to our regression.
However, the observed value may not lie exactly on the line. The difference between the observed
value of Y and the value we estimate with our regression is called the residual.
So the fitted Y follows the formula of our regression: Y^i = β^0 + β^1Xi. The
residual is found by taking the observed value Yi and subtracting the fitted value from it:
u^i = Yi - Y^i = Yi - β^0 - β^1Xi. A hat (^) means we are talking about
an estimated value; Y without a hat is the observed value. Your aim is to make
the residuals as small as possible, so that the estimated values lie as close as possible to the observed values.
We do this by minimizing the sum of squared residuals: min Σ(i=1..n) (Yi - β^0 - β^1Xi)².
The values of β^0 and β^1 that solve this problem are called the Ordinary Least Squares (OLS) estimators. There are other ways to
keep the residuals small, but under the standard assumptions OLS is unbiased and the most efficient among linear unbiased estimators.
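For a simple regression, the minimization problem has a well-known closed-form solution: the slope is the sample covariance of X and Y divided by the sample variance of X, and the intercept makes the line pass through the point of means. The sketch below applies these formulas to a tiny invented dataset (the numbers are not from the course data) and computes the fitted values and residuals.

```python
def ols_simple(x, y):
    """Return (b0, b1), the OLS intercept and slope of a simple regression of y on x."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    # Sum of cross-deviations and sum of squared deviations of x.
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    b1 = sxy / sxx            # slope
    b0 = y_bar - b1 * x_bar   # intercept: the line passes through (x_bar, y_bar)
    return b0, b1


# Hypothetical data, for illustration only.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.1, 3.9, 6.2, 7.8, 10.1]

b0, b1 = ols_simple(x, y)
fitted = [b0 + b1 * xi for xi in x]                 # Y^i
residuals = [yi - fi for yi, fi in zip(y, fitted)]  # u^i = Yi - Y^i
ssr = sum(r ** 2 for r in residuals)                # sum of squared residuals

print(f"b0 = {b0:.3f}, b1 = {b1:.3f}, SSR = {ssr:.4f}")
```

Any other line through the same data, for example one with a slightly different intercept, gives a larger sum of squared residuals. With an intercept in the model, the OLS residuals also sum to zero.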
Please note that you do not have to calculate these values by hand; you are going to use Stata for it! If we
run reg house price income in Stata with our dataset, we get the following outcome:
The column under Coef. gives you the information you need for your regression. Income is
the X variable belonging to β1. The _cons is the constant, belonging to β0. This gives the
following regression formula for our dataset: house price = -95.798 + 16.249 * income. The house
price is measured in 1,000 euros and the income in euros. So if the average income in a neighbourhood
increases by 1 euro, the predicted house price increases by 16.249 × 1,000 = 16,249 euros, ceteris paribus. Together with the
constant, this gives an estimated house price for a given income. You can interpret the constant
as well, but in this example that does not make sense, since the constant implies a negative house price
and the average income cannot be zero. Once you have your regression, you can plot the fitted line.
You will then see which observations fall on the fitted line of our predictions.
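The interpretation of the coefficients can be checked with a few lines of arithmetic. The sketch below hard-codes the two coefficients reported in the text; the function names are invented for illustration, and income is simply taken in whatever unit the dataset uses.

```python
B0 = -95.798  # constant (_cons), as reported in the text
B1 = 16.249   # coefficient on income, as reported in the text


def predicted_price(income):
    """Fitted house price (in 1,000 euros) for a given average income."""
    return B0 + B1 * income


def price_change(delta_income):
    """Predicted change in house price (in 1,000 euros) when income changes by delta_income."""
    return B1 * delta_income


# A one-unit increase in income, converted from 1,000 euros to euros.
print(f"effect of +1 income: {price_change(1) * 1000:.0f} euros")

# The constant is the prediction at income = 0, which is negative here
# and therefore has no sensible interpretation.
print(f"prediction at income 0: {predicted_price(0):.3f} (x 1,000 euros)")
```

The change in the prediction depends only on the slope, which is why the constant drops out when you compare two income levels.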
The difficulty with a simple regression is that it is hard to draw ceteris paribus conclusions. So far we
only looked at the effect of income on the house price, but density (dichtheid)
may also influence the house price. A simple regression only gives a ceteris paribus
relationship if the other variables contained in u (the error term) are not correlated with our
explanatory variable (X). If density is correlated with income, the simple regression of
house price on income will not show a ceteris paribus relationship. If richer
neighbourhoods are located in less populated areas, we do not meet the zero conditional
mean assumption, because u would change with income. We then cannot draw a ceteris paribus conclusion. In
that case it is better to estimate a multiple regression that includes income and density as
explanatory variables: Y = β0 + β1X1 + β2X2 + u
Everything we have discussed so far also applies when we add more X’s. We can use OLS in the same
way and interpret the results in the same way. If more variables have an effect on Y and are
correlated with X1, it is better (if you have the data) to add these variables as well, so that you can be more confident
about your regression and draw ceteris paribus conclusions. In the end, multiple regression
analysis allows us to control for many other variables that simultaneously affect the dependent
variable. Furthermore, it contributes to better predictions.
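The effect of leaving out a correlated variable can be shown with a small constructed example. In the sketch below all numbers are invented: house price is generated exactly as 10 + 4 * income + 3 * density, and density is lower in richer neighbourhoods. The helper functions `solve` and `ols` are illustrative names; `ols` estimates the coefficients by solving the normal equations (X'X)β = X'y.

```python
def solve(a, b):
    """Solve the linear system a * beta = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    beta = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        s = sum(m[r][c] * beta[c] for c in range(r + 1, n))
        beta[r] = (m[r][n] - s) / m[r][r]
    return beta


def ols(columns, y):
    """OLS coefficients [b0, b1, ...]; a constant is added automatically."""
    rows = [[1.0] + [col[i] for col in columns] for i in range(len(y))]
    k = len(rows[0])
    xtx = [[sum(r[a] * r[b] for r in rows) for b in range(k)] for a in range(k)]
    xty = [sum(r[a] * yi for r, yi in zip(rows, y)) for a in range(k)]
    return solve(xtx, xty)


# Hypothetical data: richer neighbourhoods are less dense.
income = [20.0, 30.0, 40.0, 50.0, 60.0, 70.0]
density = [9.0, 8.0, 6.0, 5.0, 3.0, 2.0]
price = [10 + 4 * inc + 3 * den for inc, den in zip(income, density)]

simple = ols([income], price)            # density is left in the error term
multi = ols([income, density], price)    # density is controlled for

print(f"simple regression slope on income:   {simple[1]:.3f}")
print(f"multiple regression slope on income: {multi[1]:.3f}")
```

Because density is negatively correlated with income and raises the price in this constructed example, the simple regression understates the effect of income, while the multiple regression recovers the coefficient used to generate the data.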
Assumptions for unbiasedness
By an unbiased OLS estimator we mean that the expected value of our estimator is equal to the population
parameter: E(β^0) = β0 and E(β^1) = β1. This means that if we drew many random samples from the
population and estimated the regression on each of them, the estimates would on average equal the
population values, even though any single estimate can differ from them. Which assumptions do we
need for this property? We need to assume four things:
1) Linear in parameters
- In the population model, the dependent variable Y is linearly related to the independent variable X
and the error term u: Y = β0 + β1X + u.
- Y = β0 + β1X1 + β1*β2X2 + u would not be allowed, since the product of two
parameters (β1*β2) is not linear in the parameters.
- However, there can be some nonlinearities in the variables, such as:
Y = β0 + β1 ln(X1) + β2X2 + β3X2² + u
ln(Y) = β0 + β1X1 + β2X2 + β3X2² + u
When you interpret such formulas, keep the log in mind when reading the results.
2) Random sampling
- We have a random sample of size n (not a specifically selected sample) that follows the population
model in assumption 1.
- If we, for example, only selected obese people, we would have selection bias, because the sample would not
represent the whole population.
3) Sample variation in the explanatory variable / no perfect collinearity
- The sample outcomes on x are not all the same value. So none of the independent variables
can be constant, and;
- there are no exact linear relationships among the independent variables.
- We use the variation in X to estimate the effect of X on Y. If X were a
constant, changing X would not be possible and we could not see what effect a change in X
has on Y.
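Why variation in X is needed can be seen directly in the formula for the OLS slope of a simple regression, which divides by the sum of squared deviations of X from its mean. The sketch below uses invented numbers and an illustrative function name; it returns None when X does not vary, because the slope is then undefined.

```python
def ols_slope_or_none(x, y):
    """OLS slope of y on x, or None when x has no sample variation."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)  # variation in x
    if sxx == 0:
        return None  # x is constant: the slope cannot be estimated
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    return sxy / sxx


# Hypothetical data, for illustration only.
y = [2.0, 4.0, 6.0, 8.0]
varying_x = [1.0, 2.0, 3.0, 4.0]
constant_x = [5.0, 5.0, 5.0, 5.0]

slope_varying = ols_slope_or_none(varying_x, y)
slope_constant = ols_slope_or_none(constant_x, y)

print("slope with variation in X:", slope_varying)
print("slope with constant X:    ", slope_constant)
```

With a constant X, every observation sits at the same horizontal position in the scatterplot, so infinitely many lines fit the data equally well.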
Example of assumption 3
Let’s say we have the following multiple regression:
If only elderly people lived in Rotterdam, there would be an exact linear relationship between the
variables Rotterdam and Elderly. They would capture the same variation. Another example is having
both a variable income * 1000 and a variable income / 1000. Both capture income, but we
only need one variable representing income. The values of these two income variables
differ, but they have the same variation. In general, perfect collinearity between X1,
X2 and X3 exists if X3 = a·X1 + b·X2.
So we can have two situations: perfect collinearity or imperfect collinearity.
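The income example can be verified numerically. The sketch below uses invented income values and an illustrative `correlation` helper; it shows that income * 1000 and income / 1000 have very different values but a correlation of exactly 1 (up to rounding), which is what perfect collinearity between two variables looks like.

```python
import math


def correlation(a, b):
    """Pearson correlation coefficient between two equally long lists."""
    n = len(a)
    a_bar = sum(a) / n
    b_bar = sum(b) / n
    cov = sum((ai - a_bar) * (bi - b_bar) for ai, bi in zip(a, b))
    var_a = sum((ai - a_bar) ** 2 for ai in a)
    var_b = sum((bi - b_bar) ** 2 for bi in b)
    return cov / math.sqrt(var_a * var_b)


# Hypothetical incomes, for illustration only.
income = [21.0, 35.0, 48.0, 62.0, 90.0]
income_times_1000 = [v * 1000 for v in income]
income_div_1000 = [v / 1000 for v in income]

r = correlation(income_times_1000, income_div_1000)
print(f"correlation between the two income variables: {r:.6f}")
```

A pairwise correlation only detects collinearity between two variables. A relationship such as X3 = a·X1 + b·X2 involves three variables and can be perfect even when no single pair is perfectly correlated.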