Summary Stata application videos applied microeconometrics


This is the summary of the Stata application part of Applied Microeconometrics. Together with the theoretical summary, I got an average of 8.5 on my exam and assignments.


  • 16 January 2022
  • 42 pages
  • 2021/2022
  • Lecture notes
  • Carlos & Pilar
  • All lectures
Lawandeco
Summary STATA application (applied eco)
Week 1: Linear regression models
Describing data
Always look at the data before starting the regression. You can do this with the browse
command. Moreover, there are three commands you can use to describe your data:
- Describe: shows the total number of observations, the storage type of each variable and the
variable labels.
- Summarize: shows the number of observations per variable, the mean, standard deviation, min and
max.
- Tabulate: shows the frequencies of a categorical variable (so the distribution of its
values), the percentages and the cumulative percentages. Use this command separately for each
variable; if you list two variables, you get a cross-tabulation instead.

Interpretation quantitative variables
To rename variables, use the command rename. When running a regression, we add , robust at the
end to get heteroskedasticity-robust standard errors. So the command is: reg y x1 x2, robust,
where y is the dependent variable and x1 and x2 are the independent variables. With this OLS
regression command, we see the number of observations, the F test of the H0 that all slope
coefficients are jointly equal to zero, the R squared (computed from the residual sum of squares)
and the root MSE (we do not use that).

Interpretation normal regression
The p-value shows the significance for H0, which says that the coefficient of the variable is equal
to zero. How do we interpret the coefficient? One additional unit of X increases Y by β1
units, ceteris paribus. In our example assignment (house price and income, both in 1000 euros),
when income increases by 1000 euros, the house price increases by 16,250 euros,
ceteris paribus. For the p-value we usually use a significance level of 1, 5 or 10%, which in
numbers are 0.01, 0.05 and 0.10. The larger the standard error, the more uncertainty there is
about our coefficient, and the wider the confidence interval. The 95% confidence interval means
that if we drew many samples from the population and computed the interval each time, about 95%
of those intervals would contain the true parameter. If the value we want to test falls within
the 95% confidence interval, we cannot reject our hypothesis at the 5% level. If it falls outside
the confidence interval, we can reject the hypothesis.

Example confidence interval
If we have, for example, a confidence interval of 15.46 to 17.02 and we want to test the value 16,
this falls within the confidence interval, so we cannot reject our hypothesis. If we want to test
15.4, this falls outside the confidence interval and we can reject the hypothesis that the
coefficient is equal to 15.4. You can also test a value with the command test after running the
regression. To test whether the coefficient of income is equal to 15.4, we type test income=15.4.
We then get an F statistic and its p-value, showing whether the difference is significant or not.
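
The confidence-interval rule above comes down to a simple comparison. The sketch below is written in Python purely to show the arithmetic (it is not Stata code, and the function name can_reject is hypothetical); it uses the interval 15.46 to 17.02 from the example.

```python
def can_reject(value, ci_low, ci_high):
    """Return True if H0: coefficient == value is rejected at the 5% level,
    i.e. if the tested value lies outside the 95% confidence interval."""
    return not (ci_low <= value <= ci_high)

ci_low, ci_high = 15.46, 17.02  # interval from the example above

# 16 lies inside the interval, 15.4 lies outside it
print("reject H0: beta = 16?  ", can_reject(16.0, ci_low, ci_high))
print("reject H0: beta = 15.4?", can_reject(15.4, ci_low, ci_high))
```

In Stata itself, the test command after the regression gives the same conclusion together with a p-value.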

Interpretation log-log model
Our linear models do not allow non-linearity in the parameters, but non-linear relations are
allowed in the variables. You could, for example, use a log-log model to estimate an elasticity.
First generate the new variables containing the logs with the gen command (for example
gen ln_y = ln(y) and gen ln_x = ln(x)) and then run reg ln_y ln_x, robust.

How do we interpret the results from a log-log model (so both the dependent and independent
variables are in logs)? The interpretation is that a 1% increase in x increases y by
β1%, ceteris paribus (β1 is the elasticity). Do not call x and y here log x and log y! So in our
example we would say that when income increases by 1%, the house price increases by 1.24%. The
confidence interval and testing for certain values of the coefficient work the same as in a normal
regression.
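
As a rough check on the "1% in x gives β1% in y" reading, the exact change implied by a log-log model can be computed directly. The sketch below uses Python only for the arithmetic (the function name is hypothetical) and takes the elasticity of 1.24 from the example. It shows that the rule is very accurate for a 1% change and drifts for larger changes.

```python
def loglog_pct_change(beta, pct_x):
    """Exact % change in y for a pct_x % change in x when ln(y) = a + beta*ln(x)."""
    return 100 * ((1 + pct_x / 100) ** beta - 1)

beta = 1.24  # elasticity from the example above

small = loglog_pct_change(beta, 1)   # very close to beta itself
large = loglog_pct_change(beta, 20)  # larger than the shortcut 20 * beta

print(f"1% increase in x:  y changes by {small:.3f}%")
print(f"20% increase in x: y changes by {large:.2f}% (shortcut gives {20 * beta:.2f}%)")
```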

So when talking about logs (when everything is in log), we are talking about percentages. When
everything is normal, we talk about units of the variables.

Interpretation model log-level
Here we have the dependent variable in log and the explanatory variables in levels (so normal). The
interpretation in this case is that one additional unit of X changes y by [100*(exp(β1)-1)]%,
ceteris paribus. We need to apply this formula ourselves using the command display! In this case
we have a semi-elasticity. So in our example, when the average income in the neighbourhood
increases by 1000 euros, y increases by 5%, ceteris paribus. If X changes by more than 1 unit, we
need to apply [100*(exp(β1*D)-1)]% with D being the change in X. If we want X to increase by 25000
euros, we would get [100*(exp(β1*25)-1)]%.

In Stata, use the following command after the regression: display 100*(exp(_b[x])-1), where x is the name of the explanatory variable.
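
The log-level formula can be tried out with a few numbers. The sketch below uses Python only for the arithmetic; the coefficient 0.0488 is a hypothetical value, chosen so that a one-unit increase gives roughly the 5% from the example (the actual estimate is not given in these notes).

```python
import math

def loglevel_pct_change(beta, delta_x=1.0):
    """% change in y when x increases by delta_x units and ln(y) = a + beta*x."""
    return 100 * (math.exp(beta * delta_x) - 1)

beta = 0.0488  # hypothetical coefficient, NOT taken from the notes

one_unit = loglevel_pct_change(beta)         # income + 1000 euros (1 unit)
twenty_five = loglevel_pct_change(beta, 25)  # income + 25000 euros (D = 25)

print(f"D = 1:  y changes by {one_unit:.2f}%")
print(f"D = 25: y changes by {twenty_five:.1f}%")
```

Note that the effect for D = 25 is much larger than 25 times the one-unit effect, which is why the formula has to be applied with β1*D inside the exponent.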

We use the words average income and average house price in this case because the data contains
average values! So look carefully at how the variables are described.

Interpretation model level-log
The dependent variable is here in level (so normal) and our explanatory variable is in log. Here we
need a transformation to interpret the coefficient as well, namely (β1)*ln(1.01). The
interpretation is that a 1% increase in X changes y by (β1)*ln(1.01) units, ceteris paribus. So the
variable we have in log is read in % and we translate the effect back to units. In our example, a
1% increase in the average income in the neighbourhood increases the average house price in the
neighbourhood by 3800 euros, ceteris paribus.

If you want to use another increase of X, you adjust the ln(1.01). So for example 10% would be ln(1.1).
In general, if X changes from X1 to X2, y changes by β1*[ln(X2)-ln(X1)] units, so a p% increase in
X changes Y by β1*ln(1+p/100) units. This formula holds for any p. Only the shortcut β1*p/100 is
an approximation, and it works less well for larger increases in X.
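
To see how the level-log formula behaves, the sketch below compares β1*ln(1+p/100) with the shortcut β1*p/100. Python is used only for the arithmetic. The coefficient 381.9 is a hypothetical value, chosen so that a 1% increase gives roughly the 3800 euros from the example, assuming the house price is measured in 1000 euros.

```python
import math

def levellog_unit_change(beta, pct_x):
    """Change in y (in units of y) for a pct_x % increase in x when y = a + beta*ln(x)."""
    return beta * math.log(1 + pct_x / 100)

beta = 381.9  # hypothetical coefficient, y assumed to be in 1000 euros

exact_1 = levellog_unit_change(beta, 1)    # about 3.8, i.e. about 3800 euros
exact_10 = levellog_unit_change(beta, 10)
shortcut_10 = beta * 10 / 100              # rule of thumb beta*p/100

print(f"1% increase:  {exact_1:.2f} (x 1000 euros)")
print(f"10% increase: {exact_10:.2f} exact vs {shortcut_10:.2f} with the shortcut")
```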

Interpretation of squared variables
Now we add the square of our independent variable. When using a squared variable, you
also need to include the original variable. If you include income2 (income squared) but not income,
the model forces the linear effect of income to be zero. When interpreting the results, you have to
look at both coefficients. Both variables capture the same underlying variable, so you have to
interpret them jointly. If you graph the results in a scatterplot, you see a curved (quadratic) line.

This means the change in average house price is not the same for each increase in income (the slope
differs). So you should say that the effect of an increase in the average income on the
average house price depends on where you are in the income distribution.

You can compute the estimated effect at the average income by running summarize income and then
display _b[income]+2*_b[income2]*r(mean).
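
The expression in that display command is the derivative of the fitted quadratic with respect to income. The sketch below uses Python only for the arithmetic; all three numbers are hypothetical, since the estimates and the mean income are not given in these notes.

```python
def marginal_effect(b_income, b_income2, income):
    """Slope of y = a + b_income*income + b_income2*income^2 at a given income."""
    return b_income + 2 * b_income2 * income

# Hypothetical estimates and mean income (in 1000 euros), for illustration only
b_income, b_income2 = 20.0, -0.05
mean_income = 30.0

at_mean = marginal_effect(b_income, b_income2, mean_income)
at_low = marginal_effect(b_income, b_income2, 10.0)
at_high = marginal_effect(b_income, b_income2, 50.0)

print(f"effect at income 10:   {at_low:.2f}")
print(f"effect at mean income: {at_mean:.2f}")
print(f"effect at income 50:   {at_high:.2f}")
```

With a negative coefficient on the squared term, the effect of income gets smaller as income rises, which is the "depends on where you are in the income distribution" statement in numbers.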

Please note that you use the predict command directly after the regression for which you want
predictions! For example, use predict houseprice_predicted. If you then make a scatterplot with
scatter houseprice_predicted income, you get a scatterplot of the predicted values. By default,
predict fills in a prediction for every observation with non-missing explanatory variables, so it
also covers observations that were not used in the estimation sample.

Interpretation binary and categorical variables
Binary dummy
A dummy variable can only take the value 0 or 1. The coefficient of a dummy variable shows the
difference in the outcome between the group where the dummy is 1 and the group where it is 0. If we
have a regression with houseprice as dependent variable and income and Rotterdam (the dummy)
as independent variables, the coefficient of Rotterdam gives the average house price in
Rotterdam compared to the rest of the Netherlands, ceteris paribus. If the coefficient of Rotterdam
is -75, we say that the average house price in Rotterdam is 75,000 euros lower than the average
house price in other areas in the Netherlands, ceteris paribus. We see this coefficient of -75 is
significant, and 0 does not fall within the 95% confidence interval either. So the difference is
statistically significant at the 1% level.

The constant in this case gives us the average house price when all the x’s are equal to zero. This
means that the average income in the neighbourhood is equal to zero and that the dummy variable
has the value zero (so the neighbourhood is not in Rotterdam). We get a negative house price,
but this is because an average income of zero is not realistic, so the constant has no meaningful
interpretation here.

We can get a scatterplot of the predicted value of houseprice again. In this plot however, we get two
diagonal parallel lines. We get two lines because one line represents the average houseprice given a
certain income in Rotterdam (so dummy=1) and the other line represents the average houseprice
given a certain income in other neighbourhoods (so dummy=0). We know the intercept of Rotterdam
is lower, so the bottom line is Rotterdam. We see that the lines are parallel, so the slopes are the
same. This means that the effect of income on the average houseprice is the same in Rotterdam and
in other neighbourhoods.

But what if we want to test whether the effect of income on house price differs between areas?
In that case we need to add an interaction term. We first generate a new variable with
gen income_Rdam=income*Rotterdam. When Rotterdam takes value 0, income_Rdam
is zero as well. If Rotterdam=1, we have 1 * income, which is the same value as income.
Now we regress houseprice on income, Rotterdam and income_Rdam together. We get a different
scatterplot in that case: the two lines are no longer parallel. An increase in the average income
now has a different effect on the house price in the two groups.

In our regression we get the following results:

E(houseprice | income, Rotterdam=1) = 16.3*income + 9.5 - 4*income - 95 = 12.3*income - 85.5

So 12.3 is the slope for Rotterdam: if the average income increases by 1000
euros, the average house price in Rotterdam increases by 12,300 euros. In other areas the average
house price increases by 16,300 euros when income increases by 1000 euros, so this increase is
larger. The significance of the coefficient of the interaction term tells us whether the difference
in the effect of income between Rotterdam and other neighbourhoods is significant or not. We see
here a p-value of 0.001, meaning we reject that the difference in the income effect between
Rotterdam and other areas is equal to zero. We can also conclude this from the confidence
interval, since zero does not lie within the interval. So the difference in effect between
Rotterdam and other areas is statistically significant.
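
The two regression lines can be written out from the four coefficients quoted above (16.3, 9.5, -4 and -95). The sketch below uses Python only for the arithmetic; the dictionary keys mimic Stata's coefficient names, and the function name is hypothetical.

```python
# Rounded coefficients as quoted in the text above
coef = {"income": 16.3, "_cons": 9.5, "income_Rdam": -4.0, "Rotterdam": -95.0}

def regression_line(rotterdam):
    """Return (slope, intercept) of house price on income for Rotterdam = 0 or 1."""
    slope = coef["income"] + coef["income_Rdam"] * rotterdam
    intercept = coef["_cons"] + coef["Rotterdam"] * rotterdam
    return slope, intercept

slope_other, intercept_other = regression_line(0)
slope_rdam, intercept_rdam = regression_line(1)

print(f"other areas: houseprice = {slope_other:.1f}*income + {intercept_other:.1f}")
print(f"Rotterdam:   houseprice = {slope_rdam:.1f}*income + {intercept_rdam:.1f}")
```

With these rounded values, the Rotterdam intercept is 9.5 - 95 = -85.5 and the Rotterdam slope is 16.3 - 4 = 12.3. The coefficient on the interaction term is exactly the difference between the two slopes.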

When a line in the scatterplot is flatter, we know its slope coefficient is smaller in absolute
value, so the increase or decrease per unit of income is smaller.

Categorical variables
We can also create the opposite dummy with gen other = Rotterdam==0 if Rotterdam!=. So then
we have Rotterdam and other as variables to indicate which area we are talking about. If we use
other instead of Rotterdam in our regression, we get the same results, except that the sign of the
dummy coefficient flips. This also causes the constant to change. So when using
Rotterdam we see the average house price is 75 lower in Rotterdam, compared to other areas, ceteris
paribus. When using other, we see that the average house price is 75 higher in other areas, compared
to Rotterdam, ceteris paribus. The difference between the two constants is exactly 75.
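
Why the sign flips and the constant shifts by exactly 75 can be seen from the group intercepts. The sketch below uses Python only for the arithmetic; the constant of 10.0 is hypothetical (the actual constant is not given in these notes), and the income term is left out because it is the same in both versions.

```python
b_rdam = -75.0   # coefficient on Rotterdam, from the text
cons_rdam = 10.0 # hypothetical constant when Rotterdam is the dummy

# Implied estimates when 'other' is used as the dummy instead
b_other = -b_rdam
cons_other = cons_rdam + b_rdam

def intercept_using_rotterdam(rotterdam):
    return cons_rdam + b_rdam * rotterdam

def intercept_using_other(other):
    return cons_other + b_other * other

for rotterdam in (0, 1):
    other = 1 - rotterdam
    print(rotterdam, intercept_using_rotterdam(rotterdam), intercept_using_other(other))
```

Both versions give the same intercept for each group, so the model is the same; only the reference category, and therefore the reading of the coefficient, changes.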

We cannot include both dummies at the same time! Stata will drop one of them, because together
with the constant they are perfectly collinear. We need a reference category. We can add the
option noconstant at the end (so reg Y X1 X2, robust noconstant); then Stata
will not drop Rotterdam or other. However, the coefficients on Rotterdam and other then take over
the role of the constant: each one is the intercept for its own group. This is
because Rotterdam + other always equals 1, which is exactly the column of ones that the constant
normally multiplies. But it is better not to use this, since you cannot see the difference between
the groups immediately; you need to test separately whether the coefficients on Rotterdam and
other are the same.

Most of the time we have a categorical variable with more than 2 categories. You always need to
exclude one category from your regression; this is the reference category. So when you
interpret your results, you always compare them to the reference category! If you change your
reference category, your coefficients change, and so does your interpretation.
