Summary of Methods: Econometrics 2 (lectures, knowledge clips, and mandatory literature)
Treatment heterogeneity
Introduction
Polynomials and transformations are great for allowing the relationship between Y and X to
be flexible. Any sort of curvy line can be fitted so that the relationship between Y and X can
change depending on the value of X. However, what if the relationship between Y and X differs
not with the value of X, but with the value of a different variable Z? Take the relationship
between the price of gas and how much an individual chooses to drive: it differs depending on
whether someone owns a car (= Z). The extent to which subjects respond differently to a
treatment is called treatment effect heterogeneity. An effect of X on Y that depends on Z
cannot be represented by a polynomial or transformation of X; we need interaction terms. If
you think that the relationship between Y and X is different for different values of Z, or in
other words that the treatment effect varies among groups, then include X * Z in your model,
as well as Z by itself. You should include 𝑍𝑖 on its own because otherwise you force the same
intercept on the groups, which affects the estimated slopes. The fully interacted model (with
cross-sectional data) looks as follows:
𝑌𝑖 = 𝛼0 + 𝛼1 𝑋𝑖 + 𝛼2 𝑋𝑖 𝑍𝑖 + 𝛼3 𝑍𝑖 + 𝜀𝑖
Interpretation
The real difficulty with interaction terms is interpreting them. The two important questions when
including interaction terms are i) what is the effect of a variable X when there’s an interaction
between X and something else in the model? ii) how can I interpret the interaction term?
Relating to the first question, some simple math can give us some useful insights. We can
take the partial derivative of the regression equation above with respect to X:
𝜕𝑌/𝜕𝑋 = 𝛼1 + 𝛼2 𝑍
The calculation above implies that 𝛼1 no longer gives us “the effect of X”, and that there is no
single effect of X. Rather, we can now say what the effect of X is at a given value of Z. So if
you’re looking at effect sizes, you need to consider the entire effect-of-X expression. Just
looking at 𝛼1 alone tells you very little. The second question relates to 𝛼2, which can be read
as how much the effect of X on Y changes when Z increases by one unit. Often the variable
being interacted with is binary. To illustrate what happens when one includes a binary
interaction term we will use the variable Binary. If the effect of X on Y is 𝛼1 + 𝛼2 𝐵𝑖𝑛𝑎𝑟𝑦, then
𝛼1 is the effect of X on Y when Binary = 0. To get the effect of X on Y for Binary = 1 we simply
calculate 𝛼1 + 𝛼2 × 1 = 𝛼1 + 𝛼2. If we find a statistically / economically significant estimate of
𝛼2, then we can say that “the effect of X on Y differs between the two groups (Binary = 0/1)”.
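A minimal sketch of this interpretation, using noise-free data and coefficient values that are invented purely for illustration. With a binary moderator, the point estimates of the fully interacted model coincide with those from fitting a separate line per group, so the sketch recovers 𝛼1, 𝛼2 and 𝛼3 from two simple fits. The helper names `ols_line` and `fit_group` are hypothetical.

```python
from statistics import mean

def ols_line(x, y):
    """Return (intercept, slope) of a simple least-squares fit of y on x."""
    x_bar, y_bar = mean(x), mean(y)
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return y_bar - slope * x_bar, slope

# Hypothetical "true" coefficients, chosen only for illustration.
A0, A1, A2, A3 = 10.0, -2.0, 1.5, 4.0

# Noise-free data from Y = a0 + a1*X + a2*X*Binary + a3*Binary.
xs = [0.0, 1.0, 2.0, 3.0, 4.0]
data = [(x, b, A0 + A1 * x + A2 * x * b + A3 * b) for b in (0, 1) for x in xs]

def fit_group(b):
    """Fit Y on X using only the observations with Binary == b."""
    rows = [(x, y) for x, g, y in data if g == b]
    return ols_line([x for x, _ in rows], [y for _, y in rows])

int0, slope0 = fit_group(0)
int1, slope1 = fit_group(1)

alpha1 = slope0            # effect of X when Binary = 0
alpha2 = slope1 - slope0   # change in the effect of X when Binary = 1
alpha3 = int1 - int0       # intercept shift for Binary = 1

print("effect of X if Binary = 0:", alpha1)
print("effect of X if Binary = 1:", alpha1 + alpha2)
```

Only the point estimates match the separate fits; standard errors from a pooled interacted regression generally differ.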
Keep in mind
There are several things to keep in mind when it comes to using interaction terms:
i) Think very carefully about why you are including a given interaction term. By fishing
around with a bunch of interaction terms you would get many false positives.
ii) Even if we do have a strong idea of a difference in effect that might be there, interaction
terms are noisy: they look for the difference between two noisy things, so the result will be
even noisier.
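Both warnings can be made concrete with a little arithmetic. The sketch below rests on simplifying assumptions that are not in the notes: the tests are independent, no true effects exist, and the two subgroup estimates are independent. The function names are hypothetical.

```python
import math

def familywise_error_rate(n_tests, alpha=0.05):
    """P(at least one false positive) across n independent tests with no true effects."""
    return 1 - (1 - alpha) ** n_tests

def se_of_difference(se_a, se_b):
    """Standard error of (estimate_a - estimate_b) for two independent estimates."""
    return math.sqrt(se_a ** 2 + se_b ** 2)

# Chance of at least one spurious "significant" interaction when fishing.
for k in (1, 5, 20):
    print(k, "interaction terms tried:", round(familywise_error_rate(k), 3))

# The difference between two subgroup effects is noisier than either effect alone.
print("SE of the difference:", se_of_difference(0.3, 0.4))
```

Under these assumptions, trying 20 unrelated interaction terms gives roughly a 64% chance of at least one false positive, and the standard error of the difference exceeds either subgroup's standard error.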
Allcott (2011)
Information treatment → people receive so-called home energy report letters (HERs), in which
their energy usage is compared to the usage of their neighbours. If pre-treatment use is
relatively high, then you compare unfavourably, which may prompt you to lower energy usage.
The effect could occur through two channels: i) social learning: my level of usage deviates
from that of others (surprise effect) and is suboptimally high, because I like to conform to
the norm (the ‘descriptive norm’); ii) moral cost/reward: a separate judgment of whether you
are doing the right thing or not, also called the ‘injunctive norm’, indicated by smileys on
the right of the letter. On the other hand, if pre-treatment usage is relatively low, then you
compare favourably, which may prompt you to increase or decrease usage, again through two
possible channels: i) social learning: my level of energy usage is suboptimally low; ii) moral
cost/reward: I am rewarded for doing well. So in this experiment, the treatment effect varies
with a pre-treatment characteristic. Pre-treatment energy usage is thus a moderator that
affects the ‘strength’ of the effect of the letter (X) on energy usage after the letter (Y).
Ignoring heterogeneity
The possible consequences of ignoring heterogeneity are:
i) Generating false negatives (type II errors)
→ the pooled estimate shows no evidence of an effect, although an effect could be found once
the treatment is interacted with covariates
ii) Not enabling targeting of treatment
iii) Less idea of the mechanism that is at work
→ variation in response to treatment may tell us something about why the treatment works
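The false-negative case in the list above can be illustrated with invented numbers. The sketch assumes two equally sized subgroups whose treatment effects have opposite signs; the group labels, shares and effect sizes are hypothetical and not taken from Allcott (2011).

```python
# Hypothetical subgroup effects of a treatment on energy use (illustrative only).
subgroups = {
    "low pre-treatment use":  {"share": 0.5, "effect": +2.0},
    "high pre-treatment use": {"share": 0.5, "effect": -2.0},
}

def pooled_effect(groups):
    """Average treatment effect as the share-weighted mean of subgroup effects."""
    return sum(g["share"] * g["effect"] for g in groups.values())

ate = pooled_effect(subgroups)
print("pooled effect:", ate)  # 0.0: the opposite subgroup effects cancel out
for name, g in subgroups.items():
    print(name, "effect:", g["effect"])
```

A regression without the interaction would report the pooled effect of zero, even though the treatment changes behaviour in both subgroups.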
We can also look at the consequences of ignoring heterogeneity when dealing with panel data.
Remember the two-way fixed effect (TWFE) diff-in-diff estimator
𝑌𝑖𝑡 = 𝛼 + 𝛿𝑃𝑜𝑙𝑖𝑐𝑦𝑖𝑡 + 𝛽𝑖 + 𝛾𝑡 + 𝜀𝑖𝑡
If the treatment response is heterogeneous, the estimate of 𝛿 can be incorrect. 𝑃𝑜𝑙𝑖𝑐𝑦𝑖𝑡 is now a
continuous rather than dichotomous variable, so it lies between 0 and 1 instead of being just
0 or 1. Thus, in some settings treatment heterogeneity can lead to incorrect coefficients. In
conclusion, the higher the level of treatment heterogeneity, the lower the statistical power
of your study.
Ex-post: treatment-by-covariate interaction
You have run your experiment and see that the treatment effect varies across different
subgroups. Let’s start with cross-sectional data (no time dimension). After including the
interaction terms, the regression equation looks as follows:
𝑌𝑖 = 𝛼0 + 𝛼1 𝑇𝑖 + 𝛼2 𝑇𝑖 × 𝐻𝐼𝐺𝐻𝑖 + 𝛼3 𝐻𝐼𝐺𝐻𝑖 + 𝜀𝑖
Where 𝐻𝐼𝐺𝐻 denotes the binary variable that the treatment is interacted with (for example,
high pre-treatment usage). The term 𝛼2 𝑇𝑖 × 𝐻𝐼𝐺𝐻𝑖 allows for a different slope when
𝐻𝐼𝐺𝐻 = 1 and 𝛼3 𝐻𝐼𝐺𝐻𝑖 allows for a different intercept when 𝐻𝐼𝐺𝐻 = 1. After including these
terms we can speak of a so-called fully interacted model. The effect of treatment now depends
on the variable 𝐻𝐼𝐺𝐻: i) if 𝐻𝐼𝐺𝐻 = 0, the conditional average treatment effect (CATE) = 𝛼1;
ii) if 𝐻𝐼𝐺𝐻 = 1, the CATE = 𝛼1 + 𝛼2. 𝛼2 is the interaction effect, also called the difference
in the marginal effect of treatment relative to the reference group. Remember the regression
discontinuity design? In the equation below, 𝛽2 likewise allows for a different intercept after
the cut-off, and 𝛽3 allows for a different slope after the cut-off.
𝑌𝑖 = 𝛽0 + 𝛽1 𝑅𝑖 + 𝛽2 𝑇𝑟𝑒𝑎𝑡𝑒𝑑𝑖 + 𝛽3 𝑅𝑖 𝑇𝑟𝑒𝑎𝑡𝑒𝑑𝑖 + 𝜀𝑖
Where the marginal effect of treatment is equal to 𝛽2 + 𝛽3 𝑅𝑖. Since we evaluate at the
cut-off, 𝑅𝑖 = 0, we can conclude that the treatment effect is 𝛽2. The second option yields the
same results, but with a different regression specification (see slide 55 week 1).
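Returning to the cross-sectional model with T and HIGH: when both variables are binary, the model is saturated and its OLS coefficients equal differences between the four cell means. The sketch below computes them that way. The eight observations are invented for illustration and `cell_mean` is a hypothetical helper.

```python
from statistics import mean

# Hypothetical observations as (T, HIGH, Y); the values are invented.
rows = [
    (0, 0, 10.0), (0, 0, 12.0),
    (1, 0, 10.0), (1, 0, 11.0),
    (0, 1, 20.0), (0, 1, 22.0),
    (1, 1, 16.0), (1, 1, 18.0),
]

def cell_mean(t, high):
    """Mean outcome among observations with T == t and HIGH == high."""
    return mean(y for ti, hi, y in rows if ti == t and hi == high)

alpha0 = cell_mean(0, 0)                                 # control mean, HIGH = 0
alpha1 = cell_mean(1, 0) - cell_mean(0, 0)               # CATE for HIGH = 0
alpha3 = cell_mean(0, 1) - cell_mean(0, 0)               # intercept shift for HIGH = 1
alpha2 = (cell_mean(1, 1) - cell_mean(0, 1)) - alpha1    # interaction effect

cate_low = alpha1
cate_high = alpha1 + alpha2

print("CATE if HIGH = 0:", cate_low)
print("CATE if HIGH = 1:", cate_high)
```

In these made-up data the treatment lowers the outcome by 0.5 in the reference group and by 4.0 in the HIGH group, so 𝛼2 = -3.5.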
Ex-post: treatment-by-covariate interaction: Panel data
Now let’s look at how interaction terms work when working with panel data. In the first case,
we’ll look at a d-i-d approach with two periods, two groups, and a binary interaction variable.
The first graph visualises the results of the ‘original’ d-i-d regression (see equation below),
which does not account for heterogeneity, so we estimate just a single 𝛼1.
𝑌𝑖𝑡 = 𝛼0 +𝛼1 𝑇𝑟𝑒𝑎𝑡𝑖 × 𝑃𝑜𝑠𝑡𝑡 + 𝛼2 𝑇𝑟𝑒𝑎𝑡𝑖 + 𝛼3 𝑃𝑜𝑠𝑡𝑡 + 𝜀𝑖𝑡
The graph below visualises the results when we let 𝛼1 depend on the pre-treatment level of
the outcome. The regression equation when we include an interaction term (in this case the
variable ‘High’) is
𝑌𝑖𝑡 = 𝛼0 + 𝛼1 𝑇𝑟𝑒𝑎𝑡𝑖 × 𝑃𝑜𝑠𝑡𝑡 + 𝛼2 𝑇𝑟𝑒𝑎𝑡𝑖 + 𝛼3 𝑃𝑜𝑠𝑡𝑡 + 𝛼4 𝑇𝑟𝑒𝑎𝑡𝑖 × 𝑃𝑜𝑠𝑡𝑡 × 𝐻𝑖𝑔ℎ𝑖
+ 𝛼5 𝑇𝑟𝑒𝑎𝑡𝑖 × 𝐻𝑖𝑔ℎ𝑖 + 𝛼6 𝑃𝑜𝑠𝑡𝑡 × 𝐻𝑖𝑔ℎ𝑖 + 𝛼7 𝐻𝑖𝑔ℎ𝑖 + 𝜀𝑖𝑡
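With two periods, two groups and a binary High variable, the model above is saturated, so its coefficients can be read off the eight cell means. A rough sketch with invented cell means follows; the dictionary `cell` and the helper `did` are hypothetical names.

```python
# Hypothetical cell means of Y, keyed by (treat, post, high); the numbers are invented.
cell = {
    (0, 0, 0): 10.0, (0, 1, 0): 11.0,
    (1, 0, 0): 10.0, (1, 1, 0): 10.5,
    (0, 0, 1): 20.0, (0, 1, 1): 22.0,
    (1, 0, 1): 21.0, (1, 1, 1): 20.0,
}

def did(high):
    """Difference-in-differences within one High group."""
    change_treated = cell[(1, 1, high)] - cell[(1, 0, high)]
    change_control = cell[(0, 1, high)] - cell[(0, 0, high)]
    return change_treated - change_control

alpha1 = did(0)                                        # treatment effect, High = 0
alpha4 = did(1) - did(0)                               # differential effect, High = 1
alpha3 = cell[(0, 1, 0)] - cell[(0, 0, 0)]             # control trend, High = 0
alpha6 = (cell[(0, 1, 1)] - cell[(0, 0, 1)]) - alpha3  # extra control trend, High = 1

print("effect for High = 0:", alpha1)
print("effect for High = 1:", alpha1 + alpha4)
```

Here 𝛼4 is a triple difference: the d-i-d estimate in the High group minus the d-i-d estimate in the reference group.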
We’ll now look at the second case, which is a d-i-d approach with multiple periods, multiple
groups and a binary interaction variable. Originally the regression equation was
𝑌𝑖𝑡 = 𝛼0 +𝛼1 𝑇𝑖 × 𝑃𝑡 + 𝛿𝑖 + 𝛾𝑡 + 𝜀𝑖𝑡
Where the treatment dummy (𝑇𝑖 × 𝑃𝑡 ) is 1 for the treated groups during the treatment period.
The regression equation of the fully interacted model is
𝑌𝑖𝑡 = 𝛼0 +𝛼1 𝑇𝑖 × 𝑃𝑡 + 𝛼4 𝑇𝑖 × 𝑃𝑡 × 𝐻𝑖𝑔ℎ𝑖 + 𝛼6 𝑃𝑡 × 𝐻𝑖𝑔ℎ𝑖 + 𝛿𝑖 + 𝛾𝑡 + 𝜀𝑖𝑡
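Again, the differential treatment effect is estimated by 𝛼4. Additionally, 𝛼6 𝑃𝑡 × 𝐻𝑖𝑔ℎ𝑖 allows
the trend among control units in the High group to differ from the trend among control units
in the reference group. The terms 𝑇𝑖 × 𝐻𝑖𝑔ℎ𝑖 and 𝐻𝑖𝑔ℎ𝑖 do not appear separately because they
are constant over time and are therefore absorbed by the fixed effects 𝛿𝑖.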