Summary Causal Analysis in Data Science: Multilevel Analysis Module
Course
Causal Analysis in Data Science (424301M6)
Institution
Tilburg University (UVT)
Extensive summary of the module 'Multilevel Analysis' (Causal Analysis in Data Science course). It contains many notes from the video lectures. All video lectures are discussed in detail.
Tilburg University Multilevel Analysis Causal Analysis in Data Science
Causal Analysis in Data Science
Multilevel Analysis (Prof. Gelissen)
Introduction to the multilevel analysis module
Key terms from OLS regression (recap): make sure that you are familiar with these terms!
- Intercept: The expected value of Y (= dependent variable) when X (= independent variable) = 0.
- Slope: The measure of the steepness of a line. Interpret the slope of a line as the change in y when x
changes by 1 unit.
- Residual: The difference between an observed value of the dependent variable and the value predicted
by the regression, i.e. the part of the variation in the dependent variable that is not explained by the
regression.
- Standard deviation: A statistic that measures the spread of a dataset around its mean.
- Standard error: The standard deviation of the sampling distribution of a particular statistic (e.g. the
intercept or a regression coefficient). It tells us how precise the estimate is. We want the standard error
to be small.
- Dummy variable: A numeric (0/1) variable that represents categorical / nominal data.
- Covariate: A possible predictive or explanatory variable of the dependent variable.
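To make these terms concrete, here is a minimal Python sketch that computes them by hand for a tiny dataset. The numbers and variable names are made up for illustration and are not taken from the lectures.

```python
import math

# Made-up illustration data (an assumption, not from the course).
x = [0, 1, 2, 3, 4]
y = [2.0, 4.1, 5.9, 8.2, 9.8]
n = len(x)

mean_x = sum(x) / n
mean_y = sum(y) / n

# Sums of squares and cross-products around the means.
sxx = sum((xi - mean_x) ** 2 for xi in x)
sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))

slope = sxy / sxx                      # change in y per 1-unit change in x
intercept = mean_y - slope * mean_x    # expected y when x = 0

# Residual: observed y minus the value predicted by the regression line.
residuals = [yi - (intercept + slope * xi) for xi, yi in zip(x, y)]

# Residual variance uses n - 2 degrees of freedom (two estimated parameters).
residual_var = sum(r ** 2 for r in residuals) / (n - 2)
se_slope = math.sqrt(residual_var / sxx)   # standard error of the slope

print(f"intercept={intercept:.2f}, slope={slope:.2f}, SE(slope)={se_slope:.3f}")
```

The residuals of an OLS fit with an intercept sum to (approximately) zero, which is a quick sanity check on the calculation.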
Why multilevel analysis?
Multilevel models are statistical models of parameters that vary at more than one level. Multilevel
modelling provides a useful framework for thinking about problems with a hierarchical structure. It is an
approach that can be used to handle clustered or grouped data. In such data, the people in the sample
usually do not come from a simple random sample. Instead, multi-stage sampling is used:
Suppose you want to research employee performance in the Netherlands. You cannot include all
employees in the Netherlands in your sample. Instead, you first make a sampling frame (i.e. a list) of all
organizations in the country and take a random sample from this list. Next, you make a sampling frame of
the departments of the sampled organizations and take a random sample from these departments. After
this, you make a sampling frame of the teams and again take a random sample. You then continue the
sampling process down to the individual level.
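The sampling procedure described above can be sketched in a few lines of Python. The sampling frame, its sizes, and the number of units drawn at each stage are hypothetical and chosen only for illustration.

```python
import random

rng = random.Random(42)  # fixed seed so the sketch is reproducible

# Hypothetical sampling frame: 20 organizations x 4 departments x 3 teams x 5 employees.
frame = {
    f"org{o}": {
        f"dept{d}": {
            f"team{t}": [f"org{o}-dept{d}-team{t}-emp{e}" for e in range(5)]
            for t in range(3)
        }
        for d in range(4)
    }
    for o in range(20)
}

sample = []  # rows of (organization, department, team, employee)
for org in rng.sample(sorted(frame), k=5):                        # stage 1: organizations
    for dept in rng.sample(sorted(frame[org]), k=2):              # stage 2: departments
        for team in rng.sample(sorted(frame[org][dept]), k=2):    # stage 3: teams
            for emp in rng.sample(frame[org][dept][team], k=3):   # stage 4: employees
                sample.append((org, dept, team, emp))

print(len(sample))  # 5 * 2 * 2 * 3 = 60 sampled employees
```

Each sampled employee carries the identifiers of the team, department, and organization they were drawn from, which is exactly the nesting that a multilevel model later exploits.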
So, in the end you have multilevel data (hierarchical in nature). We can therefore have theoretical concepts,
measured as variables, that relate to each of these levels: for each level we can define constructs that
describe certain characteristics of that level. We can make statements about employee performance at the
organizational level (e.g. sector), but also at the individual level (e.g. age or gender). Performance is in
fact influenced at several levels.
The ecological fallacy: When studying a certain level, we must make sure that the conclusions we draw
from that analysis only hold for that specific level (i.e. things that apply to teams do not necessarily apply
to individuals within this team). For example, if a department is known for poor efficiency, it does not
mean that ALL employees in this department have poor efficiency.
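A small Python sketch can illustrate why conclusions at one level need not hold at another. The data are made up so that the relationship between x and y is perfectly positive across team means but perfectly negative within every team.

```python
import math

def pearson(a, b):
    """Pearson correlation between two equally long lists of numbers."""
    mean_a, mean_b = sum(a) / len(a), sum(b) / len(b)
    cov = sum((ai - mean_a) * (bi - mean_b) for ai, bi in zip(a, b))
    var_a = sum((ai - mean_a) ** 2 for ai in a)
    var_b = sum((bi - mean_b) ** 2 for bi in b)
    return cov / math.sqrt(var_a * var_b)

# Made-up data: three teams, three employees each, stored as (x, y) pairs.
teams = {
    "team_a": [(0, 2), (1, 1), (2, 0)],
    "team_b": [(10, 12), (11, 11), (12, 10)],
    "team_c": [(20, 22), (21, 21), (22, 20)],
}

# Group level: correlate the team means of x and y.
mean_x = [sum(x for x, _ in rows) / len(rows) for rows in teams.values()]
mean_y = [sum(y for _, y in rows) / len(rows) for rows in teams.values()]
group_level_r = pearson(mean_x, mean_y)

# Individual level: correlate x and y within each team separately.
within_team_r = {
    name: pearson([x for x, _ in rows], [y for _, y in rows])
    for name, rows in teams.items()
}

print(group_level_r)
print(within_team_r)
```

Inferring the individual-level relationship from the team-level correlation would get even the sign wrong in this constructed example.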
A hierarchical order is also present in longitudinal data: time series nested within individuals. If we
measure the blood pressure of a group of patients every week, we can see the repeated measurements as
grouped within the individual subjects. Consequently, we can describe constructs for individuals,
individuals over time and groups.
Assumption of independence of observations: This assumption means that the observations should be
independent of each other (between groups, this basically means that the groups are made up of different
people). With clustered data this is often not the case, because people in the same group tend to be more
alike than people from different groups. One of the main purposes of multilevel models is to deal with
cases where the assumption of independence is violated. You should pay close attention to dependencies
between levels when analyzing and interpreting data, for example when the individuals you analyze have
been selected based on which team they belong to. Moreover, an outcome at the individual level can be
explained by a factor at the group level. In this case we need to perform multilevel analysis, and not
standard OLS, to draw accurate conclusions.
Coleman boat: Successful football coaches have more social prestige. Why are these things correlated?
According to the 'Coleman boat' theory we can make an argument such as: the more successful the
trainer is, the better he can motivate players, this leads to better performances of the players and
therefore of the team as a whole. And if teams perform well, this results in more social prestige for
trainers. Multilevel analyses make it possible to analyze part of this reasoning.
Reasons to use multilevel modelling:
- If we have clustered data, the assumption of observation independence is violated. These
dependencies need to be considered in our statistical model.
- We can separate effects due to the varying composition of groups in terms of individual
characteristics (due to clustering of individuals within groups) from true contextual effects, i.e.
disentangle compositional effects from contextual effects.
o Compositional effect: Observed between-group difference in the distribution of some
outcome that can be explained by differences in the distribution of covariates
o Individual characteristics can explain group level differences if the proportion of the
individual characteristics is not exactly the same for all groups
o Suppose you have 50 groups, but these groups differ in terms of the gender distribution.
This difference in gender composition between groups may explain some of the variance
between groups in our outcome (= compositional effect).
o This is different from a true contextual effect: when a group characteristic influences the
outcome at the individual level.
- OLS does not tell us how much variation there is at each level of analysis. Multilevel analyses do
provide us with this kind of information.
- Variables can be included at the theoretically correct level of analysis.
- Disaggregation of macro-characteristics to individual characteristics (i.e. giving everyone in a group
the same group score) inflates type 1 errors of macro effects ("false positives")
o Standard errors will be biased (they will be too small). Estimates are less certain than
the SE suggests (as you do not account for the similarity of observations within
clusters)
o Thus, multilevel analyses provide us with standard errors that take this clustering into
account
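As a rough illustration of "how much variation there is at each level" and of why ignoring clustering makes standard errors too small, the sketch below computes an intraclass correlation (ICC) and the design effect for made-up, balanced data. The ICC and the design effect are not named in the notes above; they are added here as common summary measures, and the one-way ANOVA estimator used is only one of several ways to estimate the ICC.

```python
# Made-up outcome scores for three groups of equal size.
groups = {
    "A": [4, 5, 6],
    "B": [7, 8, 9],
    "C": [10, 11, 12],
}

k = len(groups)                          # number of groups
n = len(next(iter(groups.values())))     # observations per group (balanced design)
all_scores = [v for vals in groups.values() for v in vals]
grand_mean = sum(all_scores) / len(all_scores)
group_means = {g: sum(vals) / n for g, vals in groups.items()}

# Between-group and within-group sums of squares.
ss_between = sum(n * (m - grand_mean) ** 2 for m in group_means.values())
ss_within = sum((v - group_means[g]) ** 2 for g, vals in groups.items() for v in vals)

ms_between = ss_between / (k - 1)
ms_within = ss_within / (k * n - k)

# One-way ANOVA estimator of the intraclass correlation (balanced groups):
# the share of outcome variance that lies between groups.
icc = (ms_between - ms_within) / (ms_between + (n - 1) * ms_within)

# Design effect: factor by which clustering inflates the sampling variance
# of a mean compared with a simple random sample of the same size.
design_effect = 1 + (n - 1) * icc

print(round(icc, 3), round(design_effect, 3))
```

A design effect above 1 means that the clustered sample carries less information than its raw size suggests, which is why standard errors that ignore the clustering come out too small.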
The logic of the multilevel model
Multilevel analysis is also known as 'Hierarchical Linear Modeling'. It is called "hierarchical" because we
assume that there is a nested structure in our data. Lower level units are embedded in a certain hierarchy
of higher-level units.
- It is also the name of a software package (‘HLM’)
- Multilevel analysis is often used as a description for a broader class of models, like random
coefficient models and mixed models
- All these models have an essential aspect in common: they combine fixed effects and random
effects. Furthermore, these models are applied to clustered / nested data, where the dependent
variable is at the lowest level and the independent variables can be defined at all levels.
Multilevel Analysis (MLA): If you want to know group differences, you can perform an OLS regression for
each group separately. This will result in a unique slope and intercept for each group. In MLA, you want to know …
- How large the variation between groups is in terms of the slope and intercept AND
- How this variation in slopes and intercepts can be explained
MLA models variance at (at least) two levels of analysis. Level 1 is the individual level, and level 2 is the
group level. What happens conceptually (two-stage approach to multilevel modeling):
- Step 1: within unit relationships for each unit
o Estimates separate regression equations within units
o This summarizes relationships within units (intercept and slopes)
o These parameters become dependent variables at step 2 / level 2
- Step 2: model variance in level-1 parameters (intercepts & slopes) with between-unit (level-2) variables
o Use these “summaries” of the within unit relationships as outcome variables regressing
them on level-2 characteristics
▪ Variance in Intercepts predicted by between unit variables
▪ Variance in slopes predicted by between unit variables
o The different intercepts and slopes are considered variables with which we want to be
able to explain variance
Mathematically, this is not really a two-step process (all parameters are estimated together in one model),
but thinking in two steps helps in understanding what is going on.
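The conceptual two-stage approach can be sketched in Python as follows. The data are made up and noise-free, the level-2 variable `z` and the helper `ols` are hypothetical names, and real multilevel software estimates everything in a single model rather than in two separate steps.

```python
def ols(x, y):
    """Return (intercept, slope) of a simple OLS regression of y on x."""
    mean_x, mean_y = sum(x) / len(x), sum(y) / len(y)
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope

# Made-up, noise-free data: four groups, each with a level-2 variable z.
# By construction, a group's intercept is 1 + 2*z and its slope is 0.5 + 1*z.
x = [0, 1, 2, 3, 4]
z_values = [0, 1, 2, 3]
data = {z: [(1 + 2 * z) + (0.5 + z) * xi for xi in x] for z in z_values}

# Step 1: a separate regression within each group (level 1).
intercepts, slopes = [], []
for z in z_values:
    b0, b1 = ols(x, data[z])
    intercepts.append(b0)
    slopes.append(b1)

# Step 2: the level-1 estimates become outcomes, predicted by z (level 2).
g00, g01 = ols(z_values, intercepts)   # group intercepts regressed on z
g10, g11 = ols(z_values, slopes)       # group slopes regressed on z

print(g00, g01, g10, g11)
```

Because the example contains no noise, step 2 recovers the values used to build the data (about 1, 2, 0.5 and 1). With real data, both steps would also leave residual variation, which is what the random effects capture.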
Fixed versus random effects: important concepts in MLA!
- Fixed effects: Effects that do not vary across units; what would happen if everybody had the same
effect/slope (E.g. Level 2 intercept, Level 2 slope)
o You do not assume variation in parameters, kind of average effect (one estimate)
- Random effects: Coefficients/effects that are assumed to vary across units; the difference in a
group intercept/slope from the overall intercept/slope (Within unit intercepts; within unit slopes;
Level 2 residuals)
o These estimates (slopes and intercepts) have a distribution. When we are interested in
the variation within this distribution, we are interested in the random effects
o Residuals are very important in MLA (expressed by the variance in the slopes and
intercepts)
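The distinction between fixed and random effects can be written out for a two-level model with one individual-level predictor X and one group-level predictor Z. The notation below (gamma for fixed effects, u and e for residuals) is a commonly used convention and is an assumption here, since the notes above do not introduce symbols.

```latex
\begin{align*}
% Level 1: individual i in group j
Y_{ij} &= \beta_{0j} + \beta_{1j} X_{ij} + e_{ij} \\
% Level 2: group-specific intercept and slope explained by Z_j
\beta_{0j} &= \gamma_{00} + \gamma_{01} Z_j + u_{0j} \\
\beta_{1j} &= \gamma_{10} + \gamma_{11} Z_j + u_{1j} \\
% Combined model after substituting level 2 into level 1
Y_{ij} &= \underbrace{\gamma_{00} + \gamma_{10} X_{ij} + \gamma_{01} Z_j + \gamma_{11} X_{ij} Z_j}_{\text{fixed part}}
        + \underbrace{u_{0j} + u_{1j} X_{ij} + e_{ij}}_{\text{random part}}
\end{align*}
```

In this notation the gammas are the fixed effects (one estimate each), while the level-2 residuals u_{0j} and u_{1j} are the random effects: the deviations of group j's intercept and slope from the values predicted at level 2. The model estimates their variances rather than a separate value for every group.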