Clustered data: a group of observations that “belong to each other”.
- Clustering matters because data points within a cluster “share” information.
- An additional clustered data point adds less information than an additional independent data point.
- The effective sample size is therefore smaller when clustering is present.
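To make the loss of information concrete, here is a minimal sketch using the standard design-effect formula for equally sized clusters. The formula and the intraclass correlation (ICC) are not part of these notes; the function name and the numbers are hypothetical and only serve as illustration.

```python
def effective_sample_size(n_total, cluster_size, icc):
    """Approximate effective sample size under clustering.

    Assumes equally sized clusters and uses the design effect
    1 + (cluster_size - 1) * icc, where icc is the intraclass
    correlation (0 = independent data, 1 = fully shared information).
    """
    design_effect = 1 + (cluster_size - 1) * icc
    return n_total / design_effect


# Hypothetical survey: 1000 people in households (clusters) of 10.
n_independent = effective_sample_size(1000, 10, 0.0)  # no clustering effect
n_moderate = effective_sample_size(1000, 10, 0.1)     # some shared information
n_identical = effective_sample_size(1000, 10, 1.0)    # cluster members identical
```

With an ICC of 0 the effective sample size equals the actual sample size, and with an ICC of 1 it shrinks to the number of clusters.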
Example 1
Clustering: household survey of all adult residents on the presence of hypertension, covering all districts in Indonesia.
Clustering on 3 levels: participants are clustered within households, and households are in turn clustered within districts. Exposures differ between districts, but also between households and between individuals. So the levels are the individual level, the household level and the district level.
Example 2
Clustering: testing eyesight in 1000 primary school children who are taught with different classroom teaching methods.
Clustering on 2 levels: the individual level (children) and the classroom level (teaching method).
Example 3
Lab experiment: weekly weight gain in rats, comparing diets.
Clustering on 2 levels: the individual level (the same rat measured at different times) and the diet level (rats clustered within diets).
1. How far is the individual value from the district mean? (This is the error of the individual observation, epsilon, written e_kji.)
2. How far is the mean of one district from the province mean? (Written u_kj.)
3. As a result, Y (the outcome of the individual observation) can be described as the total mean plus the distance between the individual observation and the district mean (e_kji) + the distance between the district mean and the province mean (u_kj) + the distance between the province mean and the total mean (u_k).
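The decomposition in step 3 can be written as one equation. This is a sketch under assumed notation: the symbol \mu for the total mean is not in the notes, and the indices are read as k = province, j = district, i = individual.

```latex
Y_{kji} = \mu + u_{k} + u_{kj} + \varepsilon_{kji}
```

Here u_{k} is the deviation of province k from the total mean, u_{kj} the deviation of district j from its province mean, and \varepsilon_{kji} the deviation of individual i from their district mean.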
• Explain and apply basic multi-level data analysis
How can we deal with clustering, and with a categorical variable that has many groups?
1. Adjust directly in the regression.
- Adjust for gender, SES, occupation, etc.
2. Multilevel analysis (MLM), also called mixed models. Other names:
- Hierarchical linear model
- Random effects model
- Random coefficient analysis
3. Generalized estimating equations (GEE)
ONE APPROACH TO DEALING WITH CLUSTERED DATA IS MLM
It is an extension of regression analysis. Why do we do MLM?
- For example, area is a categorical variable, and in a regression analysis categorical variables are represented by dummy variables. Having many dummy variables leads to a loss of power and efficiency.
- It is used when we must adjust for clustered data: sports clubs, neighborhoods, areas, medical doctors, families, etc.
Three-step method for MLM:
- Estimate an intercept for each level of the variable. 12 different occupations give 12 intercepts, and 3 different socio-economic statuses give 3 intercepts; this is done behind the scenes using dummy variables.
- Place a normal distribution over all the intercepts: the intercepts are assumed to be normally distributed.
- Estimate the variance of the distribution of intercepts. Are you going to add 69 dummy variables for all the different districts? No, you summarize these intercepts by their variance, so you add one thing (the variance) instead of 69 dummies. Conceptually, the variance is the sum of the squared differences between each group intercept B0j and the overall intercept B0, divided by n-1, where n is the number of groups. For slopes we do the same, but with a slope estimated for each group.
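The three steps can be sketched in a few lines of Python. This is a conceptual illustration only: the data are made up, the names are hypothetical, and there are no covariates, so each group intercept is simply the group mean. Real mixed-model software estimates the variance with (restricted) maximum likelihood rather than this direct calculation.

```python
from statistics import mean

# Hypothetical toy data: outcome values grouped by district (made-up numbers).
outcomes_by_district = {
    "district_A": [120.0, 124.0, 122.0],
    "district_B": [130.0, 134.0, 132.0],
    "district_C": [140.0, 144.0, 142.0],
}

# Step 1: one intercept per district (B0j). Without covariates this is
# simply the mean outcome in that district.
intercepts = {d: mean(values) for d, values in outcomes_by_district.items()}

# Step 2: the intercepts are assumed to come from one normal distribution
# centred on the overall intercept (B0). Nothing is computed for this step.
overall_intercept = mean(list(intercepts.values()))

# Step 3: summarize all intercepts by a single number, their variance:
# sum of squared differences between B0j and B0, divided by n - 1.
n_groups = len(intercepts)
intercept_variance = sum(
    (b0j - overall_intercept) ** 2 for b0j in intercepts.values()
) / (n_groups - 1)
```

Instead of one dummy variable per district, the model carries only `intercept_variance` as an extra parameter.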
How to interpret MLM
- MLM includes the variation between clusters in the model: we are adding “random effects”.
- Random effects are often not of direct interest. They are just a way to adjust properly for clustering.
- Focus on the “fixed effects”, which are the regression coefficients.
So, in MLM (subjects nested within areas):
- Correction for area is done by estimating the variance of the intercepts; the variance of the intercepts is called the random intercept.
- Effect modification by area is handled by estimating the variance of the slopes; the variance of the slopes is called the random slope.
This results in a model formula that accounts for clustering and gives a good prediction of the outcome.
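One common way to write a model with a random intercept and a random slope is sketched below. The notation is an assumption, since the formula itself is not reproduced in these notes: subject i is nested in area j, X is a single predictor, and the \beta terms are the fixed effects.

```latex
Y_{ij} = (\beta_0 + u_{0j}) + (\beta_1 + u_{1j})\, X_{ij} + \varepsilon_{ij},
\qquad
u_{0j} \sim N(0, \sigma^2_{u0}), \quad
u_{1j} \sim N(0, \sigma^2_{u1}), \quad
\varepsilon_{ij} \sim N(0, \sigma^2_{e})
```

Here \sigma^2_{u0} is the variance of the intercepts (random intercept) and \sigma^2_{u1} is the variance of the slopes (random slope).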
• Interpret output from multi-level analyses
In the output, the fixed effects are shown above and the random effects underneath.