Clustered data: a group of observations that “belong to each other”.
- This matters because data points in a cluster “share” information.
- An additional clustered data point therefore adds less information than an additional independent data point.
- The effective sample size is smaller when clustering is present.
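To make the effective-sample-size point concrete, here is a minimal Python sketch using Kish's design effect for equal-sized clusters, $DEFF = 1 + (m - 1) \cdot ICC$, where m is the cluster size and ICC the intraclass correlation (the share of the total variance that lies between clusters). The formula is not taken from these notes, and the function names and numbers are hypothetical, chosen for illustration only.

```python
def design_effect(cluster_size: float, icc: float) -> float:
    """Kish design effect, assuming all clusters have the same size m."""
    return 1.0 + (cluster_size - 1.0) * icc


def effective_sample_size(n: int, cluster_size: float, icc: float) -> float:
    """Number of independent observations carrying the same information as n clustered ones."""
    return n / design_effect(cluster_size, icc)


# Hypothetical example: 1000 children in classrooms of 25 children, ICC = 0.05.
n_eff = effective_sample_size(1000, 25, 0.05)
print(f"effective sample size: {n_eff:.0f}")
```

With ICC = 0 the 1000 children count as 1000 independent observations; with ICC = 0.05 they carry about as much information as 455 independent children, and with ICC = 1 only as much as the 40 classrooms.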
Example 1
Clustering: household survey of all adult residents, outcome presence of hypertension, all districts in Indonesia.
Clustering on 3 levels: participants are clustered within households, and those households are in turn clustered within districts. Exposures differ between districts, but also between households and between individuals. So there is an individual level, a household level and a district level.
Example 2
Clustering: testing eyesight of 1000 primary school children, different classroom teaching methods.
Clustering on 2 levels: the individual children are clustered within classrooms, and the classrooms differ in teaching method.
Example 3
Lab experiment: weekly weight gain in rats, comparing diets.
Clustering on 2 levels: the repeated measurements at different time points are clustered within each rat, and the rats are clustered within diets.
1. How far is an individual observation from the mean of its district? (Epsilon, $\varepsilon_{kji}$, denotes the error of the individual observation.)
2. How far is the mean of one district from the mean of its province?
3. As a result, the outcome Y of an individual observation can be described as the total mean ($\beta_0$) + how far the province mean is from the total mean ($u_k$) + how far the district mean is from the province mean ($u_{kj}$) + the distance between the individual observation and the district mean ($\varepsilon_{kji}$):
$Y_{kji} = \beta_0 + u_k + u_{kj} + \varepsilon_{kji}$
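The decomposition of Y can be checked numerically. The sketch below uses hypothetical outcome values for two provinces with two districts each (all names and numbers are invented) and splits every observation into $\beta_0 + u_k + u_{kj} + \varepsilon_{kji}$ using plain observed means. This is a descriptive identity, not a fitted multilevel model: an MLM estimates the variances of $u_k$, $u_{kj}$ and $\varepsilon_{kji}$ instead of reporting raw deviations.

```python
from statistics import mean

# Hypothetical data: province -> district -> individual outcomes.
data = {
    "P1": {"D1": [120.0, 130.0, 125.0], "D2": [140.0, 135.0]},
    "P2": {"D3": [110.0, 115.0], "D4": [150.0, 145.0, 155.0]},
}


def decompose(data):
    """Split each observation into beta0 + u_k + u_kj + eps_kji using observed means."""
    all_values = [y for districts in data.values() for ys in districts.values() for y in ys]
    beta0 = mean(all_values)  # total mean
    rows = []
    for k, districts in data.items():
        province_mean = mean(y for ys in districts.values() for y in ys)
        u_k = province_mean - beta0  # province mean vs total mean
        for j, ys in districts.items():
            district_mean = mean(ys)
            u_kj = district_mean - province_mean  # district mean vs province mean
            for y in ys:
                eps = y - district_mean  # individual observation vs district mean
                rows.append((k, j, y, beta0, u_k, u_kj, eps))
    return rows


rows = decompose(data)
for k, j, y, b0, u_k, u_kj, eps in rows:
    print(f"{k} {j}: Y={y:.1f} = {b0:.1f} + ({u_k:+.1f}) + ({u_kj:+.1f}) + ({eps:+.1f})")
```

Because each term is a difference between successive means, the four parts always add back up to the observed Y.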
• Explain and apply basic multi-level data analysis
How can we deal with clustering, and with a categorical variable that has many groups?
1. Adjust for the grouping directly in the regression.
- Add variables such as gender, SES, occupation, etc. as covariates (using dummy variables).
2. Multilevel analysis (MLM), also called mixed models, and known as:
- Hierarchical linear model
- Random effects model
- Random coefficient analysis
3. Generalized estimating equations (GEE)
AN APPROACH TO DEALING WITH CLUSTERED DATA: MLM
MLM is an extension of regression analysis. Why do we use MLM?
- Area, for example, is a categorical variable, and in a regression analysis categorical variables are represented by dummy variables (one fewer than the number of categories). Having many dummy variables leads to a loss of power and efficiency.
- MLM is used when we must adjust for clustering in the data, e.g. within sports clubs, neighborhoods, areas, medical doctors, families, etc.
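To see why many categories are costly, here is a minimal sketch of reference (treatment) coding, the usual way a categorical variable enters a regression: a variable with k categories becomes k - 1 dummy columns, each with its own regression coefficient. The function `dummy_code` and the area labels are hypothetical.

```python
def dummy_code(values, reference=None):
    """Reference coding: one 0/1 column per category except the reference category."""
    categories = sorted(set(values))
    if reference is None:
        reference = categories[0]  # first category (alphabetically) as reference
    dummies = [c for c in categories if c != reference]
    rows = [[1 if v == c else 0 for c in dummies] for v in values]
    return dummies, rows


# Hypothetical area labels for six subjects.
areas = ["north", "south", "east", "north", "west", "east"]
columns, matrix = dummy_code(areas)
print(columns)  # ['north', 'south', 'west']
for area, row in zip(areas, matrix):
    print(area, row)
```

A district variable with 69 categories would need 68 dummy columns, and thus 68 coefficients to estimate.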
Three-step method for MLM:
- Estimate an intercept for each level (category) of the grouping variable: 12 different occupations give 12 intercepts, and 3 different socio-economic status groups give 3 intercepts. This is done behind the scenes using dummy variables.
- Place a normal distribution over all the intercepts: the intercepts are assumed to be normally distributed.
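- Estimate the variance of the distribution of intercepts. Are you going to add 69 dummy variables for all the different districts? No, you summarize these intercepts by their variance: you add one parameter, the variance, instead of 69 dummies. Conceptually, the variance is the sum of the squared differences between each cluster intercept $\beta_{0j}$ and the mean intercept $\beta_0$, divided by $n - 1$ (n = number of clusters): $\sigma^2_{u0} = \sum_j (\beta_{0j} - \beta_0)^2 / (n - 1)$. For slopes we do the same: a slope is estimated for each cluster, and the variance of these slopes is estimated.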
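The three steps can be sketched with simulated data. The example below assumes a hypothetical random-intercept model $y_{ij} = \beta_0 + u_j + \varepsilon_{ij}$ with 69 districts; all parameter values and names are invented. It takes each district's own mean as its intercept (what an intercept-only model with dummies would give), then summarizes those intercepts by their mean and their $n - 1$ variance.

```python
import random
from statistics import mean, variance


def simulate(n_clusters=69, per_cluster=30, beta0=120.0, sd_u=5.0, sd_e=10.0, seed=1):
    """Hypothetical data: y_ij = beta0 + u_j + e_ij with u_j ~ N(0, sd_u^2)."""
    rng = random.Random(seed)
    data = {}
    for j in range(n_clusters):
        u_j = rng.gauss(0.0, sd_u)  # district-specific deviation
        data[j] = [beta0 + u_j + rng.gauss(0.0, sd_e) for _ in range(per_cluster)]
    return data


data = simulate()

# Step 1: one intercept per district (the district mean in an intercept-only model).
intercepts = [mean(ys) for ys in data.values()]

# Step 2: treat the intercepts as draws from one normal distribution around beta0.
beta0_hat = mean(intercepts)

# Step 3: summarize them by a single variance, sum((beta0j - beta0)^2) / (n - 1).
var_intercepts = variance(intercepts)

print(f"mean intercept: {beta0_hat:.1f}")
print(f"variance of intercepts: {var_intercepts:.1f} (one parameter instead of 68 dummies)")
```

This naive variance of district means also contains within-district sampling noise (here roughly $10^2 / 30$); multilevel software separates the two by estimating the variance with (restricted) maximum likelihood rather than this shortcut.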
How to interpret MLM
- MLM includes the variation between clusters in the model: we are adding “random effects”.
- The random effects are often not of direct interest; they are just a way to properly adjust for clustering.
- The focus is on the “fixed effects”, which are the regression coefficients.
So, in MLM (subjects nested within areas):
- Correction for area is done by estimating the variance of the intercepts; this is called a random intercept.
- Effect modification by area is handled by estimating the variance of the slopes; this is called a random slope.
This results in a methodologically sound correction for clustering and a good prediction of the outcome.
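Random intercepts and random slopes can be illustrated with simulated data. The sketch below assumes the hypothetical model $y_{ij} = (\beta_0 + u_{0j}) + (\beta_1 + u_{1j}) x_{ij} + \varepsilon_{ij}$ for person i in area j, generates 40 invented areas, fits a separate least-squares line in each area, and summarizes the area-specific intercepts and slopes by their mean (fixed effects) and variance (random effects). Fitting each area separately is only a teaching device; mixed-model software estimates fixed effects and variances jointly, typically by (restricted) maximum likelihood.

```python
import random
from statistics import mean, variance


def ols(xs, ys):
    """Ordinary least squares with one predictor: returns (intercept, slope)."""
    mx, my = mean(xs), mean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    slope = sxy / sxx
    return my - slope * mx, slope


def simulate_area(rng, beta0, beta1, sd_u0, sd_u1, sd_e, n):
    """One hypothetical area: y = (beta0 + u0j) + (beta1 + u1j) * x + e."""
    u0 = rng.gauss(0.0, sd_u0)  # area-specific intercept shift (random intercept)
    u1 = rng.gauss(0.0, sd_u1)  # area-specific slope shift (random slope)
    xs = [rng.uniform(0.0, 10.0) for _ in range(n)]
    ys = [beta0 + u0 + (beta1 + u1) * x + rng.gauss(0.0, sd_e) for x in xs]
    return xs, ys


rng = random.Random(42)
fits = [ols(*simulate_area(rng, 100.0, 2.0, 5.0, 0.5, 3.0, 50)) for _ in range(40)]
intercepts = [a for a, _ in fits]
slopes = [b for _, b in fits]

# Means play the role of fixed effects; variances play the role of random effects.
print(f"mean intercept {mean(intercepts):.1f}, variance {variance(intercepts):.1f}")
print(f"mean slope     {mean(slopes):.2f}, variance {variance(slopes):.3f}")
```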
• Interpret output from multilevel analyses
In the output, the fixed effects are shown at the top and the random effects underneath.
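As a minimal sketch of one common way to read the random-effects part of a random-intercept model, the variance components can be turned into an intraclass correlation (ICC): the share of the total variance that lies between clusters. The variance values below are hypothetical, not taken from any real output.

```python
def icc(var_between: float, var_within: float) -> float:
    """Intraclass correlation: between-cluster variance / total variance."""
    return var_between / (var_between + var_within)


# Hypothetical variance components read from a random-intercept model's output.
var_intercept = 25.0   # variance of the random intercepts (between areas)
var_residual = 100.0   # residual variance (within areas)
print(f"ICC = {icc(var_intercept, var_residual):.2f}")  # ICC = 0.20
```

Here 20% of the variation in the outcome would be attributable to differences between areas.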
Stuvia is a marketplace: you do not buy this document from Stuvia but from the seller July2. Stuvia facilitates the payment to the seller.