Partitioning Variance: The Unconditional Model and ICC

Author

Justin Jones

Published

March 31, 2026

1 Introduction: The Null Model as a Starting Point

Before we add any predictors, we need to understand the fundamental question: How much variation in our outcome exists between groups versus within groups?

This is not a trivial question. Imagine you’re studying job satisfaction in an organization. If 80% of the variance is between teams and only 20% is within teams, then the key to understanding satisfaction lies in team-level characteristics: leadership, resources, team climate. Conversely, if 80% is within teams and 20% is between, then individual differences matter far more than team membership.

This proportioning of variance is the job of the unconditional model (also called the null model). It’s the simplest multilevel model: no predictors, just random intercepts. Despite its simplicity, it answers one of the most important questions in multilevel research: Is there meaningful clustering in the data?

The unconditional model is your starting point for all multilevel analyses. It provides:

  1. A baseline understanding of variance structure
  2. The Intraclass Correlation Coefficient (ICC), which quantifies how much clustering exists
  3. A reference point for calculating pseudo-R² values when you add predictors later
  4. Evidence for whether MLM is even necessary (if ICC is near zero, ordinary regression may suffice)

Let’s build this model step by step, using the same organizational dataset we created last week.

2 The Unconditional Model in Words and Symbols

The unconditional model is written as:

\[Y_{ij} = \gamma_{00} + u_{0j} + r_{ij}\]

Let’s unpack this notation carefully, because multilevel models can look intimidating at first:

  • \(Y_{ij}\): The outcome for individual \(i\) nested within group \(j\). In our case, job satisfaction for employee \(i\) in team \(j\).

  • \(\gamma_{00}\) (pronounced “gamma double-zero”): The grand mean, the overall intercept averaged across all groups. This is the predicted satisfaction for the “average” employee in the “average” team. It’s analogous to the intercept in OLS; the gamma notation flags it as a fixed effect in the multilevel framework.

  • \(u_{0j}\) (pronounced “u naught j”): The random intercept deviation for group \(j\). This captures how much team \(j\) deviates from the grand mean. If team 5 has \(u_{05} = 0.8\), then employees in team 5 are, on average, 0.8 units higher in satisfaction than the grand mean. These random intercepts are assumed to be normally distributed with mean 0 and variance \(\tau_{00}^2\) (tau). The subscript “0” reminds us this is a random variation in the intercept.

  • \(r_{ij}\) (pronounced “r i j”): The individual-level residual, the deviation of person \(i\) from their team’s intercept. This is the variation among individuals within the same team. These residuals are assumed to be normally distributed with mean 0 and variance \(\sigma^2\) (sigma-squared). This variance is also called the “level-1 residual variance.”

In words: The outcome for each person equals the overall average, plus their team’s deviation from that average, plus their individual deviation from their team’s average.

The two variance components are:

  • \(\tau_{00}^2\): Between-group variance (Level 2). How much teams differ from each other.
  • \(\sigma^2\): Within-group variance (Level 1). How much individuals within the same team differ from each other.
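The decomposition above can be checked directly by simulation. Here is a minimal base-R sketch (using hypothetical variance values, not the ones we will estimate below) showing that the total variance of \(Y\) is the sum of the two components:

```r
# Sketch: simulate Y_ij = gamma_00 + u_0j + r_ij with known (hypothetical)
# variance components, then verify var(Y) ~= tau_00^2 + sigma^2
set.seed(42)
gamma_00 <- 5
tau2     <- 0.6    # between-group variance (hypothetical)
sigma2   <- 0.8    # within-group variance (hypothetical)
n_groups <- 2000
n_per    <- 10

u0 <- rnorm(n_groups, mean = 0, sd = sqrt(tau2))            # group deviations
y  <- gamma_00 +
      rep(u0, each = n_per) +                               # team's deviation
      rnorm(n_groups * n_per, mean = 0, sd = sqrt(sigma2))  # individual residual

var(y)  # close to tau2 + sigma2 = 1.4
```

Because the two deviations are independent, their variances add, which is what makes the partitioning in this chapter possible.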

3 Fitting the Unconditional Model in R

Let’s fit this model using the lmer() function from the lme4 package, using the same employee dataset we created in Week 1.

Code
set.seed(1234)
library(tidyverse)
library(lme4)
library(lmerTest)
library(performance)
library(magrittr)

# Recreate the employee dataset from Week 1
n_teams <- 50
teams <- tibble(
  team_id = 1:n_teams,
  team_size = sample(8:15, n_teams, replace = TRUE),
  team_climate = rnorm(n_teams, mean = 5, sd = 0.8))

teams %<>% 
  mutate(team_climate = pmin(pmax(team_climate, 1), 7))

employees <- teams %>%
  slice(rep(1:n_teams, teams$team_size)) %>%
  arrange(team_id) %>%
  group_by(team_id) %>%
  mutate(
    emp_id = row_number(),
    employee_id = paste0("T", team_id, "E", emp_id),
    autonomy = rnorm(n(), mean = 4.5, sd = 1.2),
    autonomy = pmin(pmax(autonomy, 1), 7),
    performance = rnorm(n(), mean = 6, sd = 1.5),
    performance = pmin(pmax(performance, 1), 10),
    tenure = rgamma(n(), shape = 3, rate = 0.8)
  ) %>%
  ungroup()

employees <- employees %>%
  group_by(team_id) %>%
  mutate(
    u0j = rnorm(1, mean = 0, sd = sqrt(0.6)),
    job_satisfaction = 3.0 +
                       0.5 * scale(autonomy)[,1] +
                       0.4 * team_climate +
                       u0j +
                       rnorm(n(), mean = 0, sd = sqrt(0.8)),
    job_satisfaction = pmin(pmax(job_satisfaction, 1), 7)
  ) %>%
  ungroup() %>%
  select(-u0j, -emp_id)

cat("Dataset ready: ", nrow(employees), "employees in", n_distinct(employees$team_id), "teams\n")
Dataset ready:  604 employees in 50 teams

Now, fit the unconditional model:

Code
# Fit the unconditional model (null model)
model_unconditional <- lmer(job_satisfaction ~ 1 + (1 | team_id), data = employees)

# Display the summary
summary(model_unconditional)
Linear mixed model fit by REML. t-tests use Satterthwaite's method [
lmerModLmerTest]
Formula: job_satisfaction ~ 1 + (1 | team_id)
   Data: employees

REML criterion at convergence: 1748

Scaled residuals: 
     Min       1Q   Median       3Q      Max 
-2.44402 -0.68694 -0.01522  0.65205  2.84687 

Random effects:
 Groups   Name        Variance Std.Dev.
 team_id  (Intercept) 0.6906   0.8310  
 Residual             0.8690   0.9322  
Number of obs: 604, groups:  team_id, 50

Fixed effects:
            Estimate Std. Error      df t value Pr(>|t|)    
(Intercept)   4.8978     0.1237 49.0303   39.61   <2e-16 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Let’s dissect this output:

Fixed effects: - (Intercept) = 4.898: This is \(\gamma_{00}\), the grand mean. The average job satisfaction across all employees is 4.90 on the 1-7 scale. The standard error is 0.12, reflecting the uncertainty in estimating this overall mean.

Random effects: - Intercept (team_id): This shows the variance of the random intercepts, \(\tau_{00}^2\). The output says “Variance: 0.691,” so \(\tau_{00} = \sqrt{0.691} = 0.831\). This tells us that the standard deviation of team intercepts is 0.83 units on the satisfaction scale.

  • Residual: This is the within-team variance, \(\sigma^2 = 0.869\). The standard deviation of individual residuals is \(\sigma = \sqrt{0.869} = 0.932\). This captures how much individual employees within the same team differ from their team’s average.

3.1 Interpreting the Variance Components

The total variance in job satisfaction is:

\[\text{Total Variance} = \tau_{00}^2 + \sigma^2 = 0.691 + 0.869 = 1.560\]

  • Between-team variance (\(\tau_{00}^2\)): 0.691 (accounts for differences between teams)
  • Within-team variance (\(\sigma^2\)): 0.869 (accounts for differences among individuals in the same team)

This tells us that about 44% of satisfaction variation is between teams (0.691 / 1.560), and about 56% is within teams. This means individual differences matter somewhat more than team membership, but team membership still accounts for a substantial portion of the variance, certainly enough to justify multilevel modeling.

4 Extracting Variance Components and Computing the ICC

The Intraclass Correlation Coefficient (ICC) quantifies the degree of clustering. It answers: “How much do observations within the same group correlate with each other?”

The ICC is calculated as:

\[ICC = \frac{\tau_{00}^2}{\tau_{00}^2 + \sigma^2}\]

This is simply the proportion of total variance that is at the group level. Let’s compute it several ways:

Code
# Method 1: Manual calculation from the variance components in the output
tau_sq <- 0.691
sigma_sq <- 0.869
icc_manual <- tau_sq / (tau_sq + sigma_sq)

cat("ICC (manual calculation):", round(icc_manual, 3), "\n")
ICC (manual calculation): 0.443 
Code
# Method 2: Extract variance components using VarCorr()
var_components <- VarCorr(model_unconditional)
print(var_components)
 Groups   Name        Std.Dev.
 team_id  (Intercept) 0.83102 
 Residual             0.93218 
Code
# Method 3: Using performance::icc() function
icc_performance <- icc(model_unconditional)
print(icc_performance)
# Intraclass Correlation Coefficient

    Adjusted ICC: 0.443
  Unadjusted ICC: 0.443

Interpretation: The ICC of approximately 0.44 tells us that any two employees randomly selected from the same team will, on average, have job satisfaction scores that correlate 0.44 with each other. This is substantial.

Compare this to two employees randomly selected from different teams: their satisfaction scores would be expected to correlate near zero, since they don’t share any team context. The fact that the ICC is 0.44 tells us that team membership matters: it is a major predictor of individual satisfaction.
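This “same-team correlation” reading of the ICC can be verified with a quick simulation (base R, hypothetical variance values): draw many pairs of teammates who share a team deviation \(u_{0j}\), and correlate their scores:

```r
# Sketch: the ICC equals the expected correlation between two members
# of the same group (hypothetical true ICC = 0.6 / 1.4 ~= 0.43)
set.seed(7)
tau2 <- 0.6; sigma2 <- 0.8
n_pairs <- 50000
u0 <- rnorm(n_pairs, 0, sqrt(tau2))          # shared team deviation
y1 <- u0 + rnorm(n_pairs, 0, sqrt(sigma2))   # teammate 1
y2 <- u0 + rnorm(n_pairs, 0, sqrt(sigma2))   # teammate 2
cor(y1, y2)  # close to 0.429
```

The shared \(u_{0j}\) is the only source of covariance between the two scores, so the correlation converges to \(\tau_{00}^2 / (\tau_{00}^2 + \sigma^2)\).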

In general, ICC values of:

  • < 0.05: Negligible clustering; OLS may be sufficient
  • 0.05–0.10: Small clustering; MLM is recommended but effect is modest
  • 0.10–0.20: Moderate clustering; MLM is clearly warranted
  • > 0.20: Substantial clustering; MLM is essential

Our ICC of 0.44 is very substantial, indicating strong clustering and high value from multilevel modeling.

5 Design Effects: When Is MLM Worth the Complexity?

The Design Effect (DEFF) tells us how much the effective sample size is reduced due to clustering. It’s calculated as:

\[DEFF = 1 + (n_{0} - 1) \times ICC\]

where \(n_0\) is the average group size.

Code
# Calculate DEFF
n_per_group <- nrow(employees) / n_distinct(employees$team_id)
icc_value <- icc_manual
deff <- 1 + (n_per_group - 1) * icc_value

cat("Average group size:", round(n_per_group, 1), "\n")
Average group size: 12.1 
Code
cat("ICC:", round(icc_value, 3), "\n")
ICC: 0.443 
Code
cat("Design Effect:", round(deff, 2), "\n")
Design Effect: 5.91 
Code
cat("Interpretation: With DEFF =", round(deff, 2),
    ", the effective sample size for testing a fixed effect is about",
    round(nrow(employees) / deff, 0), "instead of", nrow(employees), ".\n")
cat("That's a loss of", round(100 * (1 - 1/deff), 1), "% of the sample.\n")
Interpretation: With DEFF = 5.91 , the effective sample size for testing a fixed effect is about 102 instead of 604 .
That's a loss of 83.1 % of the sample.

This design effect tells us that due to clustering, our effective sample size for detecting fixed effects (like the effect of autonomy) is far smaller than the 604 employees we observed. We’ve lost statistical power because of the non-independence of observations.

This is precisely why ignoring the multilevel structure is so costly: you conduct a study thinking you have 604 observations, but the clustering means you effectively have only about 100 independent pieces of information for hypothesis testing. If you analyze the data with OLS anyway, your standard errors are too small, and you can’t trust your p-values.
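The back-of-envelope calculation above can be packaged as a small helper. This is a base-R sketch (the function name is ours, not from any package), assuming roughly equal cluster sizes:

```r
# Sketch: effective sample size under clustering
# n_eff = N / DEFF, where DEFF = 1 + (n0 - 1) * ICC
effective_n <- function(N, n_groups, icc) {
  n0 <- N / n_groups                # average cluster size
  N / (1 + (n0 - 1) * icc)          # N deflated by the design effect
}

round(effective_n(N = 604, n_groups = 50, icc = 0.443))  # ~ 102
```

Plugging in other designs (fewer, larger teams versus many small teams) shows how quickly large clusters erode the effective sample size at a fixed ICC.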

6 Caterpillar Plots: Visualizing Random Intercepts

The random intercepts (\(u_{0j}\) values) tell us the deviation of each team from the grand mean. We can extract these and visualize them with a “caterpillar plot”:

Code
# Extract random intercepts
random_ints <- ranef(model_unconditional)$team_id %>%
  rownames_to_column(var = "team_id") %>%
  rename(intercept = `(Intercept)`)

# Calculate approximate intervals for the intercepts
# (rough sketch: using sqrt(tau_00^2) = 0.831 as a common SE; conditional
# SEs from ranef(model, condVar = TRUE) would be more precise)
se_intercept <- sqrt(0.691)

random_ints <- random_ints %>%
  mutate(
    team_id = as.numeric(team_id),
    ci_lower = intercept - 1.96 * se_intercept,
    ci_upper = intercept + 1.96 * se_intercept
  ) %>%
  arrange(intercept)

# Caterpillar plot
ggplot(random_ints, aes(x = reorder(team_id, intercept), y = intercept)) +
  geom_point(size = 2, color = "steelblue") +
  geom_errorbar(aes(ymin = ci_lower, ymax = ci_upper), 
                width = 0.3, color = "steelblue", alpha = 0.6) +
  geom_hline(yintercept = 0, linetype = "dashed", color = "red", size = 1) +
  coord_flip() +
  labs(
    title = "Caterpillar Plot of Team Random Intercepts",
    subtitle = "Each point is a team's deviation from the grand mean",
    x = "Team ID (sorted)",
    y = "Random Intercept (deviation from grand mean)"
  ) +
  theme_minimal() +
  theme(axis.text.y = element_text(size = 7))

This caterpillar plot shows each team’s estimated intercept (deviation from the grand mean) along with its confidence interval. Teams with points above the dashed red line (at zero) have above-average job satisfaction; those below have below-average satisfaction.

The fact that many confidence intervals don’t overlap with zero tells us that substantial team differences exist. Some teams are genuinely more satisfied than others—these differences are not just random noise.

7 Comparing ICC Across Multiple Outcomes

In real research, you often have multiple potential outcomes. Computing ICC for each tells you which outcomes show stronger clustering and therefore might benefit more from multilevel modeling:

Code
# Fit unconditional models for different outcomes
model_performance <- lmer(performance ~ 1 + (1 | team_id), data = employees)
model_autonomy <- lmer(autonomy ~ 1 + (1 | team_id), data = employees)

# Extract ICCs using variance components directly (more robust than performance::icc)
extract_icc_vc <- function(model) {
  vc <- as.data.frame(VarCorr(model))
  tau2 <- vc$vcov[1]
  sigma2 <- vc$vcov[nrow(vc)]
  tau2 / (tau2 + sigma2)
}
icc_satisfaction <- extract_icc_vc(model_unconditional)
icc_performance <- extract_icc_vc(model_performance)
icc_autonomy <- extract_icc_vc(model_autonomy)

# Create comparison table
icc_comparison <- tibble(
  Outcome = c("Job Satisfaction", "Performance Rating", "Autonomy Perception"),
  ICC = c(icc_satisfaction, icc_performance, icc_autonomy)
) %>%
  mutate(
    ICC = round(ICC, 3),
    Clustering = case_when(
      ICC < 0.05 ~ "Negligible",
      ICC < 0.10 ~ "Small",
      ICC < 0.20 ~ "Moderate",
      TRUE ~ "Substantial"
    )
  )

print(icc_comparison)
# A tibble: 3 × 3
  Outcome               ICC Clustering 
  <chr>               <dbl> <chr>      
1 Job Satisfaction    0.443 Substantial
2 Performance Rating  0     Negligible 
3 Autonomy Perception 0     Negligible 
Code
cat("\n\nInterpretation:\n")


Interpretation:
Code
cat("- Job satisfaction has substantial clustering (ICC =", 
    round(icc_satisfaction, 3), 
    "), meaning team membership strongly predicts individual satisfaction.\n")
- Job satisfaction has substantial clustering (ICC = 0.443 ), meaning team membership strongly predicts individual satisfaction.
Code
cat("- Performance shows negligible clustering, indicating ratings are driven by individual rather than team factors.\n")
- Performance shows negligible clustering, indicating ratings are driven by individual rather than team factors.
Code
cat("- Autonomy shows minimal clustering, indicating people's autonomy perceptions vary little by team.\n")
- Autonomy shows minimal clustering, indicating people's autonomy perceptions vary little by team.

This comparison reveals that different outcomes show different degrees of clustering. Job satisfaction is strongly shaped by team membership, while performance ratings and autonomy perceptions are largely individual-specific. This should inform your research focus and model selection.

8 A Step-by-Step Interpretation Example

Let’s walk through a complete interpretation of the unconditional model output for job satisfaction:

Code
# Variance components taken from the model output above
tau_sq   <- 0.691
sigma_sq <- 0.869
icc_val  <- tau_sq / (tau_sq + sigma_sq)
n0       <- nrow(employees) / n_distinct(employees$team_id)
deff_val <- 1 + (n0 - 1) * icc_val

cat("=== UNCONDITIONAL MODEL SUMMARY ===\n\n")

cat("1. GRAND MEAN (Fixed Effect):\n")
cat("   Estimate: 4.898\n")
cat("   Interpretation: The overall average job satisfaction across all\n")
cat("   employees and teams is 4.90 on the 1-7 scale.\n\n")

cat("2. BETWEEN-TEAM VARIANCE (tau_sq):\n")
cat("   Variance: 0.691, SD: 0.831\n")
cat("   Interpretation: Teams vary in average satisfaction. The standard\n")
cat("   deviation of team means is 0.83 units. Approximately 95% of teams\n")
cat("   fall between 4.90 - 1.96(0.83) = 3.27 and 4.90 + 1.96(0.83) = 6.53.\n\n")

cat("3. WITHIN-TEAM VARIANCE (sigma_sq):\n")
cat("   Variance: 0.869, SD: 0.932\n")
cat("   Interpretation: Individuals within the same team differ in\n")
cat("   satisfaction. The standard deviation of individual residuals is 0.93 units.\n\n")

cat("4. INTRACLASS CORRELATION (ICC):\n")
cat("   ICC = 0.691 / 1.560 =", round(icc_val, 3), "\n")
cat("   Interpretation: Approximately 44% of the variance in job satisfaction\n")
cat("   is due to team-level differences. Any two employees from the same\n")
cat("   team correlate about 0.44 on satisfaction.\n\n")

cat("5. DESIGN EFFECT (DEFF):\n")
cat("   DEFF =", round(deff_val, 2), "\n")
cat("   Interpretation: While we surveyed", nrow(employees), "employees, clustering\n")
cat("   leaves us with about", round(nrow(employees) / deff_val, 0),
    "independent pieces of information.\n")
=== UNCONDITIONAL MODEL SUMMARY ===

1. GRAND MEAN (Fixed Effect):
   Estimate: 4.898
   Interpretation: The overall average job satisfaction across all
   employees and teams is 4.90 on the 1-7 scale.

2. BETWEEN-TEAM VARIANCE (tau_sq):
   Variance: 0.691, SD: 0.831
   Interpretation: Teams vary in average satisfaction. The standard
   deviation of team means is 0.83 units. Approximately 95% of teams
   fall between 4.90 - 1.96(0.83) = 3.27 and 4.90 + 1.96(0.83) = 6.53.

3. WITHIN-TEAM VARIANCE (sigma_sq):
   Variance: 0.869, SD: 0.932
   Interpretation: Individuals within the same team differ in
   satisfaction. The standard deviation of individual residuals is 0.93 units.

4. INTRACLASS CORRELATION (ICC):
   ICC = 0.691 / 1.560 = 0.443 
   Interpretation: Approximately 44% of the variance in job satisfaction
   is due to team-level differences. Any two employees from the same
   team correlate about 0.44 on satisfaction.

5. DESIGN EFFECT (DEFF):
   DEFF = 5.91 
   Interpretation: While we surveyed 604 employees, clustering
   leaves us with about 102 independent pieces of information.

9 Visualizing the Model: The Shrinkage of Random Intercepts

One beautiful property of mixed models is that the estimated random intercepts are shrunk toward zero. Teams with small sample sizes or more variability have their estimates pulled more toward the grand mean. This is called partial pooling and is a strength of Bayesian and mixed model estimation.
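The amount of shrinkage follows a standard result for random-intercept models: team \(j\)’s predicted deviation is its observed deviation multiplied by the reliability \(\lambda_j = \tau_{00}^2 / (\tau_{00}^2 + \sigma^2 / n_j)\). A base-R sketch (function name is ours) using this model’s estimated variances:

```r
# Sketch: shrinkage (reliability) factor lambda_j = tau2 / (tau2 + sigma2/n_j),
# using the variance estimates from the unconditional model
shrink_factor <- function(n_j, tau2 = 0.691, sigma2 = 0.869) {
  tau2 / (tau2 + sigma2 / n_j)
}

round(shrink_factor(c(2, 8, 15)), 2)  # 0.61 0.86 0.92 -- small teams shrink most
```

A team of 2 keeps only about 61% of its observed deviation, while a team of 15 keeps about 92%, which is exactly the pattern the plot below displays.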

Code
# Calculate the group-level means (naive estimates, no shrinkage)
group_means <- employees %>%
  group_by(team_id) %>%
  summarize(
    obs_mean = mean(job_satisfaction),
    n_obs = n(),
    .groups = 'drop'
  )

# Merge with random intercepts (shrunk estimates)
random_ints_full <- ranef(model_unconditional)$team_id %>%
  rownames_to_column(var = "team_id") %>%
  mutate(
    team_id = as.numeric(team_id),
    shrunk_mean = fixef(model_unconditional)[["(Intercept)"]] + `(Intercept)`
  ) %>%
  select(team_id, shrunk_mean)

comparison <- group_means %>%
  left_join(random_ints_full, by = "team_id")

# Plot: observed vs. shrunk estimates
ggplot(comparison, aes(x = obs_mean, y = shrunk_mean)) +
  geom_point(aes(size = n_obs), alpha = 0.6, color = "darkblue") +
  geom_abline(intercept = 0, slope = 1, linetype = "dashed", color = "red") +
  labs(
    title = "Partial Pooling: Observed vs. Shrunk Intercepts",
    subtitle = "Bubble size = team sample size. Deviations from the dashed identity line show shrinkage.",
    x = "Observed Team Mean (no shrinkage)",
    y = "Shrunk Estimate from Mixed Model"
  ) +
  theme_minimal()

Notice that teams with fewer observations (smaller bubbles) are pulled more toward the grand mean of 4.90. This is shrinkage, and it’s a feature, not a bug: it prevents you from over-interpreting random variation in small teams.

10 Try It Yourself Exercises

10.1 Exercise 1: Build an Unconditional Model for a New Outcome

Fit an unconditional model using performance as the outcome instead of job_satisfaction. Extract the variance components and compute the ICC. How does it compare to the ICC for job_satisfaction? What does this tell you about clustering in performance ratings?

10.2 Exercise 2: Design Effect Across Different Group Sizes

Simulate what the design effect would be for datasets with:

  • 500 employees in 10 large teams (50 per team)
  • 500 employees in 50 medium teams (10 per team)
  • 500 employees in 100 small teams (5 per team)

Calculate DEFF for each, assuming the ICC stays constant at 0.30. How does group size affect the loss of statistical power due to clustering?

10.3 Exercise 3: Caterpillar Plot with Sorting

Recreate the caterpillar plot but sorted by team_size. Do larger teams tend to have different intercepts than smaller teams? Create a scatter plot of team_size vs. random intercepts to explore this relationship.

10.4 Exercise 4: Multi-Level Variance Decomposition

For job_satisfaction, compute:

  1. The between-team SD: How much do team means vary?
  2. The within-team SD: How much do individuals within a team vary?
  3. A visualization showing these two sources of variation side by side.

10.5 Exercise 5: Null Hypothesis Test for Team Variation

Consider the likelihood ratio test comparing the unconditional model to a standard OLS model (no random intercepts). Fit both models using lmer() and lm() and compare them with anova(). What does this test tell you about whether teams genuinely differ?


Key Takeaways:

  • The unconditional model is the foundation of multilevel analysis. It partitions variance into between-group and within-group components.

  • The Intraclass Correlation Coefficient (ICC) quantifies clustering: the proportion of variance that is between groups.

  • An ICC of 0.44 indicates very substantial clustering, meaning that team membership explains a large portion of individual satisfaction differences.

  • The design effect shows us that clustering reduces the effective sample size for hypothesis testing. With DEFF ≈ 5.9, our effective N is only about 100 instead of 604.

  • Random intercepts are shrunk toward the grand mean, especially for teams with small sample sizes. This partial pooling prevents overfitting and improves estimation of team effects.

  • The unconditional model provides a reference point: when you add predictors, you can compare the new model’s variance components to these baseline values to understand how much your predictors explain.