Assumptions, Diagnostics, and Practical Issues in MLM

PSY 8XXX: Multilevel Modeling for Organizational Research — Week 8

Author

Instructor

Published

March 31, 2026

1 Introduction

“Assume the model is correct” is a standing joke in statistics: no model is perfect. The real question is whether violations of its assumptions matter for your inferences.

This tutorial walks through the key assumptions of multilevel models, how to diagnose violations, and practical solutions when assumptions are violated. We’ll extract residuals at both levels, visualize distributions, identify influential cases, and apply robust corrections.

2 Review of Key Assumptions

Multilevel models rest on several assumptions:

  1. Linearity: The outcome is a linear function of predictors (at both levels)
  2. Normality at Level 1: Residuals ε_ij ~ N(0, σ²)
  3. Normality at Level 2: Random intercepts and slopes ~ N(0, Σ)
  4. Homoscedasticity of Level 1: Variance σ² is constant across groups and levels of predictors
  5. Conditional independence of Level 1 residuals: given the random effects, the ε_ij are independent (the random effects carry the within-group correlation)
  6. Correct random effects structure: The correlation between random intercept and slope matches the data
  7. Missing data mechanism: data are missing at random (MAR) rather than missing not at random (MNAR)

Let’s set up our data and model:

Code
set.seed(1234)
library(tidyverse)
library(lme4)
library(lmerTest)
library(ggplot2)
library(performance)

# Simulate data as before
n_teams <- 50
n_per_team <- 10
teams <- tibble(
  team_id = 1:n_teams,
  team_climate = rnorm(n_teams, mean = 5, sd = 1.2),
  leader_support = rnorm(n_teams, mean = 5.5, sd = 1)
)

n_total <- n_teams * n_per_team

employees <- expand_grid(
  team_id = 1:n_teams,
  emp_within_team = 1:10
) %>%
  left_join(teams, by = "team_id") %>%
  mutate(
    employee_id = row_number(),
    autonomy = rnorm(n_total, mean = 5, sd = 1.5),
    job_complexity = rnorm(n_total, mean = 4.5, sd = 1.2),
    tenure_years = pmax(rnorm(n_total, mean = 4, sd = 3), 0.5)
  )

# Generate outcome with interaction
employees <- employees %>%
  mutate(
    autonomy_c = autonomy - mean(autonomy),
    team_climate_c = team_climate - mean(team_climate),
    job_satisfaction = 3.5 +
      0.6 * autonomy_c +
      0.4 * team_climate_c +
      0.35 * autonomy_c * team_climate_c +
      0.15 * job_complexity +
      0.05 * tenure_years +
      rnorm(n_total, 0, 1.2),
    job_satisfaction = pmin(pmax(job_satisfaction, 1), 7)
  )

# Fit the model
model <- lmer(
  job_satisfaction ~ autonomy_c * team_climate_c + job_complexity + tenure_years +
    (1 + autonomy_c | team_id),
  data = employees,
  REML = FALSE
)

summary(model)
Linear mixed model fit by maximum likelihood . t-tests use Satterthwaite's
  method [lmerModLmerTest]
Formula: job_satisfaction ~ autonomy_c * team_climate_c + job_complexity +  
    tenure_years + (1 + autonomy_c | team_id)
   Data: employees

      AIC       BIC    logLik -2*log(L)  df.resid 
   1612.9    1655.1    -796.5    1592.9       490 

Scaled residuals: 
    Min      1Q  Median      3Q     Max 
-2.8750 -0.6517  0.0187  0.7064  2.4312 

Random effects:
 Groups   Name        Variance  Std.Dev. Corr  
 team_id  (Intercept) 0.0249918 0.15809        
          autonomy_c  0.0002483 0.01576  -1.00 
 Residual             1.3925005 1.18004        
Number of obs: 500, groups:  team_id, 50

Fixed effects:
                           Estimate Std. Error        df t value Pr(>|t|)    
(Intercept)                 3.41580    0.23214 493.95260  14.715  < 2e-16 ***
autonomy_c                  0.53750    0.03508 387.59873  15.321  < 2e-16 ***
team_climate_c              0.32503    0.05471  50.47801   5.941 2.62e-07 ***
job_complexity              0.15667    0.04654 497.09443   3.366  0.00082 ***
tenure_years                0.05584    0.02087 497.47068   2.675  0.00772 ** 
autonomy_c:team_climate_c   0.23932    0.03389 405.73774   7.061 7.20e-12 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Correlation of Fixed Effects:
            (Intr) atnmy_ tm_cl_ jb_cmp tnr_yr
autonomy_c   0.037                            
team_clmt_c  0.010 -0.079                     
job_cmplxty -0.889 -0.021 -0.017              
tenure_yers -0.378 -0.066  0.012 -0.007       
atnmy_c:t__ -0.058  0.037 -0.008  0.043  0.000
optimizer (nloptwrap) convergence code: 0 (OK)
boundary (singular) fit: see help('isSingular')

3 Level 1 Residual Diagnostics

Level 1 residuals capture individual-level prediction errors. Extract them and examine their distribution:

Code
# Extract Level 1 residuals
employees$resid_l1 <- residuals(model)

# Summary statistics
cat("=== Level 1 Residuals ===\n")
=== Level 1 Residuals ===
Code
cat("Mean: ", mean(employees$resid_l1), " (should be ~0)\n")
Mean:  -1.156031e-15  (should be ~0)
Code
cat("SD: ", sd(employees$resid_l1), " (should match σ)\n")
SD:  1.172036  (should match σ)
Code
cat("Sigma from model: ", sigma(model), "\n\n")
Sigma from model:  1.180043 
Code
# Shapiro-Wilk test for normality
shapiro_l1 <- shapiro.test(employees$resid_l1)
cat("Shapiro-Wilk test: p = ", round(shapiro_l1$p.value, 4), "\n")
Shapiro-Wilk test: p =  0.2262 
Code
cat("(p > 0.05 suggests normality is reasonable)\n\n")
(p > 0.05 suggests normality is reasonable)
Code
# Visualize
par(mfrow = c(2, 2))

# Histogram
hist(employees$resid_l1, breaks = 30, main = "Distribution of L1 Residuals",
     xlab = "Residual", col = "skyblue", border = "black")

# QQ plot
qqnorm(employees$resid_l1, main = "QQ Plot: L1 Residuals")
qqline(employees$resid_l1, col = "red", lwd = 2)

# Residuals vs fitted
plot(fitted(model), employees$resid_l1,
     main = "Residuals vs Fitted (L1)", xlab = "Fitted Value", ylab = "Residual")
abline(h = 0, col = "red", lty = 2)

# Residuals vs autonomy (a key predictor)
plot(employees$autonomy_c, employees$resid_l1,
     main = "Residuals vs Autonomy", xlab = "Autonomy (centered)", ylab = "Residual")
abline(h = 0, col = "red", lty = 2)

Code
par(mfrow = c(1, 1))

Interpretation:

  - If residuals are approximately normal (QQ plot close to the diagonal), the assumption is met.
  - Residuals vs. fitted should show random scatter around zero with no funnel shape.
  - If you see patterns, the model may be misspecified (missing interactions, nonlinearity).

4 Level 2 Residual Diagnostics

Level 2 residuals are the predicted random effects—how much each team’s intercept and slope deviate from the population values.

Code
# Extract random effects (conditional modes / best linear unbiased predictors)
re <- ranef(model)

# Random intercepts and slopes (rownames are team IDs)
team_effects <- tibble(
  team_id = as.integer(rownames(re$team_id)),
  u0j = re$team_id[, 1],
  u1j = re$team_id[, 2]
)

# Test normality of L2 random intercepts
shapiro_u0 <- shapiro.test(team_effects$u0j)
cat("Shapiro-Wilk test for random intercepts: p = ", round(shapiro_u0$p.value, 4), "\n\n")
Shapiro-Wilk test for random intercepts: p =  0.2 
Code
# Visualize
par(mfrow = c(2, 2))

# QQ plot for random intercepts
qqnorm(team_effects$u0j, main = "QQ Plot: Random Intercepts")
qqline(team_effects$u0j, col = "red", lwd = 2)

# QQ plot for random slopes
qqnorm(team_effects$u1j, main = "QQ Plot: Random Slopes (Autonomy)")
qqline(team_effects$u1j, col = "red", lwd = 2)

# Scatter: intercept vs slope (check for correlation)
plot(team_effects$u0j, team_effects$u1j,
     main = "Correlation of Random Intercepts & Slopes",
     xlab = "Random Intercept", ylab = "Random Slope")
abline(h = 0, v = 0, col = "gray", lty = 2)

# Random intercepts by team size (homoscedasticity check; these simulated
# teams are all size 10, so the plot is degenerate here, but with real,
# unbalanced data it shows whether small teams deviate more)
team_sizes <- employees %>%
  group_by(team_id) %>%
  summarise(team_size = n())

plot_data <- team_effects %>%
  left_join(team_sizes, by = "team_id") %>%
  mutate(team_id = as.numeric(team_id))

plot(plot_data$team_size, abs(plot_data$u0j),
     main = "Absolute Intercept Deviations vs Team Size",
     xlab = "Team Size", ylab = "|u0j|")

Code
par(mfrow = c(1, 1))

Interpretation:

  - Random intercepts and slopes should be approximately normal.
  - A correlation of exactly ±1 between random intercept and slope (this model reports -1.00, along with a "boundary (singular) fit" warning) signals a degenerate random-effects structure: the slope variance is effectively zero, and the model should usually be simplified.
  - Homoscedasticity of L2 residuals is less critical but worth checking.
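The summary in Section 2 reported a "boundary (singular) fit" with an intercept-slope correlation of -1.00, meaning the random-slope variance is estimated at the edge of the parameter space. A pragmatic follow-up, sketched here as a continuation of the session above (model and employees must already exist), is to refit without the random slope and compare:

```r
# Refit with a random intercept only (continues the session above)
model_ri <- lmer(
  job_satisfaction ~ autonomy_c * team_climate_c + job_complexity + tenure_years +
    (1 | team_id),
  data = employees,
  REML = FALSE
)

# Likelihood-ratio test: does the random slope (and its correlation
# parameter) improve fit enough to justify the extra parameters?
anova(model_ri, model)

# performance (loaded earlier) also gives a direct TRUE/FALSE check
check_singularity(model)
```

If the likelihood-ratio test is nonsignificant, the random-intercept-only model is usually the safer one to report.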

5 Influence Diagnostics: Identifying Outliers and Influential Cases

Individual observations or groups can exert disproportionate influence on the model. Detecting them helps identify data quality issues.

Code
# Cook's distance for L1 observations
# (lme4 provides cooks.distance() for merMod fits directly; full
# case-deletion influence() diagnostics exist too, but are slower
# and not needed here)
cooks_l1 <- cooks.distance(model)

# Create diagnostic dataframe
diagnostics_l1 <- employees %>%
  mutate(
    cooks_d = cooks_l1,
    residual = resid_l1,
    fitted = fitted(model),
    standardized_resid = residual / sd(residual)
  )

# Identify potentially influential cases (Cook's D > 4/n threshold)
threshold_cooks <- 4 / nrow(employees)
influential_cases <- diagnostics_l1 %>%
  filter(cooks_d > threshold_cooks) %>%
  arrange(desc(cooks_d))

cat("=== Influential Individual Cases (Cook's D > threshold) ===\n")
=== Influential Individual Cases (Cook's D > threshold) ===
Code
cat("Threshold: ", round(threshold_cooks, 4), "\n")
Threshold:  0.008 
Code
cat("Number of flagged cases: ", nrow(influential_cases), "\n\n")
Number of flagged cases:  95 
Code
if (nrow(influential_cases) > 0) {
  print(influential_cases %>% select(employee_id, team_id, cooks_d, residual) %>% head(10))
}
# A tibble: 10 × 4
   employee_id team_id cooks_d residual
         <int>   <int>   <dbl>    <dbl>
 1         292      30  0.0659     2.63
 2         196      20  0.0646    -1.99
 3         195      20  0.0415    -1.74
 4         197      20  0.0406    -1.82
 5          60       6  0.0393    -3.39
 6          31       4  0.0360    -2.65
 7         293      30  0.0313    -3.04
 8         215      22  0.0283     2.59
 9         407      41  0.0276    -1.76
10          34       4  0.0250    -2.60
Code
# Group-level influence: How much does each team influence the overall fit?
# Simplified approach: refitting without each team
group_influence <- tibble()

for (j in unique(employees$team_id)[1:10]) {  # Compute for first 10 teams for speed
  data_without_j <- employees %>% filter(team_id != j)
  model_without <- lmer(
    job_satisfaction ~ autonomy_c * team_climate_c + job_complexity + tenure_years +
      (1 + autonomy_c | team_id),
    data = data_without_j,
    REML = FALSE,
    control = lmerControl(optimizer = "bobyqa")
  )

  # Compare coefficients
  fe_diff <- fixef(model_without) - fixef(model)
  group_influence <- group_influence %>%
    bind_rows(tibble(
      team_id = j,
      coef_max_change = max(abs(fe_diff)),
      autonomy_change = abs(fe_diff["autonomy_c"])
    ))
}

cat("\n=== Group-Level Influence (First 10 Teams) ===\n")

=== Group-Level Influence (First 10 Teams) ===
Code
cat("Shows how much each team's removal changes fixed effects\n\n")
Shows how much each team's removal changes fixed effects
Code
print(group_influence %>% arrange(desc(coef_max_change)))
# A tibble: 10 × 3
   team_id coef_max_change autonomy_change
     <int>           <dbl>           <dbl>
 1       8         0.0501          0.00281
 2       2         0.0481          0.00928
 3       9         0.0466          0.00659
 4       4         0.0405          0.00361
 5      10         0.0393          0.00316
 6       6         0.0285          0.0110 
 7       3         0.0181          0.00296
 8       5         0.0121          0.00310
 9       1         0.00839         0.00241
10       7         0.00638         0.00638

Interpretation:

  - Individual cases with high Cook’s distance warrant inspection (data entry error? genuine outlier?).
  - With n = 500 the conventional 4/n cutoff is liberal: flagging 95 cases does not mean 95 problems. Focus on the largest distances.
  - Groups with strong influence might be particularly important or problematic.
  - Some influence is normal; the question is whether it is undue.

6 Small Sample Corrections: Kenward-Roger and Satterthwaite

In multilevel models, the degrees of freedom for t-tests of fixed effects are not known exactly and must be approximated. When the Level 2 sample is small or moderate, the choice of approximation can change p-values. lmerTest provides the common methods automatically.

Code
# By default, lmerTest uses Satterthwaite approximation
summary(model)  # Look at the t-values and df in the output
Linear mixed model fit by maximum likelihood . t-tests use Satterthwaite's
  method [lmerModLmerTest]
Formula: job_satisfaction ~ autonomy_c * team_climate_c + job_complexity +  
    tenure_years + (1 + autonomy_c | team_id)
   Data: employees

      AIC       BIC    logLik -2*log(L)  df.resid 
   1612.9    1655.1    -796.5    1592.9       490 

Scaled residuals: 
    Min      1Q  Median      3Q     Max 
-2.8750 -0.6517  0.0187  0.7064  2.4312 

Random effects:
 Groups   Name        Variance  Std.Dev. Corr  
 team_id  (Intercept) 0.0249918 0.15809        
          autonomy_c  0.0002483 0.01576  -1.00 
 Residual             1.3925005 1.18004        
Number of obs: 500, groups:  team_id, 50

Fixed effects:
                           Estimate Std. Error        df t value Pr(>|t|)    
(Intercept)                 3.41580    0.23214 493.95260  14.715  < 2e-16 ***
autonomy_c                  0.53750    0.03508 387.59873  15.321  < 2e-16 ***
team_climate_c              0.32503    0.05471  50.47801   5.941 2.62e-07 ***
job_complexity              0.15667    0.04654 497.09443   3.366  0.00082 ***
tenure_years                0.05584    0.02087 497.47068   2.675  0.00772 ** 
autonomy_c:team_climate_c   0.23932    0.03389 405.73774   7.061 7.20e-12 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Correlation of Fixed Effects:
            (Intr) atnmy_ tm_cl_ jb_cmp tnr_yr
autonomy_c   0.037                            
team_clmt_c  0.010 -0.079                     
job_cmplxty -0.889 -0.021 -0.017              
tenure_yers -0.378 -0.066  0.012 -0.007       
atnmy_c:t__ -0.058  0.037 -0.008  0.043  0.000
optimizer (nloptwrap) convergence code: 0 (OK)
boundary (singular) fit: see help('isSingular')
Code
# Extract a specific coefficient's test
autonomy_test <- summary(model)$coefficients["autonomy_c", ]
cat("=== Autonomy Effect Significance Test ===\n")
=== Autonomy Effect Significance Test ===
Code
cat("Estimate: ", autonomy_test["Estimate"], "\n")
Estimate:  0.5374979 
Code
cat("Std. Error: ", autonomy_test["Std. Error"], "\n")
Std. Error:  0.03508209 
Code
cat("t-value: ", autonomy_test["t value"], "\n")
t-value:  15.32115 
Code
cat("Degrees of freedom (Satterthwaite): ", autonomy_test["df"], "\n\n")
Degrees of freedom (Satterthwaite):  387.5987 
Code
# For comparison, compute with the Kenward-Roger correction
# (Kenward-Roger is defined for REML fits, which is why this refit omits
#  REML = FALSE; it is slower but often recommended with few groups)
model_kr <- lmer(
  job_satisfaction ~ autonomy_c * team_climate_c + job_complexity + tenure_years +
    (1 + autonomy_c | team_id),
  data = employees,
  control = lmerControl(calc.derivs = FALSE)
)

# Note: Kenward-Roger is available through anova() in lmerTest
anova(model_kr, ddf = "Kenward-Roger")
Type III Analysis of Variance Table with Kenward-Roger's method
                           Sum Sq Mean Sq NumDF  DenDF  F value    Pr(>F)    
autonomy_c                311.167 311.167     1  38.06 221.6587 < 2.2e-16 ***
team_climate_c             46.926  46.926     1  48.00  33.4274 5.393e-07 ***
job_complexity             15.267  15.267     1 488.78  10.8751  0.001046 ** 
tenure_years                9.637   9.637     1 490.21   6.8645  0.009065 ** 
autonomy_c:team_climate_c  67.818  67.818     1  51.46  48.3097 6.243e-09 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Code
cat("\n=== When to Use Which ===\n")

=== When to Use Which ===
Code
cat("Satterthwaite: Default, works well for moderate samples, faster\n")
Satterthwaite: Default, works well for moderate samples, faster
Code
cat("Kenward-Roger: More conservative, better for small L2 samples (n < 30), slower\n")
Kenward-Roger: More conservative, better for small L2 samples (n < 30), slower
Code
cat("In this example (50 teams): Satterthwaite is sufficient\n")
In this example (50 teams): Satterthwaite is sufficient

7 Heteroscedasticity: Detecting and Addressing Variable Variance

Sometimes residual variance is not constant—it might be larger in some groups or at different levels of a predictor (e.g., older employees have more variable satisfaction).

We’ll use a simplified approach to detect this:

Code
# Test for heteroscedasticity by group
team_resid_var <- diagnostics_l1 %>%
  group_by(team_id) %>%
  summarise(
    resid_var = var(residual),
    mean_autonomy = mean(autonomy_c),
    n_team = n()
  ) %>%
  mutate(team_id = as.numeric(team_id))

# Variance-ratio check (in the spirit of Hartley's F-max, not a formal
# Levene's test): does residual variance differ markedly across teams?
levene_stat <- max(team_resid_var$resid_var) / min(team_resid_var$resid_var)
cat("=== Homoscedasticity Check ===\n")
=== Homoscedasticity Check ===
Code
cat("Ratio of max to min residual variance across teams: ", round(levene_stat, 2), "\n")
Ratio of max to min residual variance across teams:  6.1 
Code
cat("(Ratio near 1 suggests homoscedasticity; >2 merits a closer look,\nthough with only 10 observations per team such ratios can arise by chance)\n\n")
(Ratio near 1 suggests homoscedasticity; >2 merits a closer look,
though with only 10 observations per team such ratios can arise by chance)
Code
# Visualize
par(mfrow = c(1, 2))

plot(team_resid_var$mean_autonomy, team_resid_var$resid_var,
     main = "Residual Variance vs Mean Autonomy",
     xlab = "Team Mean Autonomy", ylab = "Residual Variance")

plot(team_resid_var$n_team, team_resid_var$resid_var,
     main = "Residual Variance vs Team Size",
     xlab = "Team Size", ylab = "Residual Variance")

Code
par(mfrow = c(1, 1))

# If heteroscedasticity is problematic, you could fit a weighted model
# (This is advanced and beyond this tutorial's scope)
cat("\nIf heteroscedasticity is present:\n")

If heteroscedasticity is present:
Code
cat("- Consider including interactions with predictors that explain variance\n")
- Consider including interactions with predictors that explain variance
Code
cat("- Use nlme::lme() with varIdent() or varPower() for variance modeling\n")
- Use nlme::lme() with varIdent() or varPower() for variance modeling
Code
cat("- Use sandwich/robust standard errors\n")
- Use sandwich/robust standard errors
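The nlme route listed above can be sketched as follows. This is illustrative only: varPower on tenure_years is an assumed variance form, not one the plots above established, and the block continues the session objects defined earlier.

```r
library(nlme)  # ships with R as a recommended package

# Same fixed effects as before; random intercept only (the random slope
# was degenerate in the lmer fit); residual SD allowed to grow as a
# power of tenure
model_var <- lme(
  job_satisfaction ~ autonomy_c * team_climate_c + job_complexity + tenure_years,
  random = ~ 1 | team_id,
  weights = varPower(form = ~ tenure_years),
  data = employees,
  method = "ML"
)

# Estimated variance power (0 would mean homoscedastic in tenure)
model_var$modelStruct$varStruct

# varIdent(form = ~ 1 | group) instead estimates a separate residual
# variance per stratum; with 50 teams that costs 49 extra parameters,
# so prefer a coarser grouping in practice.
```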

8 Missing Data Patterns and Impact

Real organizational data often has missing values. How does MLM handle incomplete data?

Code
cat("=== Missing Data in MLM ===\n\n")
=== Missing Data in MLM ===
Code
cat("MLM under REML/ML assumes Missing At Random (MAR):\n")
MLM under REML/ML assumes Missing At Random (MAR):
Code
cat("- Missingness can depend on observed values\n")
- Missingness can depend on observed values
Code
cat("- But NOT on unobserved values (no MNAR)\n\n")
- But NOT on unobserved values (no MNAR)
Code
# Simulate some missing data
set.seed(1234)
employees_missing <- employees %>%
  mutate(
    job_satisfaction = case_when(
      # Delete about 10% of outcomes completely at random (an MCAR mechanism)
      runif(n_total) < 0.10 ~ NA_real_,
      TRUE ~ job_satisfaction
    )
  )

cat("Original data: ", sum(!is.na(employees$job_satisfaction)), "observations\n")
Original data:  500 observations
Code
cat("Data with missingness: ", sum(!is.na(employees_missing$job_satisfaction)), "observations\n")
Data with missingness:  462 observations
Code
cat("Missing: ", sum(is.na(employees_missing$job_satisfaction)), "observations\n\n")
Missing:  38 observations
Code
# Refit model with missing data
model_missing <- lmer(
  job_satisfaction ~ autonomy_c * team_climate_c + job_complexity + tenure_years +
    (1 + autonomy_c | team_id),
  data = employees_missing,
  REML = FALSE
)

summary(model_missing)
Linear mixed model fit by maximum likelihood . t-tests use Satterthwaite's
  method [lmerModLmerTest]
Formula: job_satisfaction ~ autonomy_c * team_climate_c + job_complexity +  
    tenure_years + (1 + autonomy_c | team_id)
   Data: employees_missing

      AIC       BIC    logLik -2*log(L)  df.resid 
   1496.2    1537.6    -738.1    1476.2       452 

Scaled residuals: 
     Min       1Q   Median       3Q      Max 
-2.81523 -0.66002  0.02027  0.67456  2.40895 

Random effects:
 Groups   Name        Variance  Std.Dev. Corr  
 team_id  (Intercept) 0.0437311 0.20912        
          autonomy_c  0.0001307 0.01143  -1.00 
 Residual             1.3904383 1.17917        
Number of obs: 462, groups:  team_id, 50

Fixed effects:
                           Estimate Std. Error        df t value Pr(>|t|)    
(Intercept)                 3.36631    0.24482 454.09864  13.750  < 2e-16 ***
autonomy_c                  0.53761    0.03645 420.24056  14.748  < 2e-16 ***
team_climate_c              0.32600    0.05916  48.23121   5.511 1.37e-06 ***
job_complexity              0.16591    0.04864 460.00599   3.411 0.000704 ***
tenure_years                0.05656    0.02183 455.68691   2.591 0.009868 ** 
autonomy_c:team_climate_c   0.23839    0.03474 420.73125   6.862 2.44e-11 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Correlation of Fixed Effects:
            (Intr) atnmy_ tm_cl_ jb_cmp tnr_yr
autonomy_c   0.027                            
team_clmt_c  0.024 -0.072                     
job_cmplxty -0.888 -0.009 -0.026              
tenure_yers -0.395 -0.061 -0.007  0.014       
atnmy_c:t__ -0.056  0.019  0.014  0.044 -0.004
optimizer (nloptwrap) convergence code: 0 (OK)
boundary (singular) fit: see help('isSingular')
Code
cat("\n=== Comparison: With vs. Without Missing Data ===\n")

=== Comparison: With vs. Without Missing Data ===
Code
coef_comparison <- tibble(
  Predictor = names(fixef(model)),
  Complete = fixef(model),
  With_Missing = fixef(model_missing),
  Difference = fixef(model_missing) - fixef(model)
)

print(coef_comparison)
# A tibble: 6 × 4
  Predictor                 Complete With_Missing Difference
  <chr>                        <dbl>        <dbl>      <dbl>
1 (Intercept)                 3.42         3.37    -0.0495  
2 autonomy_c                  0.537        0.538    0.000110
3 team_climate_c              0.325        0.326    0.000972
4 job_complexity              0.157        0.166    0.00924 
5 tenure_years                0.0558       0.0566   0.000727
6 autonomy_c:team_climate_c   0.239        0.238   -0.000926
Code
cat("\nNote: MLM handles L1 missing data well under MAR.\n")

Note: MLM handles L1 missing data well under MAR.
Code
cat("For L2 missing data (entire teams) or MNAR data,\n")
For L2 missing data (entire teams) or MNAR data,
Code
cat("consider multiple imputation (see mice::mice.impute.2l.lmer).\n")
consider multiple imputation (see mice::mice.impute.2l.lmer).
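For completeness, here is what a multilevel multiple-imputation workflow might look like with mice. This is a sketch under stated assumptions, not a vetted imputation model: it assumes mice (3.x or later) and broom.mixed (needed for pooling mixed models) are installed, and it continues the session above. The 2l.lmer method imputes a Level 1 variable from a random-intercept model, and -2 in the predictor matrix marks the cluster identifier:

```r
library(mice)

# Impute the incomplete outcome with a multilevel method
meth <- make.method(employees_missing)
meth["job_satisfaction"] <- "2l.lmer"

pred <- make.predictorMatrix(employees_missing)
pred["job_satisfaction", "team_id"] <- -2   # -2 marks the cluster variable
# Drop IDs and the raw duplicates of the centered predictors
pred["job_satisfaction",
     c("employee_id", "emp_within_team", "autonomy", "team_climate")] <- 0

imp <- mice(employees_missing, method = meth, predictorMatrix = pred,
            m = 5, seed = 1234, printFlag = FALSE)

# Fit the substantive model in each completed data set, then pool
# estimates with Rubin's rules
fits <- with(imp, lmer(
  job_satisfaction ~ autonomy_c * team_climate_c + job_complexity + tenure_years +
    (1 + autonomy_c | team_id)
))
summary(pool(fits))
```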

9 Robust Standard Errors

If you’re concerned about departures from normality or homoscedasticity, robust standard errors (sandwich estimators) provide an alternative:

Code
library(sandwich)  # If not installed: install.packages("sandwich")

cat("=== Robust Standard Errors (Huber-White Sandwich Estimator) ===\n\n")
=== Robust Standard Errors (Huber-White Sandwich Estimator) ===
Code
# Standard errors from the model
se_standard <- sqrt(diag(vcov(model)))

# lme4 itself does not provide a sandwich estimator; the "robust" column
# below is a placeholder inflation for illustration ONLY. For real
# cluster-robust SEs in multilevel models, see the clubSandwich package.

cat("Comparison of standard errors:\n")
Comparison of standard errors:
Code
coef_df <- tibble(
  Predictor = names(fixef(model)),
  Estimate = fixef(model),
  SE_Standard = se_standard[names(fixef(model))],
  SE_Robust_Approx = se_standard[names(fixef(model))] * 1.05  # placeholder inflation, NOT a real sandwich estimate
)

print(coef_df)
# A tibble: 6 × 4
  Predictor                 Estimate SE_Standard SE_Robust_Approx
  <chr>                        <dbl>       <dbl>            <dbl>
1 (Intercept)                 3.42        0.232            0.244 
2 autonomy_c                  0.537       0.0351           0.0368
3 team_climate_c              0.325       0.0547           0.0574
4 job_complexity              0.157       0.0465           0.0489
5 tenure_years                0.0558      0.0209           0.0219
6 autonomy_c:team_climate_c   0.239       0.0339           0.0356
Code
cat("\nNote: Robust SEs are typically larger than model-based SEs,\n")

Note: Robust SEs are typically larger than model-based SEs,
Code
cat("reflecting additional uncertainty from assumption violations.\n")
reflecting additional uncertainty from assumption violations.
Code
cat("For publication, compute robust SEs with a dedicated package and report both.\n")
For publication, compute robust SEs with a dedicated package and report both.
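The inflated column above is only a placeholder. For genuinely cluster-robust standard errors in a multilevel model, the clubSandwich package (an assumed extra dependency, not loaded above) provides small-sample-corrected estimators for lmer fits; a minimal sketch, continuing the session:

```r
library(clubSandwich)

# CR2 cluster-robust covariance matrix, clustering on the model's
# grouping factor (team_id), with a small-sample correction
vc_cr2 <- vcovCR(model, type = "CR2")

# Satterthwaite-style tests of the fixed effects under the robust covariance
coef_test(model, vcov = vc_cr2, test = "Satterthwaite")
```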

10 A Complete Diagnostic Workflow

Let’s integrate all of these into a systematic diagnostic pipeline:

Code
cat("=== DIAGNOSTIC WORKFLOW ===\n\n")
=== DIAGNOSTIC WORKFLOW ===
Code
cat("1. OVERALL FIT\n")
1. OVERALL FIT
Code
cat("   AIC: ", AIC(model), "\n")
   AIC:  1612.914 
Code
# Note: the NA below is a downstream symptom of the singular random-effects fit
cat("   Conditional R²: ", round(performance::r2(model)$R2_conditional, 3), "\n\n")
Random effect variances not available. Returned R2 does not account for random effects.
   Conditional R²:  NA 
Code
cat("2. RESIDUAL NORMALITY\n")
2. RESIDUAL NORMALITY
Code
sw <- shapiro.test(residuals(model))
cat("   Shapiro-Wilk (L1): p = ", round(sw$p.value, 4),
    if(sw$p.value > 0.05) " ✓" else " ✗", "\n\n")
   Shapiro-Wilk (L1): p =  0.2262  ✓ 
Code
cat("3. HOMOSCEDASTICITY\n")
3. HOMOSCEDASTICITY
Code
cat("   Max/min residual variance ratio: ", round(levene_stat, 2),
    if(levene_stat < 2) " ✓" else " ✗", "\n\n")
   Max/min residual variance ratio:  6.1  ✗ 
Code
cat("4. INFLUENTIAL OUTLIERS\n")
4. INFLUENTIAL OUTLIERS
Code
n_outliers <- sum(cooks_l1 > threshold_cooks)
cat("   Cases with Cook's D > threshold: ", n_outliers,
    if (n_outliers <= 5) " ✓" else " Investigate before acting", "\n\n")
   Cases with Cook's D > threshold:  95  Investigate before acting 
Code
cat("5. RANDOM EFFECTS STRUCTURE\n")
5. RANDOM EFFECTS STRUCTURE
Code
cat("   Correlation of random intercept/slope: ",
    round(cor(team_effects$u0j, team_effects$u1j), 3), "\n")
   Correlation of random intercept/slope:  -1 
Code
cat("   (A correlation pinned at ±1 indicates a singular fit; consider\n    dropping the random slope)\n\n")
   (A correlation pinned at ±1 indicates a singular fit; consider
    dropping the random slope)
Code
cat("6. MISSING DATA\n")
6. MISSING DATA
Code
pct_missing <- sum(is.na(employees$job_satisfaction)) / nrow(employees) * 100
cat("   Percentage missing: ", pct_missing, "% (OK if < 5%)\n\n")
   Percentage missing:  0 % (OK if < 5%)
Code
cat("=== VERDICT ===\n")
=== VERDICT ===
Code
cat("Level 1 residuals look approximately normal and no single case is\n")
Level 1 residuals look approximately normal and no single case is
Code
cat("wildly influential. However, the singular fit (intercept-slope\n")
wildly influential. However, the singular fit (intercept-slope
Code
cat("correlation of -1) argues for dropping the random slope, and residual\n")
correlation of -1) argues for dropping the random slope, and residual
Code
cat("variance varies noticeably across teams.\n")
variance varies noticeably across teams.
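Much of this checklist can be cross-checked in one pass with the performance package loaded at the top of the tutorial; a minimal sketch, continuing the session:

```r
# Automated versions of the checks above (performance was loaded earlier)
check_singularity(model)        # should be TRUE, per the singular-fit warning
check_normality(model)          # tests normality of the residuals
check_heteroscedasticity(model) # tests for non-constant error variance
check_outliers(model)           # flags potentially influential observations
check_model(model)              # panel of diagnostic plots for a report
```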

11 Try It Yourself

  1. Manually create an outlier: Add one extremely high satisfaction value to an employee. Refit the model. How much do estimates change?

  2. Test a misspecified random effects structure: Fit a model with only random intercepts (no random slope). Compare diagnostics to the current model. Are L2 residuals still normal?

  3. Heteroscedasticity exploration: Create a plot of residuals vs tenure_years. Does variance seem to increase with tenure? What might this suggest?

  4. Leverage analysis: Identify the team with the most influential random intercept. Remove that team and refit. How much do fixed effects change?

  5. Missing data simulation: Introduce missing data in a non-random way (e.g., low-satisfaction employees more likely to have missing data). Refit the model. Do conclusions change? (This demonstrates MNAR problems.)


References

Fox, J., & Weisberg, S. (2018). An R companion to applied regression (3rd ed.). SAGE.

Nakagawa, S., & Schielzeth, H. (2013). A general and simple method for obtaining R² from generalized linear mixed-effects models. Methods in Ecology and Evolution, 4(2), 133-142.

Zuur, A. F., Ieno, E. N., Walker, N. J., Saveliev, A. A., & Smith, G. M. (2009). Mixed effects models and extensions in ecology with R. Springer.