---
title: "Assumptions, Diagnostics, and Practical Issues in MLM"
subtitle: "PSY 8XXX: Multilevel Modeling for Organizational Research — Week 8"
author: "Instructor"
date: last-modified
format:
  html:
    code-fold: true
    code-tools: true
    toc: true
    toc-depth: 3
    number-sections: true
    theme: cosmo
    self-contained: true
execute:
  warning: false
  message: false
---
# Introduction
"Assume the model is correct" is a running joke in statistics. In reality, no model is perfect; the question is whether violations of its assumptions matter for your inferences.
This tutorial walks through the key assumptions of multilevel models, how to diagnose violations, and practical solutions when assumptions are violated. We'll extract residuals at both levels, visualize distributions, identify influential cases, and apply robust corrections.
# Review of Key Assumptions
Multilevel models rest on several assumptions:
1. **Linearity**: The outcome is a linear function of predictors (at both levels)
2. **Normality at Level 1**: Residuals ε_ij ~ N(0, σ²)
3. **Normality at Level 2**: Random intercepts and slopes ~ N(0, Σ)
4. **Homoscedasticity of Level 1**: Variance σ² is constant across groups and levels of predictors
5. **Independence of Level 1 residuals**: ε_ij are independent (though within-group correlation is allowed through random effects)
6. **Correct random effects structure**: The random intercepts and slopes you specify (and their covariance) capture the actual group-level variation
7. **Missing data mechanism**: Data are missing at random (MAR) rather than missing not at random (MNAR)
Let's set up our data and model:
```{r setup}
set.seed(1234)
library(tidyverse)
library(lme4)
library(lmerTest)
library(ggplot2)
library(performance)
# Simulate data as before
n_teams <- 50
n_per_team <- 10
teams <- tibble(
  team_id = 1:n_teams,
  team_climate = rnorm(n_teams, mean = 5, sd = 1.2),
  leader_support = rnorm(n_teams, mean = 5.5, sd = 1)
)

n_total <- n_teams * n_per_team

employees <- expand_grid(
  team_id = 1:n_teams,
  emp_within_team = 1:n_per_team
) %>%
  left_join(teams, by = "team_id") %>%
  mutate(
    employee_id = row_number(),
    autonomy = rnorm(n_total, mean = 5, sd = 1.5),
    job_complexity = rnorm(n_total, mean = 4.5, sd = 1.2),
    tenure_years = pmax(rnorm(n_total, mean = 4, sd = 3), 0.5)
  )

# Generate outcome with a cross-level interaction
employees <- employees %>%
  mutate(
    autonomy_c = autonomy - mean(autonomy),
    team_climate_c = team_climate - mean(team_climate),
    job_satisfaction = 3.5 +
      0.6 * autonomy_c +
      0.4 * team_climate_c +
      0.35 * autonomy_c * team_climate_c +
      0.15 * job_complexity +
      0.05 * tenure_years +
      rnorm(n_total, 0, 1.2),
    job_satisfaction = pmin(pmax(job_satisfaction, 1), 7)  # censor to the 1-7 scale
  )

# Fit the model
model <- lmer(
  job_satisfaction ~ autonomy_c * team_climate_c + job_complexity + tenure_years +
    (1 + autonomy_c | team_id),
  data = employees,
  REML = FALSE
)
summary(model)
```
# Level 1 Residual Diagnostics
Level 1 residuals capture individual-level prediction errors. Extract them and examine their distribution:
```{r l1-residuals}
# Extract Level 1 residuals
employees$resid_l1 <- residuals(model)
# Summary statistics
cat("=== Level 1 Residuals ===\n")
cat("Mean: ", mean(employees$resid_l1), " (should be ~0)\n")
cat("SD: ", sd(employees$resid_l1), " (should match σ)\n")
cat("Sigma from model: ", sigma(model), "\n\n")
# Shapiro-Wilk test for normality
# (With several hundred observations this test flags even trivial departures,
#  so weigh it against the QQ plot rather than relying on p alone)
shapiro_l1 <- shapiro.test(employees$resid_l1)
cat("Shapiro-Wilk test: p = ", round(shapiro_l1$p.value, 4), "\n")
cat("(p > 0.05 is consistent with normality, but inspect the QQ plot too)\n\n")
# Visualize
par(mfrow = c(2, 2))

# Histogram
hist(employees$resid_l1, breaks = 30, main = "Distribution of L1 Residuals",
     xlab = "Residual", col = "skyblue", border = "black")

# QQ plot
qqnorm(employees$resid_l1, main = "QQ Plot: L1 Residuals")
qqline(employees$resid_l1, col = "red", lwd = 2)

# Residuals vs fitted
plot(fitted(model), employees$resid_l1,
     main = "Residuals vs Fitted (L1)", xlab = "Fitted Value", ylab = "Residual")
abline(h = 0, col = "red", lty = 2)

# Residuals vs autonomy (a key predictor)
plot(employees$autonomy_c, employees$resid_l1,
     main = "Residuals vs Autonomy", xlab = "Autonomy (centered)", ylab = "Residual")
abline(h = 0, col = "red", lty = 2)

par(mfrow = c(1, 1))
```
**Interpretation**:
- If residuals are approximately normal (QQ plot close to diagonal), assumption is met
- Residuals vs fitted should show random scatter around zero with no funnel shape
- If you see patterns, the model may be misspecified (missing interactions, nonlinearity)
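If you have not seen a "funnel" before, a small self-contained simulation (base R only, independent of the tutorial data) makes the pattern easy to recognize:

```{r funnel-demo}
set.seed(42)
x <- runif(500, 0, 10)

# Error SD grows with x, so residual spread widens with the fitted value:
# the classic heteroscedastic funnel
y <- 2 + 0.5 * x + rnorm(500, sd = 0.3 + 0.3 * x)
fit <- lm(y ~ x)

plot(fitted(fit), resid(fit),
     main = "A Funnel: Residual Spread Grows with Fitted Value",
     xlab = "Fitted Value", ylab = "Residual")
abline(h = 0, col = "red", lty = 2)
```

Compare this against the residuals-vs-fitted panel above; our simulated data should show no such widening.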
# Level 2 Residual Diagnostics
Level 2 residuals are the predicted random effects (empirical Bayes estimates): how much each team's intercept and slope deviate from the population averages.
```{r l2-residuals}
# Extract random effects (conditional modes / best linear unbiased predictors)
re <- ranef(model)
# Random intercepts and slopes (rownames are team IDs)
team_effects <- tibble(
  team_id = as.integer(rownames(re$team_id)),
  u0j = re$team_id[, 1],
  u1j = re$team_id[, 2]
)
# Test normality of L2 random intercepts
shapiro_u0 <- shapiro.test(team_effects$u0j)
cat("Shapiro-Wilk test for random intercepts: p = ", round(shapiro_u0$p.value, 4), "\n\n")
# Visualize
par(mfrow = c(2, 2))
# QQ plot for random intercepts
qqnorm(team_effects$u0j, main = "QQ Plot: Random Intercepts")
qqline(team_effects$u0j, col = "red", lwd = 2)
# QQ plot for random slopes
qqnorm(team_effects$u1j, main = "QQ Plot: Random Slopes (Autonomy)")
qqline(team_effects$u1j, col = "red", lwd = 2)
# Scatter: intercept vs slope (check for correlation)
plot(team_effects$u0j, team_effects$u1j,
     main = "Correlation of Random Intercepts & Slopes",
     xlab = "Random Intercept", ylab = "Random Slope")
abline(h = 0, v = 0, col = "gray", lty = 2)

# Absolute intercept deviations by team size (homoscedasticity check)
team_sizes <- employees %>%
  group_by(team_id) %>%
  summarise(team_size = n())

plot_data <- team_effects %>%
  left_join(team_sizes, by = "team_id")

plot(plot_data$team_size, abs(plot_data$u0j),
     main = "Absolute Intercept Deviations vs Team Size",
     xlab = "Team Size", ylab = "|u0j|")

par(mfrow = c(1, 1))
```
**Interpretation**:
- Random intercepts and slopes should be approximately normal
- A strong correlation between random intercept and slope (in the scatter plot) might suggest a misspecified random effects structure
- Homoscedasticity of L2 residuals is less critical but worth checking
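A complementary check is a "caterpillar plot": each team's predicted random effect with an interval based on its conditional variance. The lattice package (shipped with R) draws this directly from `ranef()`; this sketch assumes the `model` object fitted above:

```{r caterpillar-plot}
library(lattice)

# condVar = TRUE attaches conditional variances, which dotplot()
# converts into interval bars around each team's predicted effect
re_cv <- ranef(model, condVar = TRUE)
dotplot(re_cv)
```

Teams whose intervals sit far from zero relative to the rest are worth a closer look in the influence diagnostics below.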
# Influence Diagnostics: Identifying Outliers and Influential Cases
Individual observations or groups can exert disproportionate influence on the model. Detecting them helps identify data quality issues.
```{r influence-diagnostics}
# Compute observation-level influence statistics
# influence() performs case-deletion diagnostics; obs = TRUE deletes
# one observation at a time, which can be slow for large models
influence_l1 <- influence(model, obs = TRUE)

# Cook's distances from the influence object
cooks_l1 <- cooks.distance(influence_l1)

# Create diagnostic dataframe
diagnostics_l1 <- employees %>%
  mutate(
    cooks_d = cooks_l1,
    residual = resid_l1,
    fitted = fitted(model),
    standardized_resid = residual / sd(residual)
  )

# Identify potentially influential cases (Cook's D > 4/n rule of thumb)
threshold_cooks <- 4 / nrow(employees)
influential_cases <- diagnostics_l1 %>%
  filter(cooks_d > threshold_cooks) %>%
  arrange(desc(cooks_d))

cat("=== Influential Individual Cases (Cook's D > threshold) ===\n")
cat("Threshold: ", round(threshold_cooks, 4), "\n")
cat("Number of flagged cases: ", nrow(influential_cases), "\n\n")
if (nrow(influential_cases) > 0) {
  print(influential_cases %>%
          select(employee_id, team_id, cooks_d, residual) %>%
          head(10))
}
# Group-level influence: how much does each team influence the overall fit?
# Simplified leave-one-team-out approach
group_influence <- tibble()
for (j in unique(employees$team_id)[1:10]) {  # first 10 teams only, for speed
  data_without_j <- employees %>% filter(team_id != j)
  model_without <- lmer(
    job_satisfaction ~ autonomy_c * team_climate_c + job_complexity + tenure_years +
      (1 + autonomy_c | team_id),
    data = data_without_j,
    REML = FALSE,
    control = lmerControl(optimizer = "bobyqa")
  )
  # Compare fixed-effect estimates with and without team j
  fe_diff <- fixef(model_without) - fixef(model)
  group_influence <- group_influence %>%
    bind_rows(tibble(
      team_id = j,
      coef_max_change = max(abs(fe_diff)),
      autonomy_change = abs(fe_diff["autonomy_c"])
    ))
}

cat("\n=== Group-Level Influence (First 10 Teams) ===\n")
cat("Shows how much each team's removal changes the fixed effects\n\n")
print(group_influence %>% arrange(desc(coef_max_change)))
```
**Interpretation**:
- Individual cases with high Cook's distance warrant inspection (data entry error? genuine outlier?)
- Groups with strong influence might be particularly important or problematic
- Some influence is normal; the question is whether it's undue
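Recent versions of lme4 also offer a built-in group-deletion method that avoids writing the refitting loop by hand (an earlier version of this method lived in the car package). A sketch, assuming the fitted `model` from above; it refits the model once per team internally, so expect it to take a little while:

```{r group-influence-builtin}
# Delete one team at a time and track how the estimates move
inf_teams <- influence(model, groups = "team_id")

# Cook's distance per team: aggregate influence on the fixed effects
cooks_teams <- cooks.distance(inf_teams)

# DFBETAS: standardized change in each fixed effect when a team is dropped
dfb_teams <- dfbetas(inf_teams)

# Flag teams above the common 4/J rule of thumb (J = number of teams)
flagged <- which(cooks_teams > 4 / length(cooks_teams))
print(flagged)
```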
# Small Sample Corrections: Kenward-Roger and Satterthwaite
In mixed models, the degrees of freedom for t-tests of fixed effects are not known exactly and must be approximated; the choice matters most when the number of Level 2 units is modest. `lmerTest` provides two standard approximations automatically.
```{r df-corrections}
# By default, lmerTest uses Satterthwaite approximation
summary(model) # Look at the t-values and df in the output
# Extract a specific coefficient's test
autonomy_test <- summary(model)$coefficients["autonomy_c", ]
cat("=== Autonomy Effect Significance Test ===\n")
cat("Estimate: ", autonomy_test["Estimate"], "\n")
cat("Std. Error: ", autonomy_test["Std. Error"], "\n")
cat("t-value: ", autonomy_test["t value"], "\n")
cat("Degrees of freedom (Satterthwaite): ", autonomy_test["df"], "\n\n")
# For comparison, compute with the Kenward-Roger correction
# (slower, and it requires a REML fit, so we refit with REML, the default)
model_kr <- lmer(
  job_satisfaction ~ autonomy_c * team_climate_c + job_complexity + tenure_years +
    (1 + autonomy_c | team_id),
  data = employees
)

# Kenward-Roger is available through summary() and anova() in lmerTest
summary(model_kr, ddf = "Kenward-Roger")
anova(model_kr, ddf = "Kenward-Roger")
cat("\n=== When to Use Which ===\n")
cat("Satterthwaite: Default, works well for moderate samples, faster\n")
cat("Kenward-Roger: More conservative, better for small L2 samples (n < 30), slower\n")
cat("In this example (50 teams): Satterthwaite is sufficient\n")
```
# Heteroscedasticity: Detecting and Addressing Non-Constant Variance
Sometimes residual variance is not constant: it might be larger in some groups or at certain levels of a predictor (e.g., satisfaction may be more variable among longer-tenured employees).
We'll use a simplified approach to detect this:
```{r heteroscedasticity}
# Residual variance by team
team_resid_var <- diagnostics_l1 %>%
  group_by(team_id) %>%
  summarise(
    resid_var = var(residual),
    mean_autonomy = mean(autonomy_c),
    n_team = n()
  )

# Max/min variance ratio (in the spirit of Hartley's F-max; note this is
# NOT Levene's test; for a formal test, see car::leveneTest)
levene_stat <- max(team_resid_var$resid_var) / min(team_resid_var$resid_var)
cat("=== Homoscedasticity Check ===\n")
cat("Ratio of max to min residual variance across teams: ", round(levene_stat, 2), "\n")
cat("(A ratio near 1 suggests homoscedasticity. With only ~10 observations\n")
cat(" per team, sizable ratios arise by chance, so treat a large value as a\n")
cat(" prompt to look closer rather than proof of a problem)\n\n")
# Visualize
par(mfrow = c(1, 2))
plot(team_resid_var$mean_autonomy, team_resid_var$resid_var,
     main = "Residual Variance vs Mean Autonomy",
     xlab = "Team Mean Autonomy", ylab = "Residual Variance")
plot(team_resid_var$n_team, team_resid_var$resid_var,
     main = "Residual Variance vs Team Size",
     xlab = "Team Size", ylab = "Residual Variance")
par(mfrow = c(1, 1))
# If heteroscedasticity is problematic, you could fit a weighted model
# (This is advanced and beyond this tutorial's scope)
cat("\nIf heteroscedasticity is present:\n")
cat("- Consider including interactions with predictors that explain variance\n")
cat("- Use nlme::lme() with varIdent() or varPower() for variance modeling\n")
cat("- Use sandwich/robust standard errors\n")
```
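Although full variance modeling is beyond this tutorial's scope, a minimal sketch shows the idea. The nlme package (shipped with R) lets the residual SD depend on a covariate; here, purely for illustration, it varies as a power of tenure via `varPower()`. Note the simplifications: `lme()` below uses a random intercept only, and which variance function and covariate you would actually choose depends on what your diagnostics show.

```{r variance-modeling-sketch}
library(nlme)

# Equal-variance baseline (random intercept only, for simplicity)
model_hom <- lme(
  job_satisfaction ~ autonomy_c * team_climate_c + job_complexity + tenure_years,
  random = ~ 1 | team_id,
  data = employees
)

# Residual SD allowed to vary as a power of tenure_years
model_het <- lme(
  job_satisfaction ~ autonomy_c * team_climate_c + job_complexity + tenure_years,
  random = ~ 1 | team_id,
  weights = varPower(form = ~ tenure_years),
  data = employees
)

# Likelihood ratio test: does variance modeling improve fit?
anova(model_hom, model_het)
```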
# Missing Data Patterns and Impact
Real organizational data often has missing values. How does MLM handle incomplete data?
```{r missing-data}
cat("=== Missing Data in MLM ===\n\n")
cat("MLM under REML/ML assumes Missing At Random (MAR):\n")
cat("- Missingness can depend on observed values\n")
cat("- But NOT on unobserved values (no MNAR)\n\n")
# Simulate some missing data
set.seed(1234)
employees_missing <- employees %>%
  mutate(
    job_satisfaction = case_when(
      runif(n_total) < 0.10 ~ NA_real_,  # ~10% missing completely at random (MCAR)
      TRUE ~ job_satisfaction
    )
  )
cat("Original data: ", sum(!is.na(employees$job_satisfaction)), "observations\n")
cat("Data with missingness: ", sum(!is.na(employees_missing$job_satisfaction)), "observations\n")
cat("Missing: ", sum(is.na(employees_missing$job_satisfaction)), "observations\n\n")
# Refit model with missing data (lmer drops rows with NA outcomes by default)
model_missing <- lmer(
  job_satisfaction ~ autonomy_c * team_climate_c + job_complexity + tenure_years +
    (1 + autonomy_c | team_id),
  data = employees_missing,
  REML = FALSE
)
summary(model_missing)

cat("\n=== Comparison: With vs. Without Missing Data ===\n")
coef_comparison <- tibble(
  Predictor = names(fixef(model)),
  Complete = fixef(model),
  With_Missing = fixef(model_missing),
  Difference = fixef(model_missing) - fixef(model)
)
print(coef_comparison)
cat("\nNote: MLM handles L1 missing data well under MAR.\n")
cat("For L2 missing data (entire teams) or MNAR data,\n")
cat("consider multiple imputation (see the mice package, method '2l.lmer').\n")
```
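For heavier or multilevel-structured missingness, here is a hedged sketch of two-level multiple imputation with the mice package (assuming mice and broom.mixed are installed, since pooling merMod fits relies on the latter). The `2l.lmer` method and the `-2` cluster code in the predictor matrix follow mice's conventions for two-level imputation; treat this as a starting point rather than a recipe:

```{r mice-sketch}
library(mice)

# Keep only the variables involved; the cluster variable must be coded
# -2 in the predictor matrix so 2l.lmer treats it as the grouping factor
imp_data <- employees_missing %>%
  select(team_id, job_satisfaction, autonomy_c, team_climate_c,
         job_complexity, tenure_years)

pred <- make.predictorMatrix(imp_data)
pred["job_satisfaction", "team_id"] <- -2  # -2 marks the cluster identifier

meth <- make.method(imp_data)
meth["job_satisfaction"] <- "2l.lmer"  # two-level normal imputation via lmer

imp <- mice(imp_data, method = meth, predictorMatrix = pred,
            m = 5, printFlag = FALSE)

# Fit the substantive model in each imputed dataset, pool with Rubin's rules
fits <- with(imp, lmer(
  job_satisfaction ~ autonomy_c * team_climate_c + job_complexity + tenure_years +
    (1 + autonomy_c | team_id)))
summary(pool(fits))
```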
# Robust Standard Errors
If you're concerned about departures from normality or homoscedasticity, cluster-robust standard errors (sandwich estimators) provide a useful check on model-based inference:
```{r robust-se}
library(clubSandwich) # If not installed: install.packages("clubSandwich")
cat("=== Cluster-Robust Standard Errors (Sandwich Estimators) ===\n\n")

# Standard (model-based) errors
se_standard <- sqrt(diag(vcov(model)))

# Cluster-robust errors; CR2 is the small-sample-corrected sandwich
# estimator generally recommended for clustered data (clusters = teams)
vcov_robust <- vcovCR(model, type = "CR2")
se_robust <- sqrt(diag(vcov_robust))

cat("Comparison of standard errors:\n")
coef_df <- tibble(
  Predictor = names(fixef(model)),
  Estimate = fixef(model),
  SE_Standard = as.numeric(se_standard),
  SE_Robust = as.numeric(se_robust)
)
print(coef_df)

# Robust t-tests with small-sample degrees of freedom
coef_test(model, vcov = "CR2")

cat("\nNote: Robust SEs are often, though not always, larger than\n")
cat("model-based SEs; a large discrepancy signals that the assumed\n")
cat("variance structure may be wrong. For publication, report both\n")
cat("and discuss any differences.\n")
```
# A Complete Diagnostic Workflow
Let's integrate all of these into a systematic diagnostic pipeline:
```{r diagnostic-workflow}
cat("=== DIAGNOSTIC WORKFLOW ===\n\n")
cat("1. OVERALL FIT\n")
cat("   AIC: ", round(AIC(model), 1), "\n")
cat("   Conditional R²: ", round(performance::r2(model)$R2_conditional, 3), "\n\n")

cat("2. RESIDUAL NORMALITY\n")
sw <- shapiro.test(residuals(model))
cat("   Shapiro-Wilk (L1): p = ", round(sw$p.value, 4),
    if (sw$p.value > 0.05) " ✓" else " ✗", "\n\n")

cat("3. HOMOSCEDASTICITY\n")
cat("   Max/min residual variance ratio: ", round(levene_stat, 2),
    if (levene_stat < 2) " ✓" else " ✗", "\n\n")

cat("4. INFLUENTIAL OUTLIERS\n")
n_outliers <- sum(cooks_l1 > threshold_cooks)
cat("   Cases with Cook's D > threshold: ", n_outliers,
    if (n_outliers <= 5) " ✓" else " (inspect these cases)", "\n\n")

cat("5. RANDOM EFFECTS STRUCTURE\n")
cat("   Correlation of random intercept/slope: ",
    round(cor(team_effects$u0j, team_effects$u1j), 3), "\n")
cat("   (A correlation near ±1 can signal an overparameterized structure)\n\n")

cat("6. MISSING DATA\n")
pct_missing <- sum(is.na(employees$job_satisfaction)) / nrow(employees) * 100
cat("   Percentage missing: ", round(pct_missing, 1), "%\n\n")
cat("=== VERDICT ===\n")
cat("Review the checks above before trusting the model. In this simulated\n")
cat("example, residuals are approximately normal, no severe outliers emerge,\n")
cat("and the random effects structure seems appropriate.\n")
```
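Much of this checklist is automated by the performance package (already loaded in the setup chunk): `check_model()` produces a panel of visual diagnostics in a single call. The exact panels shown vary with the model type and package version:

```{r check-model}
# Visual diagnostics in one panel: linearity, homogeneity of variance,
# influential observations, and normality of residuals and random effects
performance::check_model(model)
```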
# Try It Yourself
1. **Manually create an outlier**: Add one extremely high satisfaction value to an employee. Refit the model. How much do estimates change?
2. **Test a misspecified random effects structure**: Fit a model with only random intercepts (no random slope). Compare diagnostics to the current model. Are L2 residuals still normal?
3. **Heteroscedasticity exploration**: Create a plot of residuals vs tenure_years. Does variance seem to increase with tenure? What might this suggest?
4. **Leverage analysis**: Identify the team with the most influential random intercept. Remove that team and refit. How much do fixed effects change?
5. **Missing data simulation**: Introduce missing data in a non-random way (e.g., low-satisfaction employees more likely to have missing data). Refit the model. Do conclusions change? (This demonstrates MNAR problems.)
---
**References**
Fox, J., & Weisberg, S. (2018). *An R companion to applied regression* (3rd ed.). SAGE.
Nakagawa, S., & Schielzeth, H. (2013). A general and simple method for obtaining R² from generalized linear mixed-effects models. *Methods in Ecology and Evolution*, 4(2), 133-142.
Zuur, A. F., Ieno, E. N., Walker, N. J., Saveliev, A. A., & Smith, G. M. (2009). *Mixed effects models and extensions in ecology with R*. Springer.