Random Slopes and Group Sizes
This is a note on random slopes and group sizes. Specifically, I want to show that when group sizes are balanced, including a random slope is unlikely to meaningfullly change the estimated coefficient on the fixed effect. On the other hand, when the group sizes are extremely imbalanced, including random slopes will drastically change the estimated fixed slop.
This effect is primarly due to the different interpretation of the fixed effect coefficient when we include random slopes.
Consider two models for data with observations nested in groups:
Random intercepts only:
Random intercepts and slopes:
The key difference is in the interpretation of β1. In the first model, β1 is a weighted average where groups with more observations have more influence. In the second model, β1 is a simple average of the group-specific slopes, with each group contributing equally regardless of its size.
In other words, without random slopes, the fixed effect coefficient
represents the average slope. With random slopes, the fixed effect
coefficient represents the average
Simulation
Balanced Groups
We'll simulate data from a linear model. For simplicity, we'll keep to two groups, and then fit the two models described above. We'll then compare the estimated coefficients. First we'll let the groups have balanced sample size.
n <- 1000
g <- c(rep(0, n/2), rep(1, n/2))
x <- rnorm(n)
y <- x + rnorm(n)
y[g == 1] <- -5*x[g == 1] + rnorm(5)
The true slope in group 0 is 1, and in group 1 is -5.
estimated_coeffs <- vector(length = 3)
# fixed effects for g
mod1 <- lm(y ~ x + g)
estimated_coeffs[1] <- mod1$coefficients["x"]
library(lme4)
# random effects for g
mod2 <- lmer(y ~ x + (1 | g))
estimated_coeffs[2] <- fixef(mod2)["x"]
# Random slope for x on g as well
mod3 <- lmer(y ~ x + (1 + x | g))
estimated_coeffs[3] <- fixef(mod3)["x"]
data.frame(model = c("Fixed Effects",
"Random Intercepts",
"Random Slopes and Intercepts"),
estimate = estimated_coeffs)
model estimate
1 Fixed Effects -1.992879
2 Random Intercepts -1.991755
3 Random Slopes and Intercepts -1.965415
We see that all three models estimate approximately the same fixed
effect coefficient for x
- around -2, which is the average
of the true slopes of 1 and -5.
Imbalanced Groups
Now, let's repeat the simulation with a large imbalance in the size of each group. We'll keep the same true slope in each group.
n <- 1000
g <- c(rep(0, n - 20), rep(1, 20))
x <- rnorm(n)
y <- x + rnorm(n)
y[g == 1] <- -5*x[g == 1] + rnorm(5)
estimated_coeffs <- vector(length = 3)
# Only fixed effects
mod1 <- lm(y ~ x + g)
estimated_coeffs[1] <- mod1$coefficients["x"]
library(lme4)
# Add random effect for g
mod2 <- lmer(y ~ x + (1 | g))
estimated_coeffs[2] <- fixef(mod2)["x"]
# Random slope for x on g as well
mod3 <- lmer(y ~ x + (1 + x | g)
data.frame(model = c("Fixed Effects",
"Random Intercepts",
"Random Slopes and Intercepts"),
estimate = estimated_coeffs)
model estimate
1 Fixed Effects 0.8922942
2 Random Intercepts 0.8920293
3 Random Slopes and Intercepts -2.1062616
Now, since the first group is so large, the fixed effects and random intercepts models both are dominated by the influence of group 0. As a result, they both estimate a slope that is close to 1. The random slopes model, on the other hand, estimates a slope that averages the two slopes of 1 and -5, and therefore estimates a much lower slope, closer to the average of the -2.