Random Slopes and Group Sizes

This is a note on random slopes and group sizes. Specifically, I want to show that when group sizes are balanced, including a random slope is unlikely to meaningfully change the estimated coefficient on the fixed effect. On the other hand, when the group sizes are extremely imbalanced, including random slopes will drastically change the estimated fixed slop.

This effect is primarly due to the different interpretation of the fixed effect coefficient when we include random slopes.

Consider two models for data with observations nested in groups:

Random intercepts only:
$y_{i j} = (β_{0} + u_{0 j}) + β_{1} x_{i j} + ε_{i j}$

Random intercepts and slopes:
$y_{i j} = (β_{0} + u_{0 j}) + (β_{1} + u_{1 j}) x_{i j} + ε_{i j}$

The key difference is in the interpretation of β₁. In the first model, β₁ is a weighted average where groups with more observations have more influence. In the second model, β₁ is a simple average of the group-specific slopes, with each group contributing equally regardless of its size.

In other words, without random slopes, the fixed effect coefficient represents the average slope. With random slopes, the fixed effect coefficient represents the average of the slopes.

Simulation

Balanced Groups

We'll simulate data from a linear model. For simplicity, we'll keep to two groups, and then fit the two models described above. We'll then compare the estimated coefficients. First we'll let the groups have balanced sample size.

n <- 1000
g <- c(rep(0, n/2), rep(1, n/2))
x <- rnorm(n)
y <- x + rnorm(n)
y[g == 1] <- -5*x[g == 1] + rnorm(5)

The true slope in group 0 is 1, and in group 1 is -5.

estimated_coeffs <- vector(length = 3)

# fixed effects for g
mod1 <- lm(y ~ x + g)
estimated_coeffs[1] <- mod1$coefficients["x"]

library(lme4)
# random effects for g
mod2 <- lmer(y ~ x + (1 | g))
estimated_coeffs[2] <- fixef(mod2)["x"]

# Random slope for x on g as well
mod3 <- lmer(y ~ x + (1 + x | g))
estimated_coeffs[3] <- fixef(mod3)["x"]

data.frame(model = c("Fixed Effects",
                     "Random Intercepts",
                     "Random Slopes and Intercepts"),
           estimate = estimated_coeffs)

                         model  estimate
1                Fixed Effects -1.992879
2            Random Intercepts -1.991755
3 Random Slopes and Intercepts -1.965415

We see that all three models estimate approximately the same fixed effect coefficient for x - around -2, which is the average of the true slopes of 1 and -5.

Imbalanced Groups

Now, let's repeat the simulation with a large imbalance in the size of each group. We'll keep the same true slope in each group.

n <- 1000
g <- c(rep(0, n - 20), rep(1, 20))
x <- rnorm(n)
y <- x + rnorm(n)
y[g == 1] <- -5*x[g == 1] + rnorm(5)

estimated_coeffs <- vector(length = 3)

# Only fixed effects
mod1 <- lm(y ~ x + g)
estimated_coeffs[1] <- mod1$coefficients["x"]

library(lme4)
# Add random effect for g
mod2 <- lmer(y ~ x + (1 | g))
estimated_coeffs[2] <- fixef(mod2)["x"]

# Random slope for x on g as well
mod3 <- lmer(y ~ x + (1 + x | g)

data.frame(model = c("Fixed Effects",
                     "Random Intercepts",
                     "Random Slopes and Intercepts"),
          estimate = estimated_coeffs)

                         model   estimate
1                Fixed Effects  0.8922942
2            Random Intercepts  0.8920293
3 Random Slopes and Intercepts -2.1062616

Now, since the first group is so large, the fixed effects and random intercepts models both are dominated by the influence of group 0. As a result, they both estimate a slope that is close to 1. The random slopes model, on the other hand, estimates a slope that averages the two slopes of 1 and -5, and therefore estimates a much lower slope, closer to the average of the -2.