Log Transform Interpretation
Introduction
Log transformations can be useful when a variable is very right-skewed, or when multiplicative effects are desired over additive ones. However, interpretation can be challenging.
Throughout, log refers to the natural log (ln), i.e. $\log_e$.
Multiplicative vs Percent Change
Note that multiplicative changes can be expressed as percent changes and vice-versa. For example, multiplying a value by 1.3 is equivalent to increasing the value by 30%, or conversely, decreasing a value by 15% is equivalent to multiplying it by .85.
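The conversion between the two forms can be sketched in R. The helper names mult_to_pct and pct_to_mult are hypothetical, introduced here only for illustration.

```r
# Hypothetical helpers (not part of the original text) for converting
# between multiplicative changes and percent changes.
mult_to_pct <- function(m) (m - 1) * 100   # multiplier -> percent change
pct_to_mult <- function(p) 1 + p / 100     # percent change -> multiplier

mult_to_pct(1.3)   # 30, i.e. a 30% increase
pct_to_mult(-15)   # 0.85, i.e. multiply by .85
```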
Logged Outcome
A 1-unit change in a predictor is associated with a $\exp(\beta)$ multiplicative change in Y, or a $(\exp(\beta) - 1) \times 100\%$ change in Y, where $\beta$ is the predictor's coefficient.
Examples:
- If $\beta$ is .2, a 1-unit increase in X is associated with an $\exp(.2) \approx 1.22$ multiplicative change in Y, or a 22% increase.
- If $\beta$ is -.4, a 1-unit increase in X is associated with an $\exp(-.4) \approx .67$ multiplicative change in Y, or a 33% decrease.
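The arithmetic behind these two examples can be checked directly. This is a small sketch; the vector name beta is an assumption used only for illustration.

```r
# Coefficients from the two examples above (illustrative values only).
beta <- c(0.2, -0.4)

mult <- exp(beta)        # multiplicative change in Y per 1-unit increase in X
pct  <- (mult - 1) * 100 # the same change expressed as a percent

round(mult, 3)  # 1.221 0.670
round(pct, 1)   # 22.1 -33.0
```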
Theory
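Assume our regression equation is
$$E(Y \mid X = x) = \beta_0 + \beta_1 x.$$
If we regress on the log of Y instead,
$$E(\log(Y) \mid X = x) = \beta_0 + \beta_1 x.$$
By Taylor expansion,
$$\log(E(Y)) \approx E(\log(Y)).$$
Therefore we can write
$$E(Y \mid X = x + 1) \approx \exp(\beta_0 + \beta_1 (x + 1)) = \exp(\beta_0 + \beta_1 x)\exp(\beta_1) \approx E(Y \mid X = x)\exp(\beta_1).$$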
Example
data(mtcars)
(mod1 <- lm(log(disp) ~ drat, data = mtcars))
Call:
lm(formula = log(disp) ~ drat, data = mtcars)
Coefficients:
(Intercept) drat
8.2782 -0.8323
Therefore a 1-unit increase in drat is associated with an $\exp(-0.8323) \approx 0.435$ multiplicative change in disp, corresponding to a 56.5% decrease.
To test this, we take the ratio of the predicted outcome at some value of drat increased by 1 to the predicted outcome at the original value. Note: We exponentiate the predicted values to get them on the outcome scale.
exp(predict(mod1, newdata = data.frame(drat = 5)))/exp(predict(mod1, newdata = data.frame(drat = 4)))
1
0.4350567
Repeat with different values of drat to show that all that matters is the change in the predictor, not its starting value.
exp(predict(mod1, newdata = data.frame(drat = 4)))/exp(predict(mod1, newdata = data.frame(drat = 3)))
1
0.4350567
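The same check can be run over several starting values at once. The sketch below refits the model under the hypothetical name fit so that it runs on its own, and compares each ratio with the exponentiated coefficient.

```r
# Refit the logged-outcome model (hypothetical name `fit`) so this runs standalone.
fit <- lm(log(disp) ~ drat, data = mtcars)
b <- unname(coef(fit)["drat"])

# Predicted disp at drat = 2, 3, 4, 5, back on the outcome scale.
p <- exp(predict(fit, newdata = data.frame(drat = c(2, 3, 4, 5))))

# Ratio of each prediction to the one a single unit of drat below it.
ratios <- unname(p[-1] / p[-length(p)])

# Every ratio equals exp(b), whatever the starting value.
all.equal(ratios, rep(exp(b), 3))  # TRUE
```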
Visualization
We can visualize this relationship with basic plotting commands. First, generate predicted values. We will do this by generating an artificial X variable (drat) spaced over its range, obtaining predicted values, and then exponentiating them.
new_drat <- seq(min(mtcars$drat),
max(mtcars$drat),
length.out = 100)
yhat1 <- exp(predict(mod1, newdata = data.frame(drat = new_drat)))
Take note of the call to exp() in the line defining yhat1.
Next, we can plot the fitted curve and overlay the observed values on top of it.
plot(yhat1 ~ new_drat, type = "l")
with(mtcars, points(disp ~ drat))
Logged Predictor
A k% change in a predictor is associated with a $\beta \log(1 + k/100)$ change in the outcome, where $\beta$ is the predictor's coefficient.
Examples:
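- If $\beta$ is 2, a 10% increase in X is associated with a $2\log(1.1) \approx 0.19$ increase in Y.
- If $\beta$ is -1.51, a 5% increase in X is associated with a $|-1.51\log(1.05)| \approx 0.074$ decrease in Y.
- If $\beta$ is .75, a 5% decrease in X is associated with a $|.75\log(.95)| \approx 0.038$ decrease in Y.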
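These three values can be reproduced with a one-line function. The name log_pred_change is hypothetical and used only for this sketch; a negative result denotes a decrease in Y.

```r
# Hypothetical helper: additive change in Y for a k% change in a logged predictor.
log_pred_change <- function(beta, k) beta * log(1 + k / 100)

round(log_pred_change(2, 10), 3)     # 0.191
round(log_pred_change(-1.51, 5), 3)  # -0.074
round(log_pred_change(0.75, -5), 3)  # -0.038
```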
Theory
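Assume our regression equation is
$$E(Y \mid X = x) = \beta_0 + \beta_1 x.$$
If we include $\log(X)$ instead, we have
$$E(Y \mid X = x) = \beta_0 + \beta_1 \log(x).$$
Consider when $X = cx$, where $c$ is some constant (e.g. 2 for a doubling of X or 1.3 for a 30% increase in X):
$$E(Y \mid X = cx) = \beta_0 + \beta_1 \log(cx).$$
Therefore if we look at the difference in expectation,
$$E(Y \mid X = cx) - E(Y \mid X = x) = \beta_1 \log(cx) - \beta_1 \log(x) = \beta_1 \log(c).$$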
Approximation
If your percent change is small (a few percent) then you can approximate the change. This is because $\log(1 + x) \approx x$ when $x$ is close to 0. So to approximate what effect a 1% change in X would have, simply multiply $\beta$ by that value: $0.01\beta$. The approximation is reliable for changes of a few percent and gets much worse as the change grows.
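To see how quickly the approximation degrades, the exact and approximate values can be tabulated side by side. This is only a sketch; the percent values chosen and the variable names are illustrative assumptions.

```r
# Illustrative percent changes in X (assumed values, not from the text).
pct <- c(1, 3, 5, 10, 25, 50)

exact      <- log(1 + pct / 100)  # exact multiplier applied to beta
approx_val <- pct / 100           # small-change approximation

# Relative error of the approximation, in percent of the exact value.
rel_err <- (approx_val - exact) / exact * 100

data.frame(pct,
           exact = round(exact, 4),
           approx = approx_val,
           rel_err_pct = round(rel_err, 1))
```

The relative error grows steadily with the size of the change, so the shortcut is best reserved for small changes.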
Example
data(mtcars)
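(mod2 <- lm(drat ~ log(disp), data = mtcars))
Call:
lm(formula = drat ~ log(disp), data = mtcars)
Coefficients:
(Intercept) log(disp)
7.2301 -0.6875
Therefore a 25% increase in disp is associated with a $-0.6875 \times \log(1.25) \approx -0.153$ change in drat.
To test this, we take the difference between the predicted outcome at some value of disp and at that value increased by 25%. The difference is taken as the original value minus the increased value, so a decrease in drat shows up as a positive number.
predict(mod2, newdata = data.frame(disp = 5)) - predict(mod2, newdata = data.frame(disp = 5*1.25))
1 0.1534182
predict(mod2, newdata = data.frame(disp = 11)) - predict(mod2, newdata = data.frame(disp = 11*1.25))
1 0.1534182
Visualization
We'll do a similar plot, except we'll let R handle the logging of the predictor; no exponentiating is needed because the outcome is not logged. To keep disp on the vertical axis, as in the first plot, we refit the model with disp as the outcome and log(drat) as the predictor. We'll re-use new_drat created in the earlier example.
# refit with disp as the outcome so the axes match the first plot
mod2_plot <- lm(disp ~ log(drat), data = mtcars)
yhat2 <- predict(mod2_plot, newdata = data.frame(drat = new_drat))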
plot(yhat2 ~ new_drat, type = "l")
with(mtcars, points(disp ~ drat))
Both Logged
A k% change in a predictor is associated with a $(1 + k/100)^{\beta}$ multiplicative change in the outcome, where $\beta$ is the predictor's coefficient.
Examples:
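- If $\beta$ is 2, a 10% increase in X is associated with a $1.1^{2} = 1.21$ multiplicative change in Y, or a 21% increase in Y.
- If $\beta$ is -1.5, a 20% decrease in X is associated with a $.8^{-1.5} \approx 1.398$ multiplicative change in Y, or a 39.8% increase in Y.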
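Both examples follow from raising the multiplicative change in X to the power of the coefficient. A minimal sketch, with the hypothetical helper name both_logged_mult:

```r
# Hypothetical helper: multiplicative change in Y for a k% change in X
# when both the outcome and the predictor are logged.
both_logged_mult <- function(beta, k) (1 + k / 100)^beta

round(both_logged_mult(2, 10), 3)      # 1.21, a 21% increase
round(both_logged_mult(-1.5, -20), 3)  # 1.398, a 39.8% increase
```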
Theory
To-do.
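Until the theory is written up, the following is a hedged sketch only, obtained by combining the steps of the two earlier theory sections. It assumes the model $E(\log(Y) \mid X = x) = \beta_0 + \beta_1 \log(x)$ and relies on the same Taylor-expansion approximation used for the logged outcome, so the final relation is approximate.

```latex
\begin{align*}
E(\log(Y) \mid X = cx) - E(\log(Y) \mid X = x)
  &= \beta_1 \log(cx) - \beta_1 \log(x) = \beta_1 \log(c), \\
\frac{E(Y \mid X = cx)}{E(Y \mid X = x)}
  &\approx \exp(\beta_1 \log(c)) = c^{\beta_1}.
\end{align*}
```

With $c = 1 + k/100$, this is a $(1 + k/100)^{\beta_1}$ multiplicative change in the outcome for a k% change in the predictor.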
Example
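data(mtcars)
(mod3 <- lm(log(drat) ~ log(disp), data = mtcars))
Call:
lm(formula = log(drat) ~ log(disp), data = mtcars)
Coefficients:
(Intercept) log(disp)
2.2763 -0.1905
Therefore a 25% increase in disp is associated with a $1.25^{-0.1905} \approx 0.958$ multiplicative change in drat, corresponding to a 4.2% decrease.
To test this, we take the difference in predicted outcome, on the log scale, between some value of disp and that value increased by 25%. The difference is taken as the original value minus the increased value, so the predicted log(drat) falls by about 0.0425, a multiplicative change of $\exp(-0.0425) \approx 0.958$.
predict(mod3, newdata = data.frame(disp = 5)) - predict(mod3, newdata = data.frame(disp = 5*1.25))
1 0.04251857
predict(mod3, newdata = data.frame(disp = 8)) - predict(mod3, newdata = data.frame(disp = 8*1.25))
1 0.04251857
Visualization
Again, a similar plot, letting R handle the log in the predictor, and we'll manually exponentiate the outcome. To keep disp on the vertical axis, we refit the model with log(disp) as the outcome and log(drat) as the predictor. Again, re-use new_drat created in the earlier example.
# refit with log(disp) as the outcome so the axes match the earlier plots
mod3_plot <- lm(log(disp) ~ log(drat), data = mtcars)
yhat3 <- exp(predict(mod3_plot, newdata = data.frame(drat = new_drat)))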
plot(yhat3 ~ new_drat, type = "l")
with(mtcars, points(disp ~ drat))
Compare Visualizations
The three plots can look very similar, so let's plot them simultaneously to show that they are in fact three different curves.
with(mtcars, plot(disp ~ drat))
lines(yhat1 ~ new_drat, col = "red")
lines(yhat2 ~ new_drat, col = "blue")
lines(yhat3 ~ new_drat, col = "green")