When can Poisson Regression approximate Logistic?
An older idea in epidemiology is to use a Poisson regression model in place of a logistic regression model. This idea has some validity because, when the mean is low, the Poisson distribution approximates the Bernoulli distribution of a rare binary outcome.
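One way to see why: for a rare outcome, the log-odds and the log-probability are nearly equal, so logistic regression's logit link and Poisson regression's log link nearly coincide. A quick numeric sketch (Python, purely for illustration):

```python
import math

# For small p the odds p/(1-p) is close to p itself, so the logit (log-odds)
# link nearly coincides with the log link. The gap between them is
# logit(p) - log(p) = -log(1-p), which is approximately p for small p.
for p in [0.10, 0.05, 0.03, 0.01]:
    logit_p = math.log(p / (1 - p))
    log_p = math.log(p)
    print(f"p={p:.2f}  logit(p)={logit_p:.4f}  log(p)={log_p:.4f}  gap={logit_p - log_p:.4f}")
```

The gap shrinks roughly in proportion to p, which foreshadows the bias pattern in the simulations below.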
Simulation
Let’s examine this. First, we define a program which generates some data according to a logistic model, then fits both logistic and Poisson regression models against it.
This program, defined below, takes three arguments:
- n - sample size
- p - baseline probability of success
- b1 - coefficient of interest
The model is simply
\[ \text{logit}(P(Y = 1 \mid X)) = \text{logit}(p) + b_1 x \]
program define binsim, rclass
    drop _all
    args n p b1
    set obs `n'
    gen x = rnormal()
    gen y = rbinomial(1, invlogit(logit(`p') + `b1'*x))

    * Return P(success) to ensure everything is working
    mean y
    mat b = e(b)
    scalar pp = b[1,1]
    return scalar pp = pp

    * Poisson model
    poisson y x
    mat b = e(b)
    scalar b_pois = b[1,1]
    return scalar b_pois = b_pois

    * Logistic model
    logistic y x
    mat b = e(b)
    scalar b_logit = b[1,1]
    return scalar b_logit = b_logit
end
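For readers without Stata, the same experiment can be sketched in pure Python. The simulate and newton_glm helpers below are hypothetical illustrations, not Stata's actual estimators: a hand-rolled Newton-Raphson fit for a one-covariate canonical-link GLM. Both the logistic and Poisson log-likelihoods are concave, so a fixed number of Newton steps suffices here.

```python
import math
import random

def simulate(n, p, b1, rng):
    """Draw (x, y) pairs from the logistic model logit(P(y=1|x)) = logit(p) + b1*x."""
    a = math.log(p / (1 - p))
    xs, ys = [], []
    for _ in range(n):
        x = rng.gauss(0, 1)
        pr = 1 / (1 + math.exp(-(a + b1 * x)))
        xs.append(x)
        ys.append(1 if rng.random() < pr else 0)
    return xs, ys

def newton_glm(xs, ys, mean_fn, iters=25):
    """Fit intercept a and slope b by Newton-Raphson for a canonical-link GLM.

    mean_fn maps the linear predictor eta to (mu, w), where mu = E[y] and
    w = dmu/deta is the observation weight in the Hessian.
    """
    a = b = 0.0
    for _ in range(iters):
        g0 = g1 = h00 = h01 = h11 = 0.0
        for x, y in zip(xs, ys):
            mu, w = mean_fn(a + b * x)
            r = y - mu
            g0 += r
            g1 += r * x
            h00 += w
            h01 += w * x
            h11 += w * x * x
        det = h00 * h11 - h01 * h01
        a += (h11 * g0 - h01 * g1) / det   # solve the 2x2 Newton step by hand
        b += (h00 * g1 - h01 * g0) / det
    return a, b

def logit_mean(eta):           # logistic regression: mu = invlogit(eta)
    mu = 1 / (1 + math.exp(-eta))
    return mu, mu * (1 - mu)

def log_mean(eta):             # Poisson regression: mu = exp(eta)
    mu = math.exp(eta)
    return mu, mu

rng = random.Random(20240101)
xs, ys = simulate(10000, 0.1, 0.4, rng)
_, b_logit = newton_glm(xs, ys, logit_mean)
_, b_pois = newton_glm(xs, ys, log_mean)
print(f"logistic slope: {b_logit:.4f}, Poisson slope: {b_pois:.4f}")
```

With a baseline probability of 10%, a single run should already show the Poisson slope falling noticeably below the logistic slope.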
Results
Now we can run it with a few different settings. Specifically, we’re interested in how close to 0 the mean must be for the Poisson coefficient to approximate the logistic coefficient.
Set a few parameters
. local beta1 .4

. local reps 1000

. simulate pp=r(pp) b_pois=r(b_pois) b_logit=r(b_logit), ///
>     reps(`reps') nodots: binsim 10000 .1 `beta1'

      Command: binsim 10000 .1 .4
          pp: r(pp)
      b_pois: r(b_pois)
     b_logit: r(b_logit)
First we’ll ensure the code is working and that P(success) is 10% as requested.
. mean pp

Mean estimation                     Number of obs = 1,000

--------------------------------------------------------------
             |       Mean   Std. err.     [95% conf. interval]
-------------+------------------------------------------------
          pp |   .1057365   .0000978      .1055445    .1059285
--------------------------------------------------------------
Now we can look at the kernel densities
. twoway kdensity b_logit || kdensity b_pois, ///
>     xline(`beta1') legend(label(1 "Logistic") label(2 "Poisson"))
The Poisson coefficient is strongly negatively biased. We can estimate the bias as a proportion of the logistic coefficient.
. gen error = abs(b_logit - b_pois)/b_logit

. mean error

Mean estimation                     Number of obs = 1,000

--------------------------------------------------------------
             |       Mean   Std. err.     [95% conf. interval]
-------------+------------------------------------------------
       error |   .1195278   .0001314      .1192699    .1197857
--------------------------------------------------------------
. simulate pp=r(pp) b_pois=r(b_pois) b_logit=r(b_logit), ///
>     reps(`reps') nodots: binsim 10000 .05 `beta1'

      Command: binsim 10000 .05 .4
          pp: r(pp)
      b_pois: r(b_pois)
     b_logit: r(b_logit)
. mean pp

Mean estimation                     Number of obs = 1,000

--------------------------------------------------------------
             |       Mean   Std. err.     [95% conf. interval]
-------------+------------------------------------------------
          pp |   .0535497   .0000704      .0534115    .0536879
--------------------------------------------------------------
. twoway kdensity b_logit || kdensity b_pois, ///
>     xline(`beta1') legend(label(1 "Logistic") label(2 "Poisson"))
. gen error = abs(b_logit - b_pois)/b_logit

. mean error

Mean estimation                     Number of obs = 1,000

--------------------------------------------------------------
             |       Mean   Std. err.     [95% conf. interval]
-------------+------------------------------------------------
       error |   .0616367   .0001023      .0614361    .0618374
--------------------------------------------------------------
We still see a negative bias.
. simulate pp=r(pp) b_pois=r(b_pois) b_logit=r(b_logit), ///
>     reps(`reps') nodots: binsim 10000 .03 `beta1'

      Command: binsim 10000 .03 .4
          pp: r(pp)
      b_pois: r(b_pois)
     b_logit: r(b_logit)
. mean pp

Mean estimation                     Number of obs = 1,000

--------------------------------------------------------------
             |       Mean   Std. err.     [95% conf. interval]
-------------+------------------------------------------------
          pp |   .0322263   .0000573      .0321139    .0323387
--------------------------------------------------------------
. twoway kdensity b_logit || kdensity b_pois, ///
>     xline(`beta1') legend(label(1 "Logistic") label(2 "Poisson"))
. gen error = abs(b_logit - b_pois)/b_logit

. mean error

Mean estimation                     Number of obs = 1,000

--------------------------------------------------------------
             |       Mean   Std. err.     [95% conf. interval]
-------------+------------------------------------------------
       error |   .0375324   .0000839      .0373677    .0376971
--------------------------------------------------------------
The bias is minimal but still present.
. simulate pp=r(pp) b_pois=r(b_pois) b_logit=r(b_logit), ///
>     reps(`reps') nodots: binsim 10000 .01 `beta1'

      Command: binsim 10000 .01 .4
          pp: r(pp)
      b_pois: r(b_pois)
     b_logit: r(b_logit)
. mean pp

Mean estimation                     Number of obs = 1,000

--------------------------------------------------------------
             |       Mean   Std. err.     [95% conf. interval]
-------------+------------------------------------------------
          pp |   .0108073   .0000342      .0107402    .0108744
--------------------------------------------------------------
. twoway kdensity b_logit || kdensity b_pois, ///
>     xline(`beta1') legend(label(1 "Logistic") label(2 "Poisson"))
. gen error = abs(b_logit - b_pois)/b_logit

. mean error

Mean estimation                     Number of obs = 1,000

--------------------------------------------------------------
             |       Mean   Std. err.     [95% conf. interval]
-------------+------------------------------------------------
       error |   .0127062   .0000505      .0126072    .0128053
--------------------------------------------------------------
The bias has all but disappeared.
Conclusion
I wouldn’t recommend using Poisson over logistic regression unless P(success) is 1% or less.
There’s an interesting artifact to explore, namely that the percent bias tracks the baseline probability p, running consistently about 20-25% larger than p itself.
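Using the four mean errors reported above, a quick check (Python, for illustration) shows the relative bias scaling almost linearly with the baseline probability:

```python
# Observed mean relative errors from the simulations above, keyed by baseline p.
observed = {0.10: 0.1195, 0.05: 0.0616, 0.03: 0.0375, 0.01: 0.0127}
for p, err in sorted(observed.items()):
    print(f"p={p:.2f}  bias={err:.4f}  bias/p={err / p:.2f}")
```

Each ratio lands between roughly 1.2 and 1.3, consistent with the bias being driven mainly by the gap between the logit and log links at the baseline probability.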