When can Poisson Regression approximate Logistic?

An older idea in epidemiology is to use a Poisson regression model in place of a logistic regression model. The idea has some validity: when the outcome is rare (a low mean), the Poisson distribution approximates the binomial distribution.
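The approximation can also be seen directly in the link functions. When p is small, log(1 - p) is nearly 0, so the logit link used by logistic regression collapses to the log link used by Poisson regression:

\[ logit(p) = \log\frac{p}{1-p} = \log p - \log(1-p) \approx \log p \quad \text{as } p \to 0 \]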

Simulation

Let’s examine this. First, we define a program that generates data according to a logistic model and then fits both logistic and Poisson regression models to it.

This program, defined below, takes three arguments:

  • n - sample size
  • p - baseline probability of success
  • b1 - coefficient of interest

The model is simply

\[ \operatorname{logit}(P(Y = 1 \mid X)) = \operatorname{logit}(p) + b_1 x, \]

so p is the probability of success at x = 0.
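For readers outside Stata, the data-generating step can be sketched in Python (a hypothetical translation for illustration, not the code used below; the function name and seed are my own):

```python
import math
import random

def binsim_data(n, p, b1, seed=12345):
    """Draw (x, y) pairs from the logistic model above:
    x ~ N(0, 1), y ~ Bernoulli(invlogit(logit(p) + b1*x))."""
    rng = random.Random(seed)
    base = math.log(p / (1 - p))  # logit(p), the intercept
    data = []
    for _ in range(n):
        x = rng.gauss(0.0, 1.0)
        prob = 1.0 / (1.0 + math.exp(-(base + b1 * x)))  # invlogit
        data.append((x, 1 if rng.random() < prob else 0))
    return data

# With p = .1 and b1 = .4 the observed success rate lands near 10%.
sample = binsim_data(100_000, 0.1, 0.4)
rate = sum(y for _, y in sample) / len(sample)
```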

program define binsim, rclass
    drop _all
    args n p b1
    set obs `n'
    gen x = rnormal()
    gen y = rbinomial(1, invlogit(logit(`p') + `b1'*x))

    * Return P(success) to ensure everything is working
    mean y
    mat b = e(b)
    scalar pp = b[1,1]
    return scalar pp = pp

    * Poisson model
    poisson y x
    mat b = e(b)
    scalar b_pois = b[1,1]
    return scalar b_pois = b_pois

    * Logistic model
    logistic y x
    mat b = e(b)
    scalar b_logit = b[1,1]
    return scalar b_logit = b_logit
end

Results

Now we can run it with a few different settings. Specifically, we’re interested in how close to 0 the mean must be for the Poisson coefficient to approximate the logistic coefficient.
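A quick way to build intuition for "how close to 0": the gap between the logit link and the log link is -log(1 - p), which is approximately p itself for small p. A small Python check over the baseline probabilities used in the simulations below:

```python
import math

# Gap between logit(p) and log(p) is -log(1 - p), which is ~= p for small p.
for p in (0.10, 0.05, 0.03, 0.01):
    gap = math.log(p / (1 - p)) - math.log(p)
    print(f"p = {p:.2f}   gap = {gap:.4f}")
```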

Set a few parameters

. local beta1 .4

. local reps 1000
. simulate pp=r(pp) b_pois=r(b_pois) b_logit=r(b_logit), ///
>     reps(`reps') nodots: binsim 10000 .1 `beta1'

      Command: binsim 10000 .1 .4
           pp: r(pp)
       b_pois: r(b_pois)
      b_logit: r(b_logit)

First, we’ll check that the code is working and that P(success) is roughly 10%, as requested.

. mean pp

Mean estimation                          Number of obs = 1,000

--------------------------------------------------------------
             |       Mean   Std. err.     [95% conf. interval]
-------------+------------------------------------------------
          pp |   .1057365   .0000978      .1055445    .1059285
--------------------------------------------------------------
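The mean sits slightly above 10% rather than exactly at it. That is expected: p sets the probability at x = 0, while pp averages invlogit(logit(.1) + .4x) over x ~ N(0, 1), and invlogit is convex in this range, which pulls the average up. A quick Monte Carlo check in Python (my own sketch, not part of the Stata run):

```python
import math
import random

# Average invlogit(logit(.1) + .4*x) over x ~ N(0, 1); lands near .106,
# matching the simulated mean of pp above.
rng = random.Random(1)
base = math.log(0.1 / 0.9)  # logit(.1)
draws = (1.0 / (1.0 + math.exp(-(base + 0.4 * rng.gauss(0.0, 1.0))))
         for _ in range(200_000))
marginal = sum(draws) / 200_000
```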

Now we can look at the kernel densities

. twoway kdensity b_logit || kdensity b_pois, ///
>     xline(`beta1') legend(label(1 "Logistic") label(2 "Poisson"))

The Poisson coefficient is strongly attenuated toward zero. We can quantify the discrepancy as a fraction of the logistic estimate.

. gen error = abs(b_logit - b_pois)/b_logit

. mean error

Mean estimation                          Number of obs = 1,000

--------------------------------------------------------------
             |       Mean   Std. err.     [95% conf. interval]
-------------+------------------------------------------------
       error |   .1195278   .0001314      .1192699    .1197857
--------------------------------------------------------------
. simulate pp=r(pp) b_pois=r(b_pois) b_logit=r(b_logit), ///
>     reps(`reps') nodots: binsim 10000 .05 `beta1'

      Command: binsim 10000 .05 .4
           pp: r(pp)
       b_pois: r(b_pois)
      b_logit: r(b_logit)


. mean pp

Mean estimation                          Number of obs = 1,000

--------------------------------------------------------------
             |       Mean   Std. err.     [95% conf. interval]
-------------+------------------------------------------------
          pp |   .0535497   .0000704      .0534115    .0536879
--------------------------------------------------------------

. twoway kdensity b_logit || kdensity b_pois, ///
>     xline(`beta1') legend(label(1 "Logistic") label(2 "Poisson"))

. gen error = abs(b_logit - b_pois)/b_logit

. mean error

Mean estimation                          Number of obs = 1,000

--------------------------------------------------------------
             |       Mean   Std. err.     [95% conf. interval]
-------------+------------------------------------------------
       error |   .0616367   .0001023      .0614361    .0618374
--------------------------------------------------------------

We still see a negative bias, though it has shrunk.

. simulate pp=r(pp) b_pois=r(b_pois) b_logit=r(b_logit), ///
>     reps(`reps') nodots: binsim 10000 .03 `beta1'

      Command: binsim 10000 .03 .4
           pp: r(pp)
       b_pois: r(b_pois)
      b_logit: r(b_logit)


. mean pp

Mean estimation                          Number of obs = 1,000

--------------------------------------------------------------
             |       Mean   Std. err.     [95% conf. interval]
-------------+------------------------------------------------
          pp |   .0322263   .0000573      .0321139    .0323387
--------------------------------------------------------------

. twoway kdensity b_logit || kdensity b_pois, ///
>     xline(`beta1') legend(label(1 "Logistic") label(2 "Poisson"))

. gen error = abs(b_logit - b_pois)/b_logit

. mean error

Mean estimation                          Number of obs = 1,000

--------------------------------------------------------------
             |       Mean   Std. err.     [95% conf. interval]
-------------+------------------------------------------------
       error |   .0375324   .0000839      .0373677    .0376971
--------------------------------------------------------------

The bias is minimal but still present.

. simulate pp=r(pp) b_pois=r(b_pois) b_logit=r(b_logit), ///
>     reps(`reps') nodots: binsim 10000 .01 `beta1'

      Command: binsim 10000 .01 .4
           pp: r(pp)
       b_pois: r(b_pois)
      b_logit: r(b_logit)


. mean pp

Mean estimation                          Number of obs = 1,000

--------------------------------------------------------------
             |       Mean   Std. err.     [95% conf. interval]
-------------+------------------------------------------------
          pp |   .0108073   .0000342      .0107402    .0108744
--------------------------------------------------------------

. twoway kdensity b_logit || kdensity b_pois, ///
>     xline(`beta1') legend(label(1 "Logistic") label(2 "Poisson"))

. gen error = abs(b_logit - b_pois)/b_logit

. mean error

Mean estimation                          Number of obs = 1,000

--------------------------------------------------------------
             |       Mean   Std. err.     [95% conf. interval]
-------------+------------------------------------------------
       error |   .0127062   .0000505      .0126072    .0128053
--------------------------------------------------------------

The bias has all but disappeared.

Conclusion

I wouldn’t recommend using Poisson regression in place of logistic regression unless P(success) is 1% or less.

There’s an interesting artifact to explore: the percent bias closely tracks the baseline probability, consistently running about 20-25% above it (e.g., .12 at p = .10, .062 at p = .05, .038 at p = .03, .013 at p = .01).
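Using the simulation means reported above, the ratio of the percent bias to the baseline probability can be tabulated directly (a small Python sketch over the hard-coded results from the output above):

```python
# Mean relative error of the Poisson coefficient at each baseline
# probability, copied from the simulation output above.
p = [0.10, 0.05, 0.03, 0.01]
error = [0.1195, 0.0616, 0.0375, 0.0127]

# Each ratio sits a bit above 1: the percent bias tracks p closely.
ratios = [e / b for e, b in zip(error, p)]
```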