One of the main assumptions of linear models such as linear regression and analysis of variance is that the residual errors follow a normal distribution. To meet this assumption when a continuous response variable is skewed, a transformation of the response variable can produce errors that are approximately normal. Often, however, the response variable of interest is categorical or discrete, not continuous. In this case, a simple transformation cannot produce normally distributed errors.

A common example is when the response variable is the counted number of occurrences of an event. The distribution of counts is discrete, not continuous, and is limited to non-negative values. There are two problems with applying an ordinary linear regression model to these data. First, many distributions of count data are positively skewed with many observations in the data set having a value of 0. The high number of 0’s in the data set prevents the transformation of a skewed distribution into a normal one. Second, it is quite likely that the regression model will produce negative predicted values, which are theoretically impossible.
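The second problem is easy to see in a small sketch. The counts below are made up for illustration (they are not from any real study), and the variable names are arbitrary. The code fits an ordinary least squares line with one predictor and lists the predicted values that fall below zero.

```python
# Illustrative, made-up counts (an assumption, not data from the article):
# mostly zeros, with a few larger values at high x.
x = [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
y = [0, 0, 0, 0, 0, 1, 0, 2, 5, 9]

n = len(x)
x_mean = sum(x) / n
y_mean = sum(y) / n

# Ordinary least squares estimates for a single predictor.
sxy = sum((xi - x_mean) * (yi - y_mean) for xi, yi in zip(x, y))
sxx = sum((xi - x_mean) ** 2 for xi in x)
slope = sxy / sxx
intercept = y_mean - slope * x_mean

# Predicted counts from the straight line.
predictions = [intercept + slope * xi for xi in x]

# Keep the (x, prediction) pairs where the line predicts a negative count.
negative = [(xi, round(p, 2)) for xi, p in zip(x, predictions) if p < 0]
print(negative)
```

With these particular numbers the fitted line dips below zero at the low end of x, even though a count can never be negative.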

An example of a regression model with a count response variable is the prediction of the number of times a person perpetrated domestic violence against his or her partner in the last year based on whether he or she had witnessed domestic violence as a child and who the perpetrator of that violence was. Because many individuals in the sample had not perpetrated violence at all, many observations had a value of 0, and any attempts to transform the data to a normal distribution failed.

An alternative is to use a Poisson regression model or one of its variants. These models have a number of advantages over an ordinary linear regression model, including a skewed, discrete distribution and the restriction of predicted values to non-negative numbers. A Poisson model is similar to an ordinary linear regression, with two exceptions. First, it assumes that the errors follow a Poisson, not a normal, distribution. Second, rather than modeling Y as a linear function of the regression coefficients, it models the natural log of the response variable, ln(Y), as a linear function of the coefficients. Strictly speaking, it is the natural log of the expected value of Y that is modeled this way, which is why observed counts of 0 cause no difficulty.
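To make the log link concrete, here is a minimal sketch of a Poisson regression with one predictor, fitted by Newton-Raphson on the Poisson log-likelihood. The data are made up, and the function name fit_poisson is hypothetical. In practice you would use your statistical package's built-in routine rather than code like this.

```python
import math

# Illustrative, made-up counts (an assumption, not data from the article).
x = [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
y = [0, 0, 0, 0, 0, 1, 0, 2, 5, 9]


def fit_poisson(x, y, iterations=25):
    """Fit ln(E[Y]) = b0 + b1 * x by Newton-Raphson (hypothetical helper)."""
    b0 = math.log(sum(y) / len(y))  # start from the intercept-only fit
    b1 = 0.0
    for _ in range(iterations):
        mu = [math.exp(b0 + b1 * xi) for xi in x]
        # Gradient of the Poisson log-likelihood.
        g0 = sum(yi - mi for yi, mi in zip(y, mu))
        g1 = sum((yi - mi) * xi for xi, yi, mi in zip(x, y, mu))
        # Information matrix (negative Hessian), a symmetric 2 x 2 matrix.
        h00 = sum(mu)
        h01 = sum(mi * xi for xi, mi in zip(x, mu))
        h11 = sum(mi * xi * xi for xi, mi in zip(x, mu))
        det = h00 * h11 - h01 * h01
        # Newton step: solve the 2 x 2 system directly.
        b0 += (h11 * g0 - h01 * g1) / det
        b1 += (h00 * g1 - h01 * g0) / det
    return b0, b1


b0, b1 = fit_poisson(x, y)

# Predicted means are exp(linear predictor), so they are always positive.
fitted = [math.exp(b0 + b1 * xi) for xi in x]
print(round(b0, 3), round(b1, 3))
print([round(m, 2) for m in fitted])
```

Because the predictions are exponentials of the linear predictor, they cannot go below zero. The coefficient b1 is interpreted multiplicatively: a one-unit increase in x multiplies the expected count by exp(b1).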

The Poisson model assumes that the mean and variance of the response, given the predictors, are equal. In practice, however, the variance is usually larger than the mean (although it can also be smaller). When the variance is larger than the mean, a condition known as overdispersion, there are two extensions of the Poisson model that work well. In the over-dispersed Poisson model, an extra parameter is included which estimates how much larger the variance is than the mean. This parameter estimate is then used to correct the standard errors, and therefore the p-values, for the effects of the larger variance. An alternative is a negative binomial model. The negative binomial distribution can be viewed as a Poisson distribution in which the distribution’s parameter is itself considered a random variable (one that follows a gamma distribution). The variation of this parameter can account for a variance of the data that is higher than the mean.
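The idea that a randomly varying Poisson parameter inflates the variance can be checked with a small simulation. This is only a sketch: the sample size, mean, and gamma shape are arbitrary choices, and poisson_draw is a hypothetical helper (Knuth's multiplication method), since the Python standard library has no Poisson sampler.

```python
import math
import random
import statistics


def poisson_draw(rng, lam):
    """Draw one Poisson(lam) value with Knuth's multiplication method."""
    threshold = math.exp(-lam)
    k = 0
    product = rng.random()
    while product > threshold:
        k += 1
        product *= rng.random()
    return k


def variance_to_mean(counts):
    """Sample variance divided by sample mean."""
    return statistics.variance(counts) / statistics.mean(counts)


rng = random.Random(0)  # fixed seed so the run is repeatable
n, mean, shape = 20000, 2.0, 0.5  # arbitrary illustrative settings

# Plain Poisson: every observation shares the same rate.
plain = [poisson_draw(rng, mean) for _ in range(n)]

# Poisson whose rate is gamma-distributed with the same average rate.
# random.gammavariate(alpha, beta) has mean alpha * beta.
mixed = [
    poisson_draw(rng, rng.gammavariate(shape, mean / shape)) for _ in range(n)
]

print(round(variance_to_mean(plain), 2))  # close to 1
print(round(variance_to_mean(mixed), 2))  # well above 1
```

Both samples have roughly the same mean, but only the mixed one is over-dispersed. In theory its variance is mean + mean²/shape, which here is 2 + 4/0.5 = 10, or five times the mean.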

A negative binomial model proved to fit well for the domestic violence data described above. Because the majority of individuals in the data set perpetrated 0 times, but a few individuals perpetrated many times, the variance was over 6 times larger than the mean. Therefore, the negative binomial model was clearly more appropriate than the Poisson.
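A rough first check for this kind of overdispersion is to compare the sample variance with the sample mean. The sketch below uses invented counts (not the domestic violence data) with the same general shape: mostly zeros and a few large values. For a model with no predictors, the Pearson dispersion estimate used by the over-dispersed Poisson model reduces to this same ratio.

```python
import math
import statistics

# Invented counts for illustration only: 14 zeros and a few large values.
counts = [0] * 14 + [1, 1, 2, 3, 8, 15]

mean = statistics.mean(counts)
variance = statistics.variance(counts)  # sample variance (n - 1 denominator)
ratio = variance / mean

# Pearson dispersion for an intercept-only Poisson model:
# sum of (observed - fitted)^2 / fitted, divided by residual degrees of freedom.
pearson_dispersion = sum((c - mean) ** 2 / mean for c in counts) / (
    len(counts) - 1
)

# In the over-dispersed Poisson model, standard errors are multiplied
# by the square root of the dispersion estimate.
se_inflation = math.sqrt(pearson_dispersion)

print(round(ratio, 2), round(pearson_dispersion, 2), round(se_inflation, 2))
```

A ratio near 1 is consistent with the Poisson assumption. A ratio well above 1, as here, points toward the over-dispersed Poisson or negative binomial model. With predictors in the model, the comparison should be made on the residuals rather than on the raw counts.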

All three variations of the Poisson regression model are available in many general statistical packages, including SAS, Stata, and S-Plus.

References:

- Gardner, W., Mulvey, E.P., and Shaw, E.C. (1995). “Regression Analyses of Counts and Rates: Poisson, Overdispersed Poisson, and Negative Binomial Models.” Psychological Bulletin, 118, 392-404.
- Long, J.S. (1997). Regression Models for Categorical and Limited Dependent Variables, Chapter 8. Thousand Oaks, CA: Sage Publications.

Sharon says

Dear Karen,

Such a great article – thank you!

I was wondering – are there any “classic” variables that can almost always be used in (and provide good fit to) linear regression? That is, variables that are not count data?

Thank you very much 🙂

Karen Grace-Martin says

That’s a good question, Sharon. There are definitely variables that tend to follow normal distributions, like human height. But in any given data set, that might not be true. For example, it might not hold if your population of interest isn’t all humans, but only infants.

Jordan says

This was a very informative article. I am glad you mentioned something that has been bothering me: that Negative Binomial “models the natural log of the response variable, ln(Y), as a linear function of the coefficients.”

I am working with a dataset which seems well suited for a count model. My Y is a discrete integer (0, 1, 2, ...). The conditional distributions are skewed, with variance much larger than the mean. However, when I run linear models on subsamples (broken down by E[Y | other covariates]), I find that the effect of a one-unit increase in X is fairly constant across the subsamples. This suggests the effect of X is linear. Are there any models designed for count data which allow the effect of a one-unit increase in X to be linear instead of multiplicative?

Thanks again.

Jason says

Thanks for this helpful website! I have a question. The binomial and Poisson distributions both seem to assume that the individual events they are modeling are independent. What about cases where they don’t seem independent (e.g. if one event occurs, another event is more likely to occur)? Is it okay to still use one of these models? If not, what is a more appropriate count model?

Mirketa says

Thanks for sharing this valuable information

Agnes says

Hi Karen, thank you so much for your helpful article. I really appreciate that.

I’m a student at Universitas Indonesia, majoring in Statistics. I have a question related to my final project. If we have a Poisson regression model, is it true that the mean of the model’s errors can never be 0?

Many thanks,

Agnes.

Karen Grace-Martin says

Hi Agnes,

I’m not sure I understand exactly what you’re asking. As a generalized linear model, Poisson Regression “errors” are a little different than in linear models. Are you talking about something like Deviance Residuals?

Omer Abid says

This is perhaps the clearest explanation of why count data uses Poisson of anything I have read on the web. Thank you, Karen.

obu, obu Enang says

Thank you for the article. My question is: (1) what are the possible methods of modeling count data on the Sunil distribution, and (2) how can I use R to analyze count data?

Mushi Solomon says

Thanks Karen

I am a student at the University of Dar es Salaam, taking an MA in Economics.

I really appreciated your article. It adds knowledge.

Thank you

Mushi Solomon

Davies says

Thanks for this short and highly educative piece about our closest everyday-life distribution, the Poisson. I am currently doing PhD research on Bayesian spatial modeling, and my response variable is a count. To extend the model to accommodate spatial random effects in the presence of overdispersion, I assume I can use the negative binomial to model the count data. For the spatial random effects, my question is this: can I assume a multivariate Student t distribution instead of the widely assumed Gaussian? Please, I need your expertise.

Davies, South Africa.

Karen says

Great question, Anna.

You will probably get very similar parameter estimates whether you run it as a normal or Poisson model. As the mean gets further away from zero, even as low as 10, the Poisson distribution looks more and more like a normal distribution. It becomes symmetric, with a mode at the mean.

However, the normal distribution really is assuming that the tails extend forever. Therefore, it can give you predicted values that are negative. The Poisson model won’t do that, because of the log link, so you only get positive predicted values.

So it depends on whether you’re just interested in the regression coefficients, which would be slightly easier to interpret using a normal model, or the predicted values, which will be more accurate using a Poisson model.

Karen

Anna says

Such a good article! It answered most of my questions on modeling count data.

Just one more: if the distribution of the count data is not skewed, but follows a normal-like distribution, could I still use Poisson regression? If so, which one is better, Poisson or OLS?

Thank you very much.

Have a good one,

Anna

COLIN ATKINSON says

THIS IS AN EXCELLENT ARTICLE. IT IS SO EASY TO FOLLOW AND THEREFORE TO REMEMBER.

MANY THANKS,

COLIN ATKINSON

Jay says

Just wanted to say that this article is going to be a life saver for me.

Thanks so much for reminding me about Poisson. It’d been years since my schooling, and without application, all but the most basic elements of my stat teachings had abandoned me. This article gave me the info I needed to get me asking the right questions that will get me to my answers.

Thanks!