There are forms of linear regression, such as Poisson, specifically built for count data. But it is also true that in many instances the "standard" ordinary least squares regression works well.
It's certainly a regression, but one could argue about the "linear" part. Basically, a Poisson regression is estimated by maximum likelihood.
If one knows that the data is generated by a Poisson process, a Poisson regression would be better. But an OLS regression should correctly tell you the marginal effect of independent variables on the mean of the counts. The regression will likely have heteroskedasticity issues, so the standard errors should be corrected.
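To make the comparison concrete, here is a minimal simulation sketch rather than a definitive recipe. Everything in it is an assumption chosen for illustration: the sample size, the single predictor `x`, and the coefficients in `true_beta`. It fits OLS with both classical and heteroskedasticity-robust (HC0, "sandwich") standard errors, then fits a Poisson regression by maximum likelihood using plain Newton-Raphson steps, so only NumPy is needed.

```python
import numpy as np

rng = np.random.default_rng(0)

# Simulated count data (all values here are illustrative assumptions).
n = 5000
x = rng.uniform(0.0, 2.0, size=n)
X = np.column_stack([np.ones(n), x])        # intercept + one predictor
true_beta = np.array([0.5, 0.8])            # coefficients on the log scale
y = rng.poisson(np.exp(X @ true_beta))      # Poisson counts, mean exp(X b)

# --- OLS fit ---
XtX_inv = np.linalg.inv(X.T @ X)
beta_ols = XtX_inv @ (X.T @ y)
resid = y - X @ beta_ols

# Classical standard errors assume constant error variance.
sigma2 = (resid @ resid) / (n - X.shape[1])
se_classical = np.sqrt(np.diag(XtX_inv) * sigma2)

# HC0 robust standard errors: (X'X)^-1 X' diag(e^2) X (X'X)^-1.
meat = X.T @ (X * resid[:, None] ** 2)
se_robust = np.sqrt(np.diag(XtX_inv @ meat @ XtX_inv))

# --- Poisson regression by maximum likelihood (Newton-Raphson) ---
beta_pois = np.array([np.log(y.mean()), 0.0])   # start at intercept-only fit
for _ in range(50):
    mu = np.exp(X @ beta_pois)
    score = X.T @ (y - mu)                      # gradient of log-likelihood
    info = X.T @ (X * mu[:, None])              # Fisher information
    step = np.linalg.solve(info, score)
    beta_pois = beta_pois + step
    if np.max(np.abs(step)) < 1e-10:
        break

# Average marginal effect of x on the mean count under the Poisson fit.
ame_pois = np.mean(np.exp(X @ beta_pois) * beta_pois[1])

print("OLS slope:", beta_ols[1])
print("Poisson average marginal effect:", ame_pois)
print("OLS slope SE (classical, robust):", se_classical[1], se_robust[1])
print("Poisson coefficients:", beta_pois)
```

In this setup the OLS slope and the Poisson average marginal effect should land in the same neighborhood, while the residual spread grows with `x`, which is the heteroskedasticity the robust standard errors are meant to handle.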
If the minimum number of customers is at least 100, then the difference between a continuous sample space [100,∞) and the discrete sample space {100,101,102,…} has no perceivable effect on our forecasts. However, if our data contains small counts (0,1,2,…), then we need to use forecasting methods that are more appropriate for a sample space of non-negative integers.
@technocrat could you expand on this a little? Certainly, if the set of possible discrete integers is dense then it isn't much different from a continuous distribution. I don't see the difference though between 100,101,102... and 0, 1, 2...
In a least squares regression all that would happen would be the intercept would be 100 higher in the former than in the latter. I'm probably missing something.
Hyndman's point is that if the counts are sufficiently large, the departure from the continuous-variable assumption underlying regression is not large enough to matter. Following the quoted passage, he describes Croston's method for dealing with count forecasts and cites Vasiliki Christou & Konstantinos Fokianos (2015), "On count time series prediction", Journal of Statistical Computation and Simulation, 85:2, 357-373, DOI: 10.1080/00949655.2013.823612, for their use of the Poisson distribution and the negative binomial distribution.
You need to analyse your data to know what you are working with, and what is reasonable or unreasonable to do with it in further analysis such as model building.
You write that you are confused and post an extract of an article, but you don't ask a question related to it... So I do wonder how technocrat or anyone else might respond to you.
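One rough way to see why large counts are less of a problem is sketched below. It assumes the counts are approximately Poisson with mean `lam` (my assumption for illustration, not something stated in the quoted passage) and asks two things: how skewed is the distribution, and how much probability would a normal distribution with the same mean and variance put below zero, where counts cannot occur? The helper names are hypothetical.

```python
import math


def normal_cdf(z):
    """Standard normal CDF, written with erfc so tiny tails do not round to 0."""
    return 0.5 * math.erfc(-z / math.sqrt(2.0))


def normal_mass_below_zero(lam):
    """Probability below 0 under a normal with mean lam and variance lam."""
    return normal_cdf((0.0 - lam) / math.sqrt(lam))


def poisson_skewness(lam):
    """Skewness of a Poisson distribution with mean lam is 1 / sqrt(lam)."""
    return 1.0 / math.sqrt(lam)


for lam in (1, 5, 100):
    print(
        f"mean={lam:>3}  skewness={poisson_skewness(lam):.3f}  "
        f"normal mass below zero={normal_mass_below_zero(lam):.3g}"
    )
```

With a mean near 1 the matched normal puts a noticeable share of its mass on impossible negative values and the distribution is clearly skewed; with a mean near 100 both problems become negligible.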
Thank you.
I am confused about the idea of using linear regression for count data.
In my understanding, Poisson regression can be used. However, I am not sure about using 'classical' linear regression for count data. For example, can a variable such as 'number of children' be used as a response variable in 'classical' linear regression? May it sometimes be used?
Thanks.
People sometimes use a classical linear regression. It usually gives a not terrible approximation. But a regression designed for count data, such as a Poisson or zero-inflated Poisson, is generally better.
That's a fair explanation of the problems with count data in the context of ordinary least squares linear regression: the underlying assumptions for the validity of the test are hard to satisfy. Because those assumptions too often go unexamined in all types of applications of the test, that's not surprising.