How to perform a goodness-of-fit test to check whether data X likely come from a distribution using chisq.test(x, p)?

Say I have a data vector X and I want to test whether it comes from some distribution, say Poisson. Obviously, I can't specify the lambda of the Poisson, so I estimate it from X, which costs 1 df. If I were to test against normality, I would lose 2 df, from mu = mean(X) and sigma = sd(X).

I tried chisq.test() multiple times and it seems that it always uses df = k - 1, where k is the number of bins/groups of data. That implies (correct me if I am wrong) that the function assumes p comes from a fully specified distribution, which in practice is highly unlikely. It would be OK if I could supply the df for the chi-square test myself, but chisq.test() is programmed to use k - 1 df.

My question is how do I do the test properly in R (using built-in functions)?

Say I have some data that I want to test against a Poisson distribution:

      data = c(0, 0, 0, 1, 0, 1, 2, 2)
      lambda = mean(data)  # 0.75
      bins = c(0, 1, 2, 3)  # bins for grouping data
      x = c(4, 2, 2, 0)  # number of observations in each bin
      p = dpois(bins, lambda)
      chisq.test(x, p = p, rescale.p = TRUE)  # rescale.p = TRUE because p doesn't sum to 1
      # the df should be number of bins - 1 - number of estimates, so 2, but R always uses df = k - 1?

If you type chisq.test into your console with no arguments and no parentheses, R prints the source code of the function, which you can copy and paste into an R script. The df value is called PARAMETER through most of the code.
You will see that in the common case, when x is a plain vector rather than a matrix, df is set to the length of the input vector minus 1. That value is stored so it can be reported in the output, and it is then passed to pchisq to compute the p-value:

      PARAMETER <- length(x) - 1
      PVAL <- pchisq(STATISTIC, PARAMETER, lower.tail = FALSE)

So you can either call pchisq yourself with the df you want, or copy the chisq.test source and modify it to suit your needs.
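As a rough sketch of the first option, using the data from the question: let chisq.test() compute the statistic, then recompute the p-value with pchisq() and the reduced df. The names lambda_hat, n_estimated, df_adj and p_adj are made up for this illustration, and treating the last bin as "3 or more" (so that p sums to 1 without rescale.p) is my assumption, not something stated in the question.

```r
# Data and bin counts from the question; bins are 0, 1, 2 and (assumed) "3 or more"
data <- c(0, 0, 0, 1, 0, 1, 2, 2)
observed <- c(4, 2, 2, 0)
lambda_hat <- mean(data)  # Poisson rate estimated from the data

# Bin probabilities under Poisson(lambda_hat); the last bin takes the upper
# tail P(X >= 3), so the probabilities sum to 1
p <- c(dpois(0:2, lambda_hat), ppois(2, lambda_hat, lower.tail = FALSE))

# Let chisq.test() compute the statistic. Expected counts are small here,
# so it warns that the chi-squared approximation may be poor.
fit <- suppressWarnings(chisq.test(observed, p = p))

# Recompute the p-value with df reduced by the number of estimated parameters
n_estimated <- 1                              # only lambda was estimated
df_adj <- length(observed) - 1 - n_estimated  # k - 1 - 1
p_adj <- pchisq(unname(fit$statistic), df = df_adj, lower.tail = FALSE)

c(statistic = unname(fit$statistic), df = df_adj, p.value = p_adj)
```

The p-value printed by chisq.test() itself still uses k - 1 df, so only p_adj reflects the adjustment. With only 8 observations, the result should be read with caution either way.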
