# How to test whether zero is the real zero of the dataset?

Hey everyone,

I work with a dataset that contains the percentage of cells after stimulation, measured over a period of time. To assess the effect of the stimulation, I subtracted the negative control (non-stimulated cells) from the stimulated cells. This often produced negative values, which I set to zero, on the reasoning that if stimulation yields the same number of cells as the negative control or fewer, there is no reaction and thus zero stimulation.

Now I would like to know whether these assigned zeros are 'correct', i.e. statistically different from the low values in my dataset, and whether all values below 1 (or some other cutoff), rather than only those below 0, should be set to zero.

How can I tackle this in R?
Would the score test for zero inflation (Van den Broek, 1995) be appropriate?

Snapshot of the df:

[73] NA NA NA NA NA NA NA 0.0000 1.1500
[82] 0.0000 NA 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000
[91] 0.0000 0.0000 0.0000 0.0000 0.0000 NA 10.8000 0.0000 0.7350
[100] NA 4.3550 NA NA NA NA NA 5.1950 3.4750

I am having trouble visualising the data. Can you supply some sample data and any relevant code?

A part of the df without NA:
## Subject True_value Response.(CD3_blast)
##1 1 -1.76 0
##2 1 -2.16 0
##3 1 0.9750 0.9750
##4 1 -0.6 0
##5 1 2.0350 2.0350
##6 1 0.1400 0.1400
##7 1 -3.26 0
##8 1 3.9350 3.9350
##9 1 0.0300 0.0300
##10 1 -20.7 0

Histogram of the data:

I'm setting the zeros myself with the rule 'IF True_value (stimulation response minus negative control) < 0, THEN assign 0'. I'm wondering whether small positive values (e.g. below 1) are genuine responses or should also be zero, i.e. whether the cutoff < 0 should be changed to < 1, < 0.5 or < 0.1. However, I don't know how to test this in R.
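To make the candidate cutoffs concrete, here is a minimal R sketch using the ten True_value rows from the table above. The names true_value and apply_cutoff are made up for illustration. It only counts how many values each rule would zero out; it is not a statistical test of which cutoff is right.

```r
# True_value column from the ten sample rows shown above
true_value <- c(-1.76, -2.16, 0.975, -0.6, 2.035,
                0.14, -3.26, 3.935, 0.03, -20.7)

# Current rule: anything below 0 becomes 0
response <- pmax(true_value, 0)

# Hypothetical helper: set everything below `cutoff` to 0
apply_cutoff <- function(x, cutoff = 0) ifelse(x < cutoff, 0, x)

# Number of zeros produced by each candidate cutoff
cutoffs <- c(0, 0.1, 0.5, 1)
n_zero <- sapply(cutoffs, function(k) sum(apply_cutoff(true_value, k) == 0))

print(data.frame(cutoff = cutoffs, n_zero = n_zero))
```

On these ten rows the count of zeros goes from 5 (cutoff 0) to 8 (cutoff 1), so the choice of cutoff changes the apparent amount of zero inflation quite a lot.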

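Code for the score test for zero inflation:

# JVDB score test
# n = number of observations, n0 = number of observed zeros,
# lambda_est = estimated Poisson mean, p0_tilde = exp(-lambda_est)
numerator <- (n0 - n * p0_tilde)^2

denominator <- n * p0_tilde * (1 - p0_tilde) - n * lambda_est * (p0_tilde^2)

test_stat <- numerator / denominator

pvalue <- pchisq(test_stat, df = 1, ncp = 0, lower.tail = FALSE)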
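For anyone who wants to run the statistic end to end, here is a self-contained sketch that wraps the formula above in a function and tries it on simulated counts. The function name vdb_score_test and the simulated vectors are assumptions for illustration. As far as I understand it, the test is derived for Poisson counts, so applying it to continuous percentages like the ones in this thread would be a stretch.

```r
# Hypothetical wrapper around the score statistic shown above.
# Assumes y is a vector of non-negative counts (Poisson under the null).
vdb_score_test <- function(y) {
  y <- y[!is.na(y)]               # drop missing values
  n <- length(y)                  # number of observations
  n0 <- sum(y == 0)               # observed number of zeros
  lambda_est <- mean(y)           # Poisson mean estimate under the null
  p0_tilde <- exp(-lambda_est)    # expected zero probability under the null

  numerator <- (n0 - n * p0_tilde)^2
  denominator <- n * p0_tilde * (1 - p0_tilde) - n * lambda_est * p0_tilde^2
  test_stat <- numerator / denominator

  list(statistic = test_stat,
       p_value = pchisq(test_stat, df = 1, lower.tail = FALSE))
}

set.seed(1)
# Plain Poisson counts, no extra zeros
y_pois <- rpois(200, lambda = 2)
# Same distribution, but roughly 30% of values forced to zero
y_zi <- ifelse(rbinom(200, 1, 0.3) == 1, 0L, rpois(200, lambda = 2))

res_pois <- vdb_score_test(y_pois)
res_zi <- vdb_score_test(y_zi)
print(res_pois)
print(res_zi)
```

The statistic should come out much larger for the zero-inflated sample than for the plain Poisson one. Note that it only compares the observed number of zeros with the number a Poisson model expects; it says nothing about where a cutoff between "zero" and "response" should sit.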

However, I guess this test only determines whether my data are zero-inflated and gives no information on whether the low values should be zeros as well.

Okay, I think I'm in over my head here, and I don't have access to the Van den Broek (1995) article.

I think you're correct that treating the negative numbers and the 'true' zeros as the same is a mistake. Intuitively, it just feels wrong to set those negative values to zero; it feels like you're losing information. But I think that's a subject-matter issue, not a programming or even a statistical issue. Since I don't know the subject area, I don't even understand how you can get negative numbers.
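Just to put a number on the "losing information" point, here is a small sketch comparing the raw differences with the clamped ones for the ten sample rows posted above. The variable names are mine, and this is only arithmetic on the sample, not a recommendation for how to analyse the full dataset.

```r
# True_value column from the ten sample rows posted above
true_value <- c(-1.76, -2.16, 0.975, -0.6, 2.035,
                0.14, -3.26, 3.935, 0.03, -20.7)

# Clamped version, as in the Response column
response <- pmax(true_value, 0)

mean_raw <- mean(true_value)        # mean of the raw differences
mean_clamped <- mean(response)      # mean after negatives are set to 0
n_negative <- sum(true_value < 0)   # how many values the clamp overwrites

cat("mean of raw differences:   ", mean_raw, "\n")
cat("mean after clamping:       ", mean_clamped, "\n")
cat("values overwritten by zero:", n_negative, "of", length(true_value), "\n")
```

On these rows the mean moves from about -2.14 to about 0.71, and half of the values are overwritten, so the clamped column tells a rather different story from the raw one.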

I would think you need to discuss the issue with colleagues who understand your research area and then maybe consult a statistician.

Sorry not to be able to supply more useful help.

No problem. I've already had some discussions about this topic with several colleagues, but there was never a consensus on the matter, let alone a solution to the problem. So I thought turning to a bigger audience might help.
Either way, thank you for looking at it.
