I work with a dataset that contains information on cells (%) after stimulation over a period of time. To assess the effect of the stimulation, I subtracted the negative control (non-stimulated cells) from the stimulated cells. This often gave negative values, which I set to zero, on the reasoning that if stimulation yields no more cells than the negative control, there is no reaction and thus zero stimulation.
Now I would like to know whether these assigned zeros are 'correct', i.e. statistically different from the low non-zero values in my dataset, and whether all values below 1 (or some other cut-off), rather than only those below 0, should be set to zero.
How can I tackle this in R?
Would the score test for zero inflation of Van den Broek (1995) be appropriate?
Snapshot of the df:
[73] NA NA NA NA NA NA NA 0.0000 1.1500*
[82] 0.0000 NA 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000*
[91] 0.0000 0.0000 0.0000 0.0000 0.0000 NA 10.8000 0.0000 0.7350*
Since I am assigning the zeros myself ('IF True_Value (stimulation response - negative control) < 0 => assign 0'), I wonder whether the small values below e.g. 1 are real responses or should also be zero, i.e. whether the cut-off < 0 should be changed to < 1, < 0.5 or < 0.1. However, I don't know how I could test this in R.
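As a starting point, here is a minimal sketch of how the effect of the cut-off could at least be explored in R. It is not a statistical test. It only counts how many values each candidate cut-off would turn into zero. The vectors `stim` and `ctrl`, the helper `apply_cutoff()` and all numbers are made up for illustration and are not the real data.

```r
# Hypothetical example data (percent of cells); NOT the real dataset.
stim <- c(1.20, 0.40, 11.00, 0.900, NA,   0.30, 0.80)
ctrl <- c(0.05, 0.60,  0.20, 0.165, 0.10, 0.50, 0.10)

# Raw difference, kept as-is so the negative values are still available.
diff_raw <- stim - ctrl

# Candidate cut-offs below which a difference would be called "no response".
cutoffs <- c(0, 0.1, 0.5, 1)

# For each cut-off, count the non-missing differences that fall below it.
n_zeroed <- sapply(cutoffs, function(k) sum(diff_raw < k, na.rm = TRUE))

sensitivity <- data.frame(cutoff = cutoffs, n_zeroed = n_zeroed)
print(sensitivity)

# Hypothetical helper: apply one cut-off without overwriting the raw values.
# Missing values stay NA.
apply_cutoff <- function(x, k = 0) ifelse(x < k, 0, x)
diff_clamped <- apply_cutoff(diff_raw, k = 0)
```

If the count of zeroed values changes a lot between neighbouring cut-offs, the conclusions are probably sensitive to that choice, which is something to raise with a statistician.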
However, I guess the score test only determines whether my data are zero-inflated and gives no information on whether the low values should be zeros as well.
Okay, I think I'm in over my head here, and I don't have access to the Van den Broek (1995) article either.
I think you're correct that treating the negative numbers and the 'true' zeros as the same is a mistake. Intuitively, it just feels wrong to set those negative values to zero; it feels like you're losing information. But I think that's a subject-matter issue, not a programming or even a statistical issue. Since I don't know the subject area, I don't even understand how you can get negative numbers.
I would think you need to discuss the issue with colleagues who understand your research area and then maybe consult a statistician.
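In the meantime, one way to avoid losing that information is to keep the raw differences next to the clamped ones and flag which zeros were assigned. This is only a sketch under assumptions: the data frame `df`, its column names and its values are invented here, not taken from the real dataset.

```r
# Hypothetical measurements (percent of cells); NOT the real dataset.
df <- data.frame(
  stim = c(1.2, 0.4, 0.5, NA,  0.3),
  ctrl = c(0.1, 0.6, 0.5, 0.2, 0.5)
)

# Raw difference, stored in its own column so nothing is overwritten.
df$diff_raw <- df$stim - df$ctrl

# Clamped version: negative differences become 0, missing values stay NA.
df$diff_clamped <- pmax(df$diff_raw, 0)

# Flag the rows whose zero was assigned (negative before clamping).
df$was_negative <- !is.na(df$diff_raw) & df$diff_raw < 0

# Look at the spread of the negative differences on their own.
neg <- df$diff_raw[df$was_negative]
summary(neg)
```

With the flag in place, the assigned zeros can be analysed separately from zeros that were actually measured, and the size of the negative differences gives a rough feel for how noisy the subtraction is.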
No problem. I've already had some discussions on this topic with several colleagues, but there was never a consensus on the matter, let alone a solution to the problem. So I thought turning to a bigger audience might help.
Either way, thank you for looking at it.