How to test whether zero is the real zero of the dataset?

Marjory · October 21, 2020, 2:01pm

Hey everyone,

I work with a dataset that contains information on cells (%) after stimulation over a period of time. In order to assess the effect of the stimulation, I've subtracted the negative control (non-stimulated cells) from the stimulated cells. This often resulted in negative outcomes, which have been set to zero. The reasoning is that if the number of cells produced after stimulation is lower than or equal to the negative control, there is no reaction and thus zero stimulation going on.
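For readers following along, the subtraction-and-clipping step described above could look roughly like this in R. This is a minimal sketch: the vectors `stimulated` and `control` and their values are made up for illustration and are not taken from the actual dataset.

```r
# Hypothetical example values (% cells); NOT taken from the real dataset
stimulated <- c(3.2, 1.1, 6.0, 0.4)
control    <- c(4.9, 0.9, 2.1, 0.4)

# Background-subtracted difference (can be negative)
true_value <- stimulated - control

# Negative differences become 0; positive ones are kept; NA stays NA
response <- pmax(true_value, 0)

print(data.frame(stimulated, control, true_value, response))
```

Keeping `true_value` next to the clipped `response` means the information in the negative differences is not lost.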

Now I would like to know whether these assigned zeros are 'correct', i.e. statistically different from the lower values in my dataset, and whether all values below 1 (or another cutoff), rather than only those below 0, should be set to zero.

How can I tackle this in R?
Would the score test for zero-inflation of Van den Broek (1995) be appropriate?

Snapshot of the df:

 [73]      NA      NA      NA      NA      NA      NA      NA  0.0000  1.1500
 [82]  0.0000      NA  0.0000  0.0000  0.0000  0.0000  0.0000  0.0000  0.0000
 [91]  0.0000  0.0000  0.0000  0.0000  0.0000      NA 10.8000  0.0000  0.7350
[100]      NA  4.3550      NA      NA      NA      NA      NA  5.1950  3.4750

Thank you for your help!

jrkrideau · October 22, 2020, 12:00am

I am having trouble visualising the data. Can you supply some sample data and any relevant code?

FAQ: How to do a minimal reproducible example (reprex) for beginners

Marjory · October 22, 2020, 8:21am

A part of the df without NA:
##    Subject True_value Response.(CD3_blast)
## 1        1      -1.76               0
## 2        1      -2.16               0
## 3        1     0.9750               0.9750
## 4        1       -0.6               0
## 5        1     2.0350               2.0350
## 6        1     0.1400               0.1400
## 7        1      -3.26               0
## 8        1     3.9350               3.9350
## 9        1     0.0300               0.0300
## 10       1      -20.7               0

Histogram of the data:

I'm setting the zero values myself, using the rule 'IF True_value (stimulation response - negative control) < 0 => assign 0'. I'm wondering whether small positive values, e.g. below 1, are real responses or should also be set to zero, i.e. whether the cutoff < 0 should be changed to < 1, < 0.5 or < 0.1. However, I don't know how I could test this in R.
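As a rough illustration of what changing the cutoff does, the sketch below counts how many of the ten `True_value` entries shown above would be set to zero under each candidate cutoff. It only shows how sensitive the rule is; it does not test which cutoff is right. The names `cutoffs` and `n_set_to_zero` are introduced here for illustration.

```r
# The ten True_value entries from the excerpt above
true_value <- c(-1.76, -2.16, 0.975, -0.6, 2.035,
                0.14, -3.26, 3.935, 0.03, -20.7)

# Candidate cutoffs mentioned in the post
cutoffs <- c(0, 0.1, 0.5, 1)

# For each cutoff, count the values that would be replaced by 0
n_set_to_zero <- sapply(cutoffs, function(k) sum(true_value < k))

print(data.frame(cutoff = cutoffs, n_set_to_zero = n_set_to_zero))
```

In this small excerpt, each step up in the cutoff sets exactly one more observation to zero (0.03, then 0.14, then 0.975).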

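Code for the score test for zero-inflation:

# JVDB score test
# Presumably defined earlier: n (sample size), n0 (number of zeros),
# lambda_est (estimated Poisson mean) and p0_tilde (exp(-lambda_est)).
numerator <- (n0 - n * p0_tilde)^2
denominator <- n * p0_tilde * (1 - p0_tilde) - n * lambda_est * p0_tilde^2

test_stat <- numerator / denominator

pvalue <- pchisq(test_stat, df = 1, ncp = 0, lower.tail = FALSE)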

However, I guess this test just determines whether my data are zero-inflated and does not give any information on whether the low values should be zeros as well.
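For reference, a self-contained version of the score-test computation might look like the sketch below. It assumes the usual Poisson form of the statistic, with `lambda_est` taken as the sample mean and `p0_tilde = exp(-lambda_est)`. The function name `vdb_score_test` and the simulated `counts` are hypothetical. Note that the statistic is defined for counts, whereas the responses in this thread are percentages, so this only illustrates the mechanics.

```r
# Sketch of a score test for zero-inflation in a Poisson sample.
# vdb_score_test is a hypothetical helper name, not a package function.
vdb_score_test <- function(x) {
  x <- x[!is.na(x)]               # drop missing values
  n <- length(x)                  # sample size
  n0 <- sum(x == 0)               # observed number of zeros
  lambda_est <- mean(x)           # Poisson mean estimated under H0
  p0_tilde <- exp(-lambda_est)    # expected proportion of zeros under H0

  numerator <- (n0 - n * p0_tilde)^2
  denominator <- n * p0_tilde * (1 - p0_tilde) - n * lambda_est * p0_tilde^2

  test_stat <- numerator / denominator
  pvalue <- pchisq(test_stat, df = 1, lower.tail = FALSE)
  list(statistic = test_stat, p.value = pvalue)
}

# Simulated counts with 30 extra zeros added (illustration only)
set.seed(1)
counts <- c(rep(0, 30), rpois(70, lambda = 3))
res <- vdb_score_test(counts)
print(res)
```

A small p-value here only says there are more zeros than a Poisson model expects. As noted above, it says nothing about where the cutoff for "zero" should sit.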

jrkrideau · October 22, 2020, 4:44pm

Okay, I think I'm in over my head here, and I don't have access to the Van den Broek (1995) article.

I think you're correct that treating the negative numbers and the 'true' zeros as the same is a mistake. Intuitively, it just feels wrong to set those negative values to zero. It feels like you're losing information. But I think that's a subject-matter issue, not a programming or even a statistical issue. Since I don't know the subject area, I don't even understand how you can get negative numbers.

I would think you need to discuss the issue with colleagues who understand your research area and then maybe consult a statistician.

Sorry not to be able to supply more useful help.

Marjory · October 23, 2020, 7:49am

No problem. I've already had some discussions regarding this topic with several colleagues, but there was never a consensus on the matter, let alone a solution to the problem. So I thought turning to a bigger audience might help.
Either way, thank you for looking at it.
