How to get a specific p-value (FDR) in wilcoxon test

you_y · December 14, 2022, 11:32am

Hi!
I have question in statistical problem. I saw a recent papers that two groups of microbiota data were compared using wilcoxon test, with the FDR adjusted P value.
However, same results was obtained using my microbiota data set (wilcoxon p value and FDR adjusted p value were same).

Is there have a problem in this code? ex, Wilcoxson = wilcox.test(A1$value~A1$TRT, p.adjust.methods = "BH")

Please check this code and give some information.

Since my data is really large, i made simple example set.

~TRT,         ~A1,         ~A2,         ~A3,         ~A4,         ~A5,         ~A6,         ~A7,         ~A8,         ~A9,        ~A10,        ~A11,        ~A12,        ~A13,        ~A14,        ~A15,        ~A16,
  "CON", 0.640392842, 12.59270509, 0.475865037, 2.417293138,  19.4294682, 2.665350444, 0.642924039, 1.926240919, 2.007239223, 0.610018478, 2.343888425, 0.966917255,  0.54926975,  6.31786772, 0.144278229, 3.060217177,
  "CON", 3.709962363, 5.712164273, 0.130578385, 0.094733338, 48.82863507, 0.350769388, 0.640090125, 0.074250454, 0.450623448, 0.312363981, 1.075351409, 1.671915406,  0.40965768, 5.125841719, 0.366131551, 3.978800215,
  "CON", 0.133889099, 11.71150688, 0.732600733,  2.23064292, 29.08424908, 4.403183024, 0.239989895, 2.346848554, 1.695086523, 0.560818492,  1.40204623, 2.288745737, 0.553239864, 4.390551977, 0.159151194, 5.497031704,
  "CON", 0.202547029, 10.54510469, 1.116540497, 1.311492012,   29.731372, 0.673468871, 0.493708383, 0.962098387, 0.731701142, 0.377243841,   2.0128111, 0.749424007, 1.382383472, 6.435931843,  0.24812011, 3.152138137,
  "CON", 0.298749304, 17.18821206, 0.665856499, 1.551977315, 22.66443871, 0.936756291, 0.349384779, 0.488632336, 1.179806572, 0.387361385, 1.898830321, 1.936806927, 1.308927034, 8.286495519,  0.69370601, 1.402602663,
  "CON", 0.483703497, 26.80882316, 0.235520551, 0.481171018, 25.28427077, 0.812925773,   0.6888343, 0.458378707, 0.817990731, 0.298832527, 1.063641198, 1.765137893,  0.82052321, 4.176057943,  0.36974194, 1.909489199,
  "CON", 0.435608459, 10.34063568, 0.615423579,  0.90667342, 39.51880461, 0.316575915,  0.55717361, 0.222869444, 0.630619222,  0.42547803,  1.39799924, 1.291629733, 0.666075725, 5.956692415, 0.303912878, 3.768519691,
  "CON", 0.807043287, 9.226604599, 0.619829484, 1.378804362, 30.52343967, 1.358565031, 1.740582387, 0.339008779, 0.968957927,  0.25046171, 1.148581982, 0.976547676,  0.76403471, 3.018190098, 0.303589951, 1.965744934,
  "CON", 0.182750393, 12.74683994, 0.467028783, 0.182750393, 40.71780293, 1.050814762, 0.540636581, 0.439108584, 0.619320778, 0.294431189, 2.378293314, 1.647291741, 0.799532971, 4.695669831, 0.743692573, 2.294532717,
  "HIT", 0.167181721, 12.31572015, 0.212776736, 0.764983029, 22.25036729, 2.548254724,  3.16125437, 0.438218755, 1.580627185, 0.096256143,  1.92765591, 1.806069203, 0.306499823, 11.33289427,  1.68194944, 3.508283094,
  "HIT", 2.017291066, 11.32008696, 0.546033672, 0.950503059, 34.02093129, 0.950503059, 0.935335457, 0.803882906, 1.718994894, 0.212346428, 1.377723849, 3.003185196, 0.308407907, 3.554274736, 0.197178826, 2.232165428,
  "HIT", 1.665990466, 19.49234202, 0.162288265, 0.111573182, 38.34820976, 4.962470839, 1.853636271,  0.26371843, 0.266254184, 0.055786591, 0.773405011,  2.01846029, 0.083679886, 0.988944112, 0.228217872, 1.379450249,
  "HIT", 0.668168358, 15.64121384, 0.420136165, 2.467667232, 22.01918453, 0.536559439, 0.592240136, 1.698261244, 1.455290932,  1.61727114, 1.703323125, 1.374300828, 0.658044595, 3.171268761, 0.139201741, 2.584090506,
  "HIT", 0.832363508, 11.04589384, 0.242878106, 0.979102363, 25.87663816, 0.371907099, 1.105601376, 0.766584021, 1.300409857, 0.928502758, 4.812022466, 3.544502353, 1.510398219, 4.414815564, 0.485756211, 1.495218337,
  "HIT", 0.433537003, 23.37550389, 0.494384301, 0.311842405, 33.85138047, 0.291559973, 1.521182466, 0.147047638, 1.607382806, 0.091270948, 1.564282636, 3.574778795, 0.608472986, 4.421570367, 0.745379408, 0.900032959,
  "HIT", 2.604259208, 15.48341244, 0.182755032, 0.045688758, 35.76414448, 0.715790542, 4.035840292, 0.043150494, 1.731096276, 0.055841815, 0.560956418, 2.431657233, 0.213214204,  2.59410615, 0.395969236, 3.233748763,
  "HIT", 0.184768028, 13.44503784, 0.164519477, 0.491027361, 24.45265636, 1.445240325,  1.65785011,  0.44799919, 0.726416766, 0.222734061, 1.695816143, 1.832493862, 0.566959427, 3.971247058, 0.531524463, 4.186387912,
  "HIT",  6.96603928, 9.303396072, 0.109963175, 0.074161211, 41.75787643, 0.046031097, 1.171235679, 0.112520458, 0.092062193, 0.112520458, 0.480769231, 0.751841244, 0.360576923,  2.36804419, 1.859144845, 1.380932897

library(reshape)
Genus  <-  read.table("Microbiota.txt",sep="\t",header=TRUE)
#> Warning in file(file, "rt"): cannot open file 'Microbiota.txt': No such file or
#> directory
#> Error in file(file, "rt"): cannot open the connection
Genus
#> Error in eval(expr, envir, enclos): object 'Genus' not found

AM = melt(Genus, id = c("TRT"))
#> Error in melt(Genus, id = c("TRT")): object 'Genus' not found
AM_list = unique(AM$variable)
#> Error in unique(AM$variable): object 'AM' not found

y <- data.frame()
for(i in AM_list){
  A1 = subset(AM, variable==i)
  Wilcoxson = wilcox.test(A1$value~A1$TRT)  #Wilcoxson = wilcox.test(A1$value~A1$TRT, p.adjust.methods = "BH")
  Wilcox_P <- data.frame(i,Wilcoxson$p.value)
  y <- rbind.data.frame(y,Wilcox_P)
}
#> Error in eval(expr, envir, enclos): object 'AM_list' not found
y
#> data frame with 0 columns and 0 rows

write.table(y, "wilcoxon_result.txt", sep="\t")
Significant <- subset(y, y$Wilcoxson.p.value <0.1)
View(Significant)
#> Error in View(Significant): invalid 'x' argument
nrow(Significant)
#> [1] 0
write.table(Significant, "wilcoxon_result.txt", sep="\t")
Created on 2022-12-14 with reprex v2.0.2```

AlexisW · December 15, 2022, 1:40am

Note: to get your code to work, I had to change reshape with reshape2.

Indeed, this code is incorrect. There are two parts to the explanation, the second part is more important.

1 The technical problem

The problem with that code is that the function wilcox.test() doesn't have a p.adjust.methods parameter. You can see the list of valid parameters by typing ?wilcox.test or online.
So, ideally, we'd expect this code to give us an error message, telling us that we're trying to use a parameter that doesn't exist. But that won't happen because wilcox.test() accepts the special parameter ..., so it will accept any input, even this nonsense doesn't give an error message:

> wilcox.test(A1$value~A1$TRT, dcfghjuk = 1)

	Wilcoxon rank sum exact test

data:  A1$value by A1$TRT
W = 53, p-value = 0.2973
alternative hypothesis: true location shift is not equal to 0

But you can check that adding a p.adjust.methods parameter has no effect:

> wilcox.test(A1$value~A1$TRT)

	Wilcoxon rank sum exact test

data:  A1$value by A1$TRT
W = 53, p-value = 0.2973
alternative hypothesis: true location shift is not equal to 0

> wilcox.test(A1$value~A1$TRT, p.adjust.methods = "BH")

	Wilcoxon rank sum exact test

data:  A1$value by A1$TRT
W = 53, p-value = 0.2973
alternative hypothesis: true location shift is not equal to 0

> wilcox.test(A1$value~A1$TRT, p.adjust.methods = "not a real method")

	Wilcoxon rank sum exact test

data:  A1$value by A1$TRT
W = 53, p-value = 0.2973
alternative hypothesis: true location shift is not equal to 0

And indeed, it can't. For that, we need to think about what the FDR is.

2 The statistics

The multiple comparison problem

The FDR and p adjustment is necessary to correct for multiple comparisons. There are good explanations online, I would say the most intuitive is to look at this:

Intuitively, you realize that when doing all these statistical tests, the fact that one of them happens to have a small p-value is probably a fluke. So you can't interpret a test the same way if it was the only test you ran, or if it was one of many.

A slightly more rigorous view is that a p-value p<0.05 has a 5% chance of happening when the null hypothesis is true (when there is no effect). So, suppose there is no effect, the null is true, and you do a test and look at the p-value: you have a 5% chance of getting the wrong idea. Now if you do 20 tests, on average one of them will have p < 0.05. If you do hundreds or thousands of tests, you're pretty sure to get some p < 0.05 even if there was never any effect.

There are more explanations online, for example this or that or those.

Long story short, FDR (False Discovery Rate) is one way to correct for multiple comparisons: for a single test, the p-value makes a "promise" that, if there is no effect, you have 5% chances to get p < 0.05. For a group of tests, the FDR makes a promise that, even if there is no effect, 5% of the tests will get it wrong.

In practice

So why do I say all that? Because when you run wilcox.test(A1$value~A1$TRT), you are running one test. So the function wilcox.test() has no way of knowing how many tests you intend to run, it is impossible for it to have a p.adjust built-in.

In practice, you have to first run all you Wilcoxon tests, one by one (your code looks correct from a quick glance). Then, once you have already all the p-values, you can use the function p.adjust(), which does have a method parameter, to obtain the corrected p-values (or in that case, the FDRs).

system · January 26, 2023, 1:40am

This topic was automatically closed 42 days after the last reply. New replies are no longer allowed.

If you have a query related to it or one of the replies, start a new topic and refer back with a link.