Wilcoxon Signed Rank with NAs using ggpaired()

Hello everyone,

I am comparing concentration measurements of biomarkers, including the ratio between concentrations at two timepoints. As the measurements are paired and not normally distributed, I am using the Wilcoxon signed-rank test. Since a ratio cannot be computed when the divisor is 0, I have an NA (see example below).

| Sample | Conc1 | Conc2 | Conc1/Conc2 |
|---|---|---|---|
| 1 | 21 | 1543 | 0.0137 |
| 2 | 621 | 0 | NA |

The code is here, although edited for simplicity:

library(ggpubr)
ggpaired(dataset, x = "Timepoint", y = "conc1/conc2",
         fill = "Timepoint", line.color = "gray",
         xlab = "Analysis timepoint", ylab = "conc1/conc2") +
  stat_compare_means(paired = TRUE)

1: Removed 1 rows containing non-finite values (stat_boxplot()).
2: Removed 1 rows containing non-finite values (stat_compare_means()).
3: Computation failed in stat_compare_means()
Caused by error in wilcox.test.default():
! 'x' and 'y' must have the same length

Does anyone have a suggestion as to what I might do?

Kind regards

Hi Cecilie,

Would you be able to run this code:

dput(head(dataset, 100))

and then paste the output here, between a pair of triple backticks, like this?

```
paste here
```

That would help folks recreate the issue you're running into.

structure(list(Sample = c(1L, 2L, 3L, 4L, 5L, 6L, 7L, 8L, 9L, 
10L, 11L, 12L, 13L, 14L, 15L, 16L, 17L, 18L, 19L, 20L, 21L, 22L, 
23L, 24L, 25L, 26L, 27L, 28L, 29L, 30L, 31L, 32L, 33L, 34L, 35L, 
36L, 37L, 38L, 39L, 40L, 1L, 2L, 3L, 4L, 5L, 6L, 7L, 8L, 9L, 
10L, 11L, 12L, 13L, 14L, 15L, 16L, 17L, 18L, 19L, 20L, 21L, 22L, 
23L, 24L, 25L, 26L, 27L, 28L, 29L, 30L, 31L, 32L, 33L, 34L, 35L, 
36L, 37L, 38L, 39L, 40L), Timepoint = c(1L, 1L, 1L, 1L, 1L, 1L, 
1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 
1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 
1L, 1L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 
2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 
2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L), HOXA9_conc = c(50L, 
31L, 40L, 258L, 22L, 22L, 1679L, 414L, 227L, 211L, 216L, 46L, 
14L, 70L, 235L, 2086L, 141L, 211L, 74L, 106L, 8915L, 31L, 3191L, 
41L, 8070L, 252L, 29L, 26L, 1L, 2L, 9L, 3L, 230L, 2712L, 50L, 
4L, 34L, 946L, 56L, 5L, 60L, 15L, 82L, 234L, 14L, 13L, 1492L, 
299L, 123L, 205L, 179L, 28L, 6L, 45L, 136L, 1048L, 61L, 163L, 
34L, 151L, 5956L, 19L, 2162L, 54L, 10934L, 302L, 26L, 21L, 621L, 
2L, 10L, 2L, 259L, 6521L, 76L, 4L, 52L, 818L, 57L, 3L), ALB_conc = c(580L, 
508L, 2017L, 1330L, 630L, 496L, 7233L, 5005L, 1530L, 3159L, 1794L, 
1144L, 532L, 1383L, 1132L, 10291L, 1531L, 2800L, 1123L, 2493L, 
19675L, 727L, 12887L, 2148L, 27182L, 2426L, 688L, 1031L, 368L, 
942L, 937L, 447L, 3159L, 47717L, 518L, 2236L, 505L, 1034L, 5067L, 
913L, 1016L, 463L, 2724L, 2085L, 973L, 1044L, 7736L, 5236L, 2185L, 
6811L, 2864L, 1904L, 577L, 2111L, 1433L, 11286L, 1661L, 4958L, 
1472L, 4570L, 22211L, 1269L, 14768L, 2576L, 31846L, 3398L, 998L, 
1543L, 0L, 1473L, 1598L, 727L, 4726L, 55511L, 792L, 3712L, 1153L, 
1592L, 6387L, 1366L), Ratio = c(0.085, 0.061, 0.0198, 0.194, 
0.035, 0.043, 0.232, 0.083, 0.148, 0.067, 0.12, 0.04, 0.026, 
0.051, 0.207, 0.203, 0.092, 0.075, 0.066, 0.043, 0.453, 0.044, 
0.247, 0.019, 0.297, 0.104, 0.042, 0.025, 0.0025, 0.0021, 0.0097, 
0.006, 0.073, 0.0568, 0.096, 0.0017, 0.069, 0.92, 0.0111, 0.0058, 
0.0591, 0.0323, 0.0301, 0.112, 0.014, 0.0121, 0.193, 0.0571, 
0.0562, 0.0301, 0.0624, 0.0149, 0.00977, 0.0213, 0.0952, 0.0927, 
0.0368, 0.0328, 0.0231, 0.033, 0.268, 0.0146, 0.147, 0.021, 0.343, 
0.0888, 0.0257, 0.0137, NA, 0.00113, 0.00619, 0.00221, 0.0548, 
0.117, 0.0964, 0.000953, 0.0453, 0.513, 0.00897, 0.00208)), row.names = c(NA, 
80L), class = "data.frame")

Thanks, Cecilie. It looks like ggpaired() does not drop incomplete pairs before passing the data on to wilcox.test(), so the missing ratio for sample 29 unbalances the pairing: once the NA row is removed, one timepoint has 40 values and the other only 39. If you remove sample 29 (both of its rows) before running ggpaired(), the test is performed.
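
Here is a minimal sketch of that fix in base R, assuming the columns are named as in the dput() output above (Sample, Timepoint, Ratio). The toy data frame is a four-sample subset of the posted data, and the names `incomplete` and `complete` are made up for illustration:

```r
# Toy subset of the posted data (samples 1-3 and 29), same column names
dataset <- data.frame(
  Sample    = c(1L, 2L, 3L, 29L, 1L, 2L, 3L, 29L),
  Timepoint = c(1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L),
  Ratio     = c(0.085, 0.061, 0.0198, 0.0025, 0.0591, 0.0323, 0.0301, NA)
)

# Samples that have a missing ratio at either timepoint
incomplete <- unique(dataset$Sample[is.na(dataset$Ratio)])

# Drop BOTH rows of every incomplete pair so the two groups keep the same length
complete <- dataset[!dataset$Sample %in% incomplete, ]

# Sort so that the i-th value at each timepoint belongs to the same sample
complete <- complete[order(complete$Timepoint, complete$Sample), ]

x <- complete$Ratio[complete$Timepoint == 1]
y <- complete$Ratio[complete$Timepoint == 2]

# With equal-length vectors the paired test no longer fails
res <- wilcox.test(x, y, paired = TRUE)
print(res)
```

The filtered data frame (`complete` here) can then be passed to the ggpaired() call from the question in place of the original data.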

P.S.: I'm curious: why did you choose to use the ratio as the variable to compare?

How symmetric is your distribution of ratios? I suspect not very. If that is the case, the signed rank test is the wrong test, because it assumes that the distribution of (in your case) ratios is symmetric. You have some options:

  • use a sign test on the ratios (e.g., test that the median ratio is 1), which assumes nothing about shape
  • try a transformation of the concentrations. For example, working with the logs of the concentrations would mean that the log of the ratio is the difference of the log-concentrations, and a difference is the more usual quantity for a matched-pairs test. If the distribution of log-ratios is close to normal, which it might be, the t-test will be fine.
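
Both options can be sketched in base R. This is only an illustration under some assumptions: the values are the Ratio column for samples 1 to 8 from the posted data, the paired quantity is taken to be timepoint 2 against timepoint 1, and the variable names are made up. The sign test is done with binom.test() on the count of pairs that went up, which is the same as testing that the median of t2/t1 is 1:

```r
# Ratios for samples 1-8 from the posted data, one vector per timepoint
t1 <- c(0.085, 0.061, 0.0198, 0.194, 0.035, 0.043, 0.232, 0.083)
t2 <- c(0.0591, 0.0323, 0.0301, 0.112, 0.014, 0.0121, 0.193, 0.0571)

# Option 1: sign test. Under H0 each pair is equally likely to go up or down,
# so the number of increases is Binomial(n, 0.5).
d <- t2 - t1
d <- d[d != 0]          # exact ties carry no sign information
n_up <- sum(d > 0)
sign_res <- binom.test(n_up, length(d), p = 0.5)

# Option 2: paired t-test on the log scale.
log_res <- t.test(log(t2), log(t1), paired = TRUE)

# The difference of logs is the log of the ratio, so this one-sample test
# on log(t2 / t1) gives the same t statistic.
log_ratio_res <- t.test(log(t2 / t1), mu = 0)

print(sign_res)
print(log_res)
```

Eight pairs are far too few to draw conclusions from; the point is only to show the calls. On the full data, incomplete pairs such as sample 29 would need to be dropped first.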

IMO: because of the assumption of symmetry, the signed rank test is almost never the right test. If symmetry is reasonable, then a t-test will often be fine; if it is not, you need the sign test, or to try for a transformation.
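
To get a rough feel for whether symmetry is reasonable, one could compare the skewness of the paired differences on the raw and the log scale. The sketch below is an assumption-laden illustration: `skewness()` is a small helper written here (not a base R function), and the values are again the ratios for samples 1 to 8 from the posted data. A histogram of the differences would serve the same purpose.

```r
# Moment-based sample skewness; values near 0 suggest a symmetric shape
skewness <- function(x) {
  x <- x[!is.na(x)]
  m <- mean(x)
  mean((x - m)^3) / mean((x - m)^2)^1.5
}

# Ratios for samples 1-8 from the posted data, one vector per timepoint
t1 <- c(0.085, 0.061, 0.0198, 0.194, 0.035, 0.043, 0.232, 0.083)
t2 <- c(0.0591, 0.0323, 0.0301, 0.112, 0.014, 0.0121, 0.193, 0.0571)

skew_raw <- skewness(t2 - t1)            # differences on the raw scale
skew_log <- skewness(log(t2) - log(t1))  # differences on the log scale

print(c(raw = skew_raw, log = skew_log))
```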