Hi!
For simplicity, let's say I have a dataframe called df with two variables, weight_0 and weight_1.
weight_0 is the weight of each participant before the intervention, and weight_1 is the weight of the same participants after the intervention.
I want to know whether the cohort as a whole lost weight during the trial, so I use the paired Wilcoxon signed-rank test to compute a p-value. I used to be a Python user, so I did it with the pingouin package:
import pingouin as pg
pg.wilcoxon(x=df["weight_0"], y=df["weight_1"])
The output would be
| |W-val|alternative|p-val|RBC|CLES|
|---|---|---|---|---|---|
|Wilcoxon|11459.5|two-sided|0.004|0.214|0.535|
I get the same result with scipy:
import scipy.stats as stats
stats.wilcoxon(x=df["weight_0"], y=df["weight_1"])
WilcoxonResult(statistic=11459.5, pvalue=0.003966344096710916)
I then ran the test in R on the same dataset, and to my surprise I got a different p-value:
wilcox.test(x = df$weight_0, y = df$weight_1, paired = TRUE)
Wilcoxon signed rank test with continuity correction
data: df$weight_0 and df$weight_1
V = 13112, p-value = 0.02105
alternative hypothesis: true location shift is not equal to 0
However, if I calculate a variable that holds the difference in weight,
weight_del = weight_1 - weight_0
I get a result that is identical to Python's:
wilcox.test(df$weight_del, mu = 0)
Wilcoxon signed rank test with continuity correction
data: df$weight_del
V = 11460, p-value = 0.003972
alternative hypothesis: true location is not equal to 0
I have seen numerous posts online about R and Python giving different results for the Wilcoxon test. However, I don't understand why the results become identical when I use the 'del' variable, or what is happening under the hood.
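For anyone who wants to reproduce the comparison, here is a minimal Python sketch on synthetic data. The values, the sample size and the seed are assumptions for illustration only, not the trial data. It checks that scipy's paired call agrees with the one-sample call on the differences, whichever way the difference is taken.

```python
import numpy as np
from scipy import stats

# Synthetic stand-in for df["weight_0"] and df["weight_1"] (assumed values).
rng = np.random.default_rng(0)
weight_0 = rng.normal(80, 10, size=50)
weight_1 = weight_0 - rng.normal(1, 3, size=50)

# Paired call: scipy works on the differences x - y.
paired = stats.wilcoxon(x=weight_0, y=weight_1)

# One-sample calls on precomputed differences, in both directions.
diff_01 = stats.wilcoxon(weight_0 - weight_1)
diff_10 = stats.wilcoxon(weight_1 - weight_0)

# Compare the two-sided statistic and p-value across the three calls.
same_stat = np.isclose(paired.statistic, diff_01.statistic) and np.isclose(
    paired.statistic, diff_10.statistic
)
same_p = np.isclose(paired.pvalue, diff_01.pvalue) and np.isclose(
    paired.pvalue, diff_10.pvalue
)
print(same_stat, same_p)
```

If both checks hold on the real data too, the two-sided test does not depend on the sign of the difference on the Python side, so the mismatch would have to come from how the paired R call ends up with its differences.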