Hi!

For simplicity, let's say I have a data frame called `df` with two variables, `weight_0` and `weight_1`.

`weight_0` is the participants' weight before the intervention, and `weight_1` is the weight of the same participants after the intervention.

Now, I want to know whether the cohort as a whole lost weight during the trial, so I use the paired Wilcoxon signed-rank test to compute a p-value. I used to be a Python user, so I did it with the pingouin package:

```python
import pingouin as pg

pg.wilcoxon(x=df["weight_0"], y=df["weight_1"])
```

The output is:

|          | W-val   | alternative | p-val | RBC   | CLES  |
|----------|---------|-------------|-------|-------|-------|
| Wilcoxon | 11459.5 | two-sided   | 0.004 | 0.214 | 0.535 |

SciPy gives the same result:

```python
import scipy.stats as stats

stats.wilcoxon(x=df["weight_0"], y=df["weight_1"])
```

```
WilcoxonResult(statistic=11459.5, pvalue=0.003966344096710916)
```
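For context on the `statistic` values: SciPy documents its two-sided statistic as the smaller of the two signed-rank sums (W+ over positive differences, W- over negative ones), while R's `wilcox.test` reports V, the rank sum of the positive differences `x - y` (or `x - mu` in the one-sample form). The pure-Python sketch below computes both conventions from made-up numbers. The helper names are hypothetical and this is not either library's code. It drops zero differences and gives tied absolute differences their average rank, which matches both libraries' defaults.

```python
def average_ranks(values):
    """Rank values 1..n, giving tied values the mean of the ranks they span."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with values[order[i]].
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # mean of the 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def signed_rank_sums(x, y):
    """Return (W+, W-) for the paired differences x - y, ignoring zero differences."""
    d = [a - b for a, b in zip(x, y) if a != b]
    ranks = average_ranks([abs(v) for v in d])
    w_plus = sum(r for r, v in zip(ranks, d) if v > 0)
    w_minus = sum(r for r, v in zip(ranks, d) if v < 0)
    return w_plus, w_minus


# Made-up weights, NOT the data from the question.
weight_0 = [82.0, 75.0, 90.0, 68.0, 101.0]
weight_1 = [80.0, 76.0, 85.0, 68.0, 97.0]

w_plus, w_minus = signed_rank_sums(weight_0, weight_1)
r_style_v = w_plus                        # R: V for wilcox.test(x, y, paired = TRUE)
scipy_style_stat = min(w_plus, w_minus)   # SciPy: two-sided wilcoxon(x, y) statistic
print(r_style_v, scipy_style_stat)        # 9.0 1.0
```

Swapping `x` and `y` swaps W+ and W-, so R's V changes with the argument order while SciPy's two-sided statistic does not.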

When I ran the same analysis in R on the same dataset, I was surprised to get a different p-value:

```r
wilcox.test(x = df$weight_0, y = df$weight_1, paired = TRUE)
```

```
Wilcoxon signed rank test with continuity correction

data:  df$weight_0 and df$weight_1
V = 13112, p-value = 0.02105
alternative hypothesis: true location shift is not equal to 0
```

However, if I compute a variable holding the difference in weight,

```r
df$weight_del <- df$weight_1 - df$weight_0
```

I get a result that is essentially identical to Python's:

```r
wilcox.test(df$weight_del, mu = 0)
```

```
Wilcoxon signed rank test with continuity correction

data:  df$weight_del
V = 11460, p-value = 0.003972
alternative hypothesis: true location is not equal to 0
```
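The small remaining gap between 0.003966 (SciPy) and 0.003972 (R) is consistent with one default that differs between the two: `scipy.stats.wilcoxon` uses `correction=False` unless told otherwise, whereas R's `wilcox.test` uses `correct = TRUE` (hence "with continuity correction" in the output). R's printout also rounds the statistic for display, so V = 11460 is most likely 11459.5 internally. The sketch below is a pure-Python normal approximation with the usual tie correction, written on the assumption that it mirrors R's large-sample formula. The function name is hypothetical, and it ignores the exact small-sample distribution both libraries can use when n is small and there are no ties.

```python
import math
from collections import Counter


def average_ranks(values):
    """Rank values 1..n, giving tied values the mean of the ranks they span."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def signed_rank_normal_p(d, correct=True):
    """Return (V, two-sided p) for a signed-rank test on differences d.

    Zero differences are dropped; the variance includes the tie correction.
    With correct=True the statistic is moved 0.5 toward its null mean.
    """
    d = [v for v in d if v != 0]
    n = len(d)
    abs_d = [abs(v) for v in d]
    ranks = average_ranks(abs_d)
    v_stat = sum(r for r, v in zip(ranks, d) if v > 0)
    tie_sizes = Counter(abs_d).values()
    var = n * (n + 1) * (2 * n + 1) / 24 - sum(t**3 - t for t in tie_sizes) / 48
    z = v_stat - n * (n + 1) / 4          # distance from the null mean of V
    if correct and z != 0:
        z -= math.copysign(0.5, z)        # continuity correction
    z /= math.sqrt(var)
    # Two-sided p-value: 2 * P(Z > |z|) = erfc(|z| / sqrt(2)).
    return v_stat, math.erfc(abs(z) / math.sqrt(2))


diffs = [2.0, -1.0, 5.0, 4.0, 0.0, 3.5, -0.5, 6.0]  # made-up differences
print(signed_rank_normal_p(diffs, correct=False))   # SciPy-style default
print(signed_rank_normal_p(diffs, correct=True))    # R-style default
```

In SciPy, passing `correction=True` to `stats.wilcoxon` switches on the same kind of correction.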

I have seen numerous posts online about R and Python giving different results for the Wilcoxon test. However, I don't understand why I get matching results when I use the `weight_del` variable, or what is happening under the hood.
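One way to look at the `weight_del` case: a paired signed-rank test is a one-sample signed-rank test on the differences, and flipping their direction (`weight_0 - weight_1` versus `weight_1 - weight_0`) keeps the same absolute differences and ranks, so V becomes n(n+1)/2 - V, where n is the number of nonzero differences. Because the two-sided p-value depends only on |V - n(n+1)/4|, it does not change. The pure-Python sketch below checks this on made-up numbers (the helper name is hypothetical). On real data, two V values for the same differences in opposite directions that do not add up to n(n+1)/2 are a sign that the two calls are not working on the same set of differences.

```python
def average_ranks(values):
    """Rank values 1..n, giving tied values the mean of the ranks they span."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def wilcoxon_v(d):
    """R-style V (rank sum of positive differences) and n, with zeros dropped."""
    d = [v for v in d if v != 0]
    ranks = average_ranks([abs(v) for v in d])
    return sum(r for r, v in zip(ranks, d) if v > 0), len(d)


# Made-up weights, NOT the data from the question.
weight_0 = [82.1, 75.4, 90.0, 68.3, 101.2, 77.0, 88.8]
weight_1 = [80.0, 76.1, 87.5, 68.3, 97.9, 77.0, 86.2]

# paired = TRUE works on x - y, i.e. weight_0 - weight_1.
v_paired, n = wilcoxon_v([a - b for a, b in zip(weight_0, weight_1)])
# weight_del = weight_1 - weight_0 flips the sign of every difference.
v_del, _ = wilcoxon_v([b - a for a, b in zip(weight_0, weight_1)])

print(v_paired, v_del, n)                                                # 14.0 1.0 5
print(v_paired + v_del == n * (n + 1) / 2)                               # True
print(abs(v_paired - n * (n + 1) / 4) == abs(v_del - n * (n + 1) / 4))   # True
```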