I have some questions about how p-values are computed, and I made this reprex to illustrate them.
I have a 100 × 3 tibble called base. Here are its first rows:
# A tibble: 6 × 3
  A     B     C
  <fct> <fct> <fct>
1 NA    car   TRUE
2 NA    bike  FALSE
3 Red   bike  TRUE
4 NA    bike  FALSE
5 Blue  bike  FALSE
6 Blue  car   TRUE
The counts for each column, including NA, are:
$A
.
Blue Red <NA>
36 29 35
$B
.
bike car <NA>
34 31 35
$C
.
FALSE TRUE <NA>
58 42 0
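For reference, per-column counts like these can be produced with `table(..., useNA = "always")`. The sketch below uses a stand-in data frame built from the six head rows shown above, since the full 100-row `base` is not included here; the name `base_head` is hypothetical.

```r
# Stand-in for the first six rows of `base` (hypothetical name: base_head)
base_head <- data.frame(
  A = factor(c(NA, NA, "Red", NA, "Blue", "Blue")),
  B = factor(c("car", "bike", "bike", "bike", "bike", "car")),
  C = factor(c(TRUE, FALSE, TRUE, FALSE, FALSE, TRUE))
)

# Count every level of every column, always showing an <NA> cell
counts <- lapply(base_head, function(x) table(x, useNA = "always"))
counts
```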
I need the p-value computed with the NA values excluded, but I don't know how the packages calculate it. I use two packages, tableone (CreateTableOne()) and gtsummary (tbl_summary() with add_p()), and neither output states which test is used.
The two packages also give me different p-values, and, strangely, they still give different p-values after I filter out the rows with NA.
For example, without filtering out the rows that contain NA:
With tableone:
base %>%
  CreateTableOne(vars = c("A", "B"),
                 strata = "C",
                 includeNA = FALSE,
                 addOverall = TRUE) %>%
  print(showAllLevels = TRUE,
        explain = FALSE)
     level  Overall     FALSE       TRUE        p      test
  n         100         58          42
  A  Blue   36 (55.4)   22 (57.9)   14 (51.9)   0.818
     Red    29 (44.6)   16 (42.1)   13 (48.1)
  B  bike   34 (52.3)   17 (48.6)   17 (56.7)   0.687
     car    31 (47.7)   18 (51.4)   13 (43.3)
With gtsummary:
base %>%
  tbl_summary(missing_text = "(Missing)",
              missing = "always",
              by = "C") %>%
  add_p() %>%
  add_overall() %>%
  as_tibble()
# A tibble: 8 × 5
level `**Overall**, N = 100` `**FALSE**, N = 58` `**TRUE**, N = 42` `**p-value**`
<chr> <chr> <chr> <chr> <chr>
1 A NA NA NA 0.6
2 Blue 36 (55%) 22 (58%) 14 (52%) NA
3 Red 29 (45%) 16 (42%) 13 (48%) NA
4 (Missing) 35 20 15 NA
5 B NA NA NA 0.5
6 bike 34 (52%) 17 (49%) 17 (57%) NA
7 car 31 (48%) 18 (51%) 13 (43%) NA
8 (Missing) 35 23 12 NA
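One way to probe the difference is to rerun the test by hand on the observed A-by-C counts shown in both outputs (the 35 rows with missing A are left out, so 65 rows remain). The sketch below uses only base R's `chisq.test()`. That the two packages differ in the `correct` argument (Yates' continuity correction) is an assumption being tested here, not something either output states.

```r
# Observed A-by-C counts copied from the output above (missing A dropped)
tab_A <- matrix(c(22, 14,
                  16, 13),
                nrow = 2, byrow = TRUE,
                dimnames = list(A = c("Blue", "Red"),
                                C = c("FALSE", "TRUE")))

# Pearson chi-squared test with and without the continuity correction
p_corrected   <- chisq.test(tab_A, correct = TRUE)$p.value
p_uncorrected <- chisq.test(tab_A, correct = FALSE)$p.value

round(c(corrected = p_corrected, uncorrected = p_uncorrected), 3)
```

With these counts the corrected p-value comes out at about 0.818, matching the tableone output, and the uncorrected one at about 0.63, consistent with the 0.6 shown by gtsummary. That points to the continuity correction as a likely source of the gap, although this is inferred from the numbers and should be checked against each package's documentation.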
And now after filtering out the rows where A or B is NA:
With tableone:
base %>%
  filter(!is.na(A) & !is.na(B)) %>%
  CreateTableOne(vars = c("A", "B"),
                 strata = "C",
                 addOverall = TRUE) %>%
  print(showAllLevels = TRUE,
        explain = FALSE)
     level  Overall     FALSE       TRUE        p      test
  n         38          20          18
  A  Blue   22 (57.9)   12 (60.0)   10 (55.6)   1.000
     Red    16 (42.1)    8 (40.0)    8 (44.4)
  B  bike   16 (42.1)    6 (30.0)   10 (55.6)   0.206
     car    22 (57.9)   14 (70.0)    8 (44.4)
With gtsummary:
base %>%
  filter(!is.na(A) & !is.na(B)) %>%
  tbl_summary(missing_text = "(Missing)",
              missing = "always",
              by = "C") %>%
  add_p() %>%
  add_overall() %>%
  as_tibble()
# A tibble: 8 × 5
level `**Overall**, N = 38` `**FALSE**, N = 20` `**TRUE**, N = 18` `**p-value**`
<chr> <chr> <chr> <chr> <chr>
1 A NA NA NA 0.8
2 Blue 22 (58%) 12 (60%) 10 (56%) NA
3 Red 16 (42%) 8 (40%) 8 (44%) NA
4 (Missing) 0 0 0 NA
5 B NA NA NA 0.11
6 bike 16 (42%) 6 (30%) 10 (56%) NA
7 car 22 (58%) 14 (70%) 8 (44%) NA
8 (Missing) 0 0 0 NA
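A hand check on the filtered data (N = 38) is possible with base R alone. In the sketch below the counts are copied from the two outputs above, and `both_p()` is a hypothetical helper written for this illustration. Whether the packages really differ only in the `correct` argument of `chisq.test()` is an assumption, not something either output states.

```r
# A-by-C and B-by-C counts after dropping rows where A or B is NA
tab_A <- matrix(c(12, 10,
                   8,  8),
                nrow = 2, byrow = TRUE,
                dimnames = list(A = c("Blue", "Red"),
                                C = c("FALSE", "TRUE")))
tab_B <- matrix(c( 6, 10,
                  14,  8),
                nrow = 2, byrow = TRUE,
                dimnames = list(B = c("bike", "car"),
                                C = c("FALSE", "TRUE")))

# Hypothetical helper: p-values with and without Yates' continuity correction
both_p <- function(tab) {
  c(corrected   = chisq.test(tab, correct = TRUE)$p.value,
    uncorrected = chisq.test(tab, correct = FALSE)$p.value)
}

p_A <- both_p(tab_A)
p_B <- both_p(tab_B)
round(rbind(A = p_A, B = p_B), 3)
```

For B this gives roughly 0.206 with the correction and 0.111 without it, and for A it gives 1 and roughly 0.78, which lines up with the tableone and gtsummary columns respectively. Note also that filtering on both A and B keeps only the 38 rows complete in both columns, whereas the unfiltered calls appear to test each variable on the 65 rows where that variable is non-missing, so the p-values change between the two runs regardless of the package.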
Why do the two packages give different p-values? And why do they still differ after I exclude the NA values? Could someone help me?