Help: Problems with 'statistic' argument of 'gtsummary::tbl_summary()'

fefortti · April 24, 2020, 2:07pm

Hi!
I have a dataset with variables that characterize two groups (a control group and a treatment group). Apart from the variable that distinguishes the two groups, all variables are numeric.
When I use the tbl_summary() function from the gtsummary package, I run into problems that seem to be related to how the type of some of the numeric variables is recognized. Since all variables are numeric, the default statistics shown in the summary table should be "median (Q1, Q3)". However, even when I request these statistics explicitly through the statistic argument of tbl_summary() (the argument that defines the statistics presented for each variable), this does not happen. For some of the numeric variables, the table shows the default statistics for categorical variables, that is, "n (%)".
The same problem occurs with the test argument of add_p() from the same package (the argument that defines the test performed for each variable in the dataset). Instead of running the declared test ("wilcox.test") for all numeric variables, a test meant for categorical variables ("fisher.test") is run for some of them.
Of the many things I tried in order to understand the problem, I only managed to solve it for the test argument. Since this argument accepts everything(), replacing all_continuous() ~ "wilcox.test" with everything() ~ "wilcox.test" applied the Wilcoxon test correctly to all numeric variables. However, since the statistic argument does not accept everything() and apparently only accepts all_continuous() or all_categorical(), this did not fix the statistics presented in the summary table (median (Q1, Q3) vs. n (%)).

library(gtsummary)  # provides %>%, everything() and all_continuous()

# `data` is my data frame; `group` is the control/treatment column
data %>%
  tbl_summary(by = group) %>%
  add_p(test = list(everything() ~ "wilcox.test"))

OR

data %>%
  tbl_summary(by = group,
              statistic =
                list(all_continuous() ~ "{median} ({p25}, {p75})")) %>%
  add_p(test = list(everything() ~ "wilcox.test"))

statistishdan · April 24, 2020, 2:58pm

Hello @fefortti!

The tbl_summary() function makes its best guess as to whether each column in your data frame is continuous, categorical, or dichotomous. The default statistics displayed are based on this initial guess. If the function defaulted to an incorrect type, you can use the type= argument to change it. I think in your case, tbl_summary() guessed categorical for some columns where you were expecting continuous.
For more details on how the type is determined, see the tbl_summary() help file on the gtsummary documentation site ("Create a table of summary statistics").

From your code example, it looks to me like you want all the columns summarized as continuous variables (using median and IQR), and to compare the groups with the Wilcoxon rank-sum test. The first step I would take is to set every type to continuous; everything else should then work itself out with the defaults. (The default test for continuous variables is Wilcoxon.)

If this suggestion does not solve your problem, please provide an example I can run on my machine (aka a reprex), and I can perhaps help further.

Happy Coding!

library(gtsummary)

trial %>%
  # selecting the continuous variables only
  dplyr::select(age, marker, trt) %>%
  # summarizing data by treatment type, all variables are continuous
  tbl_summary(by = trt, type = everything() ~ "continuous") %>%
  # comparing treatment groups
  add_p()
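
To make the situation from the question more concrete, here is a minimal sketch on made-up data. The data frame toy and its columns (group, score, biomarker) are hypothetical and only stand in for your real data. The assumption, as I understand the documented default, is that a numeric column with only a handful of distinct values (like score here) is guessed to be categorical and shown as n (%) unless its type is set explicitly.

```r
library(gtsummary)

# Hypothetical data: `score` is numeric but takes only 5 distinct values,
# which is the kind of column that may be guessed as categorical.
set.seed(1)
toy <- data.frame(
  group     = rep(c("control", "treatment"), each = 50),
  score     = sample(1:5, 100, replace = TRUE),
  biomarker = rnorm(100)
)

tbl <- toy %>%
  tbl_summary(
    by = group,
    # treat every summarized column as continuous, overriding the guess
    type = everything() ~ "continuous",
    # spelled out for clarity; this requests median (Q1, Q3)
    statistic = all_continuous() ~ "{median} ({p25}, {p75})"
  ) %>%
  # with all columns typed as continuous, this selector now covers them all
  add_p(test = all_continuous() ~ "wilcox.test")

tbl
```

Because the types are fixed first, all_continuous() matches every column, so there is no need to fall back on everything() in the test argument. If you drop the type line and rerun, you can compare how score is summarized under the default guess.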

fefortti · April 24, 2020, 3:16pm

Thank you very much, @statistishdan !!!
Your suggestion did solve my problem!
I really thought it was very strange not to have this possibility inside the function and, after reading the documentation so many times, I don't know how I didn't see and didn't test this argument. I apologize for that and, again, thank you very much!
