Summary table with mean and SD columns of a continuous variable across a few categorical variables

Nile · November 23, 2024, 12:59am

Hi,
I am using the NHANES package for the data and the CreateTableOne() function from tableone to try to make a table like this:

Variable        Mean   SD    p-value
Sex
  Male          27.7   4.5   0.013*
  Female        31.3   3.7
Race
  White         34.2   4.7   0.03**
  African       33.7   8.3
  South Asian   38.2   4.4
  Hispanic      29.7   0.8

Notes: * = t-test; ** = ANOVA

The method described in the help file does not produce this kind of table: for categorical variables it gives only n and %, not the mean and SD of a continuous variable within each category. Is there any way to create a table like the one above?
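For comparison, the core of the requested layout (mean and SD of one continuous variable within each level of several categorical variables) can be sketched in base R. This is a minimal sketch, not what the replies below use: `df` is made-up example data standing in for NHANES, and `mean_sd_by()` is a hypothetical helper.

```r
# Hypothetical example data standing in for NHANES: BMI plus two grouping factors
set.seed(1)
df <- data.frame(
  BMI    = round(rnorm(60, mean = 27, sd = 5), 1),
  Gender = factor(rep(c("female", "male"), times = 30)),
  Race   = factor(rep(c("Black", "Mexican", "White"), each = 20))
)

# Hypothetical helper: a header row for the grouping variable,
# then the mean and SD of the outcome for each of its levels
mean_sd_by <- function(data, outcome, group) {
  y <- data[[outcome]]
  g <- data[[group]]
  data.frame(
    Variable = c(group, levels(g)),
    Mean     = c(NA, as.numeric(tapply(y, g, mean))),
    SD       = c(NA, as.numeric(tapply(y, g, sd)))
  )
}

# Stack one block per grouping variable into a single table
tab <- do.call(rbind, lapply(c("Gender", "Race"), mean_sd_by,
                             data = df, outcome = "BMI"))
print(tab, digits = 3, row.names = FALSE)
```

The p-value column is left out here on purpose; the replies below add it with gtsummary.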

Here is what I tried so far:

> library(tableone)
> library(NHANES)
> myVars <- c("HHIncomeMid", "SleepTrouble", "HomeOwn", "Education")
> CreateTableOne(vars = myVars, data = NHANES)
                         
                          Overall            
  n                          10000           
  HHIncomeMid (mean (SD)) 57206.17 (33020.28)
  SleepTrouble = Yes (%)      1973 (25.4)    
  HomeOwn (%)                                
     Own                      6425 (64.7)    
     Rent                     3287 (33.1)    
     Other                     225 ( 2.3)    
  Education (%)                              
     8th Grade                 451 ( 6.2)    
     9 - 11th Grade            888 (12.3)    
     High School              1517 (21.0)    
     Some College             2267 (31.4)    
     College Grad             2098 (29.1)

Please let me know if you have any ideas.
Thanks in advance.

This is the format I am looking for: [image not preserved]

jrkrideau · November 23, 2024, 2:24pm

My quick reading of the {tableone} documentation suggests that it is fairly rigidly designed for a "Table 1" in a medical study report. You probably need to look at one of the more general table packages, which offer more flexibility. A table like yours should be fairly easy to build in any of them. You might want to have a look at {gt}, {flextable}, {kableExtra}, {tinytable}, and I imagine there are others.

Can you supply us with the final dataset that you are using? A handy way to supply data is the dput() function: run dput(mydata), where "mydata" is the name of your dataset. For really large datasets, dput(head(mydata, 100)) will probably do. Paste the output between a pair of triple backticks, like this:
```

```
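A minimal sketch of the dput() round trip, assuming a small made-up data frame in place of "mydata": the printed text is what gets pasted into the post, and evaluating that text rebuilds the same data frame on the reader's side, factor levels included.

```r
# Hypothetical small data frame standing in for "mydata"
mydata <- data.frame(
  BMI    = c(27.7, 31.3, 24.1, 29.0),
  Gender = factor(c("male", "female", "male", "female"))
)

# Print a copy-pasteable text version (only the first 100 rows for big data)
dput(head(mydata, 100))

# Whoever receives that text can rebuild the data frame from it
txt <- capture.output(dput(mydata))
rebuilt <- eval(parse(text = txt))
all.equal(rebuilt, mydata)
```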

StatSteph · November 24, 2024, 6:12pm

I recommend looking into the package gtsummary. The example below gets you almost there. I couldn't quite figure out how to do the ANOVA test, but perhaps someone else can. Note the use of a reproducible example, which I made with the reprex package.

library(NHANES)
library(tableone)
data("NHANES")

myVars <- c("HHIncomeMid", "SleepTrouble", "HomeOwn", "Education")
CreateTableOne(vars = myVars, data = NHANES)
#>                          
#>                           Overall            
#>   n                          10000           
#>   HHIncomeMid (mean (SD)) 57206.17 (33020.28)
#>   SleepTrouble = Yes (%)      1973 (25.4)    
#>   HomeOwn (%)                                
#>      Own                      6425 (64.7)    
#>      Rent                     3287 (33.1)    
#>      Other                     225 ( 2.3)    
#>   Education (%)                              
#>      8th Grade                 451 ( 6.2)    
#>      9 - 11th Grade            888 (12.3)    
#>      High School              1517 (21.0)    
#>      Some College             2267 (31.4)    
#>      College Grad             2098 (29.1)

library(gtsummary)
#> Warning: package 'gtsummary' was built under R version 4.4.1
#> #BlackLivesMatter

NHANES %>%
  tbl_continuous(
    variable = BMI,
    include=c(Gender, Race1),
    statistic = everything() ~ "{mean} ({sd})",
    digits=everything()~1
  ) %>%
  add_p(
    list(Gender~"t.test"),
    test.args = all_tests("t.test") ~ list(var.equal = TRUE)
  )

Characteristic   N = 10,000¹   p-value²
Gender                         0.13
  female         26.8 (7.9)
  male           26.5 (6.8)
Race1                          <0.001
  Black          28.1 (9.1)
  Hispanic       26.4 (7.0)
  Mexican        26.5 (7.0)
  White          26.7 (7.1)
  Other          24.4 (6.6)
¹ BMI: Mean (SD)
² Two Sample t-test; Kruskal-Wallis rank sum test

Created on 2024-11-24 with reprex v2.1.0
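Because var.equal = TRUE is passed through test.args, the Gender p-value comes from Student's pooled-variance t-test rather than the Welch test that t.test() runs by default. A base-R sketch of the difference on made-up data (`y` and `g` are hypothetical) checks the pooled statistic against the usual hand formula:

```r
# Hypothetical example data: continuous outcome y, two-level factor g
set.seed(42)
g <- factor(rep(c("female", "male"), times = c(40, 35)))
y <- c(rnorm(40, mean = 27, sd = 8), rnorm(35, mean = 26, sd = 7))

# Student's (pooled-variance) t-test, as requested with var.equal = TRUE
student <- t.test(y ~ g, var.equal = TRUE)
# Welch's t-test, which is what t.test() does by default
welch <- t.test(y ~ g)

# Pooled t statistic by hand: (m1 - m2) / (sp * sqrt(1/n1 + 1/n2))
n  <- tapply(y, g, length)
m  <- tapply(y, g, mean)
s2 <- tapply(y, g, var)
sp2 <- sum((n - 1) * s2) / (sum(n) - 2)   # pooled variance
t_pooled <- (m[[1]] - m[[2]]) / sqrt(sp2 * sum(1 / n))

c(student = unname(student$statistic), by_hand = t_pooled)
# Student uses n1 + n2 - 2 degrees of freedom; Welch uses an approximation
c(df_student = unname(student$parameter), df_welch = unname(welch$parameter))
```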

StatSteph · November 24, 2024, 6:55pm

Updated with ANOVA

library(gtsummary)
library(NHANES)
data("NHANES")

NHANES %>%
  tbl_continuous(
    variable = BMI,
    include=c(Gender, Race3),
    statistic = everything() ~ "{mean} ({sd})",
    digits=everything()~1
  ) %>%
  add_p(
    list(Gender~"t.test", Race3~"oneway.test"),
    test.args = everything() ~ list(var.equal = TRUE)
  )

Characteristic   N = 10,000¹   p-value²
Gender                         0.13
  female         26.8 (7.9)
  male           26.5 (6.8)
Race3                          <0.001
  Asian          23.7 (5.7)
  Black          27.6 (8.7)
  Hispanic       26.0 (6.8)
  Mexican        26.1 (7.1)
  White          26.6 (7.1)
  Other          26.1 (8.1)
¹ BMI: Mean (SD)
² Two Sample t-test; One-way analysis of means

Created on 2024-11-24 with reprex v2.1.0
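The footnote's "One-way analysis of means" is oneway.test(). With var.equal = TRUE it is the classical one-way ANOVA F-test; without it, oneway.test() applies Welch's correction instead. A base-R sketch on made-up data (`y` and `g` are hypothetical) compares it with the F-test from anova(lm(...)):

```r
# Hypothetical example data: outcome y and a three-level grouping factor g
set.seed(7)
g <- factor(rep(c("A", "B", "C"), each = 25))
y <- rnorm(75, mean = rep(c(24, 27, 26), each = 25), sd = 6)

# One-way analysis of means assuming equal variances
ow <- oneway.test(y ~ g, var.equal = TRUE)

# Classical one-way ANOVA F-test from a linear model
av <- anova(lm(y ~ g))

c(F_oneway = unname(ow$statistic), F_anova = av[["F value"]][1])
c(p_oneway = ow$p.value, p_anova = av[["Pr(>F)"]][1])
```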

Nile · November 28, 2024, 2:34am

This is brilliant! Thank you so much for the neat solution!

I have a follow-up question: does this method automatically decide the type of test (e.g. t-test or ANOVA) depending on the number of categories in the row variables?
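In the first reply, Race1 had no test specified and its footnote shows a Kruskal-Wallis test, so ANOVA was not picked automatically there. If you want the rule from the question, you can apply it yourself before calling add_p(). A minimal sketch, where `choose_test()` is a hypothetical helper (not gtsummary's own logic) and the factors are made-up examples; the last step builds the same `list(Gender ~ "t.test", ...)` form that the replies pass to add_p():

```r
# Hypothetical helper: choose a test name from the number of groups
choose_test <- function(x) {
  k <- nlevels(factor(x))   # factor() also drops unused levels of a factor
  if (k < 2) stop("need at least two groups")
  if (k == 2) "t.test" else "oneway.test"
}

# Hypothetical grouping variables
groups <- list(
  Gender = factor(c("female", "male", "male", "female")),
  Race3  = factor(c("Asian", "Black", "White", "Other"))
)

tests <- vapply(groups, choose_test, character(1))
tests   # c(Gender = "t.test", Race3 = "oneway.test")

# Turn the choices into a list of formulas such as Gender ~ "t.test"
test_list <- lapply(names(tests), function(v) {
  as.formula(paste0(v, ' ~ "', tests[[v]], '"'))
})
```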

Nile · November 28, 2024, 3:25am

Here are the steps I followed to get this data:

library(NHANES)
library(dplyr)
dput(NHANES %>% select(HHIncomeMid, SleepTrouble, HomeOwn, Education))

system · February 26, 2025, 3:26am

This topic was automatically closed 90 days after the last reply. New replies are no longer allowed.
