Summarizing mean, SD and CI of many variable

Mdavidjolly · February 25, 2022, 2:34pm

I have a data set with 15 numerical variabel (SP_1, SP_2, SP_3, SP_4.....SP_15)
Each variable has 20 observations each.

I would like to create a table with; variable, mean, SD and confidence interval.

dvetsch75 · February 25, 2022, 3:21pm

I think you want dplyr::group_by and dplyr::summarize, along with tidyr::pivot_longer. If you need to adjust the confidence level, you can supply the conf.level argument to t.test (by default it is set to 95%).

library(dplyr)
library(tidyr)

# Creating a dataset like yours with 15 variables and 20 observations
df <- data.frame(
    replicate(
        15,
        runif(20)
    )
)

df %>% 
    pivot_longer(
        cols = everything()
    ) %>% 
    group_by(name) %>% 
    summarize(
        'mean' = mean(value),
        'sd' = sd(value),
        'lower_ci' = t.test(value, mu = mean(value))$conf.int[1],
        'upper_ci' = t.test(value, mu = mean(value))$conf.int[2]
    )

Mdavidjolly · February 26, 2022, 7:32am

I get the error below when I run this

Error in summarize(., mean = mean(value), sd = sd(value), lower_ci = t.test(value, :
argument "by" is missing, with no default

gsapijaszko · February 26, 2022, 8:31am

Please have a look on crosstable package: Introduction to Crosstable

willwheels · February 27, 2022, 12:57am

The posted code works for me. Make sure that you have a new R session running--googling that error seems to imply that the error is because you have other packages loaded that also have a function called summarise.

The gtsummary package might also serve your needs (Tutorial: tbl_summary • gtsummary).

Mdavidjolly · March 1, 2022, 12:44pm

The code returns values for everything while I would want to get mean, sd, se and CI or each variable separately.

e.g
mean SD se CI
Var1 x
Var 2 x
Var 3 x

dvetsch75 · March 1, 2022, 5:49pm

Not sure what you mean by:

each variable separately

You could then filter the name column to get at the variable you want. But the above code does in fact calculate statistics only with respect to the individual variables, not on the whole dataset. You can confirm it by comparing the data.frame and list results below:

library(dplyr)
library(tidyr)
    # Creating a dataset like yours with 15 variables and 20 observations
    df <- data.frame(
        replicate(
            15,
            runif(20)
        )
    )
    
    df %>% 
        pivot_longer(
            cols = everything()
        ) %>% 
        group_by(name) %>% 
        summarize(
            'mean' = mean(value),
            'sd' = sd(value),
            'lower_ci' = t.test(value, mu = mean(value))$conf.int[1],
            'upper_ci' = t.test(value, mu = mean(value))$conf.int[2]
        )
#> # A tibble: 15 x 5
#>    name   mean    sd lower_ci upper_ci
#>  * <chr> <dbl> <dbl>    <dbl>    <dbl>
#>  1 X1    0.559 0.360    0.390    0.727
#>  2 X10   0.464 0.336    0.307    0.622
#>  3 X11   0.492 0.283    0.360    0.625
#>  4 X12   0.370 0.275    0.242    0.499
#>  5 X13   0.485 0.289    0.349    0.620
#>  6 X14   0.482 0.285    0.349    0.616
#>  7 X15   0.571 0.250    0.454    0.688
#>  8 X2    0.564 0.270    0.437    0.690
#>  9 X3    0.525 0.285    0.392    0.658
#> 10 X4    0.488 0.292    0.351    0.625
#> 11 X5    0.515 0.278    0.385    0.645
#> 12 X6    0.476 0.293    0.339    0.614
#> 13 X7    0.514 0.341    0.354    0.674
#> 14 X8    0.509 0.286    0.375    0.643
#> 15 X9    0.598 0.303    0.457    0.740
    
    # And now confirm that the variables pivoted into the 'name' column have only the values calculated according to their group
    
    nms <- paste0('X', 1:15)
    
    lapply(
        nms,
        function(x) {
            tmp <- df[[x]]
            list(
                'mean' = mean(tmp),
                'sd' = sd(tmp),
                'lower_ci' = t.test(tmp, mu = mean(tmp))$conf.int[1],
                'upper_ci' = t.test(tmp, mu = mean(tmp))$conf.int[2]
            )
        }
    )
#> [[1]]
#> [[1]]$mean
#> [1] 0.5586078
#> 
#> [[1]]$sd
#> [1] 0.3600182
#> 
#> [[1]]$lower_ci
#> [1] 0.3901142
#> 
#> [[1]]$upper_ci
#> [1] 0.7271015
#> 
#> 
#> [[2]]
#> [[2]]$mean
#> [1] 0.5635041
#> 
#> [[2]]$sd
#> [1] 0.2698614
#> 
#> [[2]]$lower_ci
#> [1] 0.437205
#> 
#> [[2]]$upper_ci
#> [1] 0.6898031
#> 
#> 
#> [[3]]
#> [[3]]$mean
#> [1] 0.5252184
#> 
#> [[3]]$sd
#> [1] 0.2845411
#> 
#> [[3]]$lower_ci
#> [1] 0.392049
#> 
#> [[3]]$upper_ci
#> [1] 0.6583877
#> 
#> 
#> [[4]]
#> [[4]]$mean
#> [1] 0.4880604
#> 
#> [[4]]$sd
#> [1] 0.2922919
#> 
#> [[4]]$lower_ci
#> [1] 0.3512636
#> 
#> [[4]]$upper_ci
#> [1] 0.6248572
#> 
#> 
#> [[5]]
#> [[5]]$mean
#> [1] 0.5150481
#> 
#> [[5]]$sd
#> [1] 0.2784126
#> 
#> [[5]]$lower_ci
#> [1] 0.384747
#> 
#> [[5]]$upper_ci
#> [1] 0.6453492
#> 
#> 
#> [[6]]
#> [[6]]$mean
#> [1] 0.4764728
#> 
#> [[6]]$sd
#> [1] 0.2928978
#> 
#> [[6]]$lower_ci
#> [1] 0.3393924
#> 
#> [[6]]$upper_ci
#> [1] 0.6135532
#> 
#> 
#> [[7]]
#> [[7]]$mean
#> [1] 0.5142307
#> 
#> [[7]]$sd
#> [1] 0.3413002
#> 
#> [[7]]$lower_ci
#> [1] 0.3544972
#> 
#> [[7]]$upper_ci
#> [1] 0.6739641
#> 
#> 
#> [[8]]
#> [[8]]$mean
#> [1] 0.5089112
#> 
#> [[8]]$sd
#> [1] 0.2859729
#> 
#> [[8]]$lower_ci
#> [1] 0.3750718
#> 
#> [[8]]$upper_ci
#> [1] 0.6427507
#> 
#> 
#> [[9]]
#> [[9]]$mean
#> [1] 0.59822
#> 
#> [[9]]$sd
#> [1] 0.3025874
#> 
#> [[9]]$lower_ci
#> [1] 0.4566047
#> 
#> [[9]]$upper_ci
#> [1] 0.7398352
#> 
#> 
#> [[10]]
#> [[10]]$mean
#> [1] 0.4644698
#> 
#> [[10]]$sd
#> [1] 0.3363694
#> 
#> [[10]]$lower_ci
#> [1] 0.307044
#> 
#> [[10]]$upper_ci
#> [1] 0.6218955
#> 
#> 
#> [[11]]
#> [[11]]$mean
#> [1] 0.4924063
#> 
#> [[11]]$sd
#> [1] 0.2828683
#> 
#> [[11]]$lower_ci
#> [1] 0.3600199
#> 
#> [[11]]$upper_ci
#> [1] 0.6247928
#> 
#> 
#> [[12]]
#> [[12]]$mean
#> [1] 0.370358
#> 
#> [[12]]$sd
#> [1] 0.2747296
#> 
#> [[12]]$lower_ci
#> [1] 0.2417806
#> 
#> [[12]]$upper_ci
#> [1] 0.4989354
#> 
#> 
#> [[13]]
#> [[13]]$mean
#> [1] 0.4845037
#> 
#> [[13]]$sd
#> [1] 0.2890672
#> 
#> [[13]]$lower_ci
#> [1] 0.3492161
#> 
#> [[13]]$upper_ci
#> [1] 0.6197913
#> 
#> 
#> [[14]]
#> [[14]]$mean
#> [1] 0.4822205
#> 
#> [[14]]$sd
#> [1] 0.2852961
#> 
#> [[14]]$lower_ci
#> [1] 0.3486978
#> 
#> [[14]]$upper_ci
#> [1] 0.6157431
#> 
#> 
#> [[15]]
#> [[15]]$mean
#> [1] 0.570778
#> 
#> [[15]]$sd
#> [1] 0.2499468
#> 
#> [[15]]$lower_ci
#> [1] 0.4537992
#> 
#> [[15]]$upper_ci
#> [1] 0.6877567

^{Created on 2022-03-01 by the reprex package (v1.0.0)}

system · March 22, 2022, 5:50pm

This topic was automatically closed 21 days after the last reply. New replies are no longer allowed.

If you have a query related to it or one of the replies, start a new topic and refer back with a link.