I have a data set with 15 numerical variabel (SP_1, SP_2, SP_3, SP_4.....SP_15)
Each variable has 20 observations each.
I would like to create a table with; variable, mean, SD and confidence interval.
I have a data set with 15 numerical variabel (SP_1, SP_2, SP_3, SP_4.....SP_15)
Each variable has 20 observations each.
I would like to create a table with; variable, mean, SD and confidence interval.
I think you want dplyr::group_by
and dplyr::summarize
, along with tidyr::pivot_longer
. If you need to adjust the confidence level, you can supply the conf.level
argument to t.test
(by default it is set to 95%).
library(dplyr)
library(tidyr)
# Creating a dataset like yours with 15 variables and 20 observations
df <- data.frame(
replicate(
15,
runif(20)
)
)
df %>%
pivot_longer(
cols = everything()
) %>%
group_by(name) %>%
summarize(
'mean' = mean(value),
'sd' = sd(value),
'lower_ci' = t.test(value, mu = mean(value))$conf.int[1],
'upper_ci' = t.test(value, mu = mean(value))$conf.int[2]
)
I get the error below when I run this
Error in summarize(., mean = mean(value), sd = sd(value), lower_ci = t.test(value, :
argument "by" is missing, with no default
The posted code works for me. Make sure that you have a new R session running--googling that error seems to imply that the error is because you have other packages loaded that also have a function called summarise.
The gtsummary package might also serve your needs (Tutorial: tbl_summary • gtsummary).
The code returns values for everything while I would want to get mean, sd, se and CI or each variable separately.
e.g
mean SD se CI
Var1 x
Var 2 x
Var 3 x
Not sure what you mean by:
each variable separately
You could then filter the name
column to get at the variable you want. But the above code does in fact calculate statistics only with respect to the individual variables, not on the whole dataset. You can confirm it by comparing the data.frame
and list
results below:
library(dplyr)
library(tidyr)
# Creating a dataset like yours with 15 variables and 20 observations
df <- data.frame(
replicate(
15,
runif(20)
)
)
df %>%
pivot_longer(
cols = everything()
) %>%
group_by(name) %>%
summarize(
'mean' = mean(value),
'sd' = sd(value),
'lower_ci' = t.test(value, mu = mean(value))$conf.int[1],
'upper_ci' = t.test(value, mu = mean(value))$conf.int[2]
)
#> # A tibble: 15 x 5
#> name mean sd lower_ci upper_ci
#> * <chr> <dbl> <dbl> <dbl> <dbl>
#> 1 X1 0.559 0.360 0.390 0.727
#> 2 X10 0.464 0.336 0.307 0.622
#> 3 X11 0.492 0.283 0.360 0.625
#> 4 X12 0.370 0.275 0.242 0.499
#> 5 X13 0.485 0.289 0.349 0.620
#> 6 X14 0.482 0.285 0.349 0.616
#> 7 X15 0.571 0.250 0.454 0.688
#> 8 X2 0.564 0.270 0.437 0.690
#> 9 X3 0.525 0.285 0.392 0.658
#> 10 X4 0.488 0.292 0.351 0.625
#> 11 X5 0.515 0.278 0.385 0.645
#> 12 X6 0.476 0.293 0.339 0.614
#> 13 X7 0.514 0.341 0.354 0.674
#> 14 X8 0.509 0.286 0.375 0.643
#> 15 X9 0.598 0.303 0.457 0.740
# And now confirm that the variables pivoted into the 'name' column have only the values calculated according to their group
nms <- paste0('X', 1:15)
lapply(
nms,
function(x) {
tmp <- df[[x]]
list(
'mean' = mean(tmp),
'sd' = sd(tmp),
'lower_ci' = t.test(tmp, mu = mean(tmp))$conf.int[1],
'upper_ci' = t.test(tmp, mu = mean(tmp))$conf.int[2]
)
}
)
#> [[1]]
#> [[1]]$mean
#> [1] 0.5586078
#>
#> [[1]]$sd
#> [1] 0.3600182
#>
#> [[1]]$lower_ci
#> [1] 0.3901142
#>
#> [[1]]$upper_ci
#> [1] 0.7271015
#>
#>
#> [[2]]
#> [[2]]$mean
#> [1] 0.5635041
#>
#> [[2]]$sd
#> [1] 0.2698614
#>
#> [[2]]$lower_ci
#> [1] 0.437205
#>
#> [[2]]$upper_ci
#> [1] 0.6898031
#>
#>
#> [[3]]
#> [[3]]$mean
#> [1] 0.5252184
#>
#> [[3]]$sd
#> [1] 0.2845411
#>
#> [[3]]$lower_ci
#> [1] 0.392049
#>
#> [[3]]$upper_ci
#> [1] 0.6583877
#>
#>
#> [[4]]
#> [[4]]$mean
#> [1] 0.4880604
#>
#> [[4]]$sd
#> [1] 0.2922919
#>
#> [[4]]$lower_ci
#> [1] 0.3512636
#>
#> [[4]]$upper_ci
#> [1] 0.6248572
#>
#>
#> [[5]]
#> [[5]]$mean
#> [1] 0.5150481
#>
#> [[5]]$sd
#> [1] 0.2784126
#>
#> [[5]]$lower_ci
#> [1] 0.384747
#>
#> [[5]]$upper_ci
#> [1] 0.6453492
#>
#>
#> [[6]]
#> [[6]]$mean
#> [1] 0.4764728
#>
#> [[6]]$sd
#> [1] 0.2928978
#>
#> [[6]]$lower_ci
#> [1] 0.3393924
#>
#> [[6]]$upper_ci
#> [1] 0.6135532
#>
#>
#> [[7]]
#> [[7]]$mean
#> [1] 0.5142307
#>
#> [[7]]$sd
#> [1] 0.3413002
#>
#> [[7]]$lower_ci
#> [1] 0.3544972
#>
#> [[7]]$upper_ci
#> [1] 0.6739641
#>
#>
#> [[8]]
#> [[8]]$mean
#> [1] 0.5089112
#>
#> [[8]]$sd
#> [1] 0.2859729
#>
#> [[8]]$lower_ci
#> [1] 0.3750718
#>
#> [[8]]$upper_ci
#> [1] 0.6427507
#>
#>
#> [[9]]
#> [[9]]$mean
#> [1] 0.59822
#>
#> [[9]]$sd
#> [1] 0.3025874
#>
#> [[9]]$lower_ci
#> [1] 0.4566047
#>
#> [[9]]$upper_ci
#> [1] 0.7398352
#>
#>
#> [[10]]
#> [[10]]$mean
#> [1] 0.4644698
#>
#> [[10]]$sd
#> [1] 0.3363694
#>
#> [[10]]$lower_ci
#> [1] 0.307044
#>
#> [[10]]$upper_ci
#> [1] 0.6218955
#>
#>
#> [[11]]
#> [[11]]$mean
#> [1] 0.4924063
#>
#> [[11]]$sd
#> [1] 0.2828683
#>
#> [[11]]$lower_ci
#> [1] 0.3600199
#>
#> [[11]]$upper_ci
#> [1] 0.6247928
#>
#>
#> [[12]]
#> [[12]]$mean
#> [1] 0.370358
#>
#> [[12]]$sd
#> [1] 0.2747296
#>
#> [[12]]$lower_ci
#> [1] 0.2417806
#>
#> [[12]]$upper_ci
#> [1] 0.4989354
#>
#>
#> [[13]]
#> [[13]]$mean
#> [1] 0.4845037
#>
#> [[13]]$sd
#> [1] 0.2890672
#>
#> [[13]]$lower_ci
#> [1] 0.3492161
#>
#> [[13]]$upper_ci
#> [1] 0.6197913
#>
#>
#> [[14]]
#> [[14]]$mean
#> [1] 0.4822205
#>
#> [[14]]$sd
#> [1] 0.2852961
#>
#> [[14]]$lower_ci
#> [1] 0.3486978
#>
#> [[14]]$upper_ci
#> [1] 0.6157431
#>
#>
#> [[15]]
#> [[15]]$mean
#> [1] 0.570778
#>
#> [[15]]$sd
#> [1] 0.2499468
#>
#> [[15]]$lower_ci
#> [1] 0.4537992
#>
#> [[15]]$upper_ci
#> [1] 0.6877567
Created on 2022-03-01 by the reprex package (v1.0.0)
This topic was automatically closed 21 days after the last reply. New replies are no longer allowed.
If you have a query related to it or one of the replies, start a new topic and refer back with a link.