Hi, all, long time no see!
A quick question: when you analyze a new data frame and want to summarise it,
including t-based confidence intervals for the numeric variables, which functions/packages do you use? My old workflow, largely copied from
is:
library(dplyr)
#>
#> Attaching package: 'dplyr'
#> The following objects are masked from 'package:stats':
#>
#> filter, lag
#> The following objects are masked from 'package:base':
#>
#> intersect, setdiff, setequal, union
test <- data.frame(
  n = c(298, 298, 298, 298, 298, 298, 298, 298, 298, 298, 1, 1, 1, 1,
        1, 1, 1, 1, 1, 1, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3),
  run = c(1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10,
          1, 2, 3, 4, 5, 6, 7, 8, 9, 10),
  mAP = c(0.8112, 0.8006, 0.8076, 0.7999, 0.8067, 0.8046, 0.8004, 0.799,
          0.8052, 0.8002, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0.8333, 0.8333,
          0.8333, 1, 0.8333, 1, 1, 0.8333, 1, 1)
)
# Lower bound of a two-sided t-based confidence interval (n - 1 degrees of freedom)
lower_ci <- function(mean, se, n, conf_level = 0.95) {
  mean - qt(1 - ((1 - conf_level) / 2), n - 1) * se
}
# Upper bound of the same interval
upper_ci <- function(mean, se, n, conf_level = 0.95) {
  mean + qt(1 - ((1 - conf_level) / 2), n - 1) * se
}
foobar <- test %>%
  group_by(n) %>%
  summarise(smean = mean(mAP, na.rm = TRUE),
            ssd = sd(mAP, na.rm = TRUE)) %>%
  mutate(se = ssd / sqrt(n),
         lower_ci = lower_ci(smean, se, n),
         upper_ci = upper_ci(smean, se, n))
#> Warning in qt(1 - ((1 - conf_level)/2), n - 1): NaNs produced
#> Warning in qt(1 - ((1 - conf_level)/2), n - 1): NaNs produced
Created on 2019-05-28 by the reprex package (v0.3.0)
but I've been away from R for a while, and I may have missed/forgotten better ways to do it. What do you think? Would you modify/improve something? Would you use a completely different approach altogether? Let me know!
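
For comparison, here is a minimal sketch of one possible variation. It rests on an assumption the post does not confirm: that the sample size for each interval should be the number of runs per group (10 here) rather than the value stored in the `n` column. In the reprex above, `n` is the grouping column, so the group with `n = 1` passes 0 degrees of freedom to `qt()`, which is where the NaN warnings come from. The names `conf_level`, `n_obs`, `t_crit` and `ci_summary` are illustrative only.

```r
library(dplyr)

# Same values as the reprex data, written compactly
test <- data.frame(
  n   = rep(c(298, 1, 3), each = 10),
  run = rep(1:10, times = 3),
  mAP = c(0.8112, 0.8006, 0.8076, 0.7999, 0.8067, 0.8046, 0.8004, 0.799,
          0.8052, 0.8002, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0.8333, 0.8333,
          0.8333, 1, 0.8333, 1, 1, 0.8333, 1, 1)
)

conf_level <- 0.95

ci_summary <- test %>%
  group_by(n) %>%
  summarise(
    # ASSUMPTION: the sample size is the number of non-missing runs per group,
    # not the value of the grouping column `n`
    n_obs = sum(!is.na(mAP)),
    smean = mean(mAP, na.rm = TRUE),
    ssd   = sd(mAP, na.rm = TRUE)
  ) %>%
  mutate(
    se       = ssd / sqrt(n_obs),
    # Two-sided critical value of the t distribution with n_obs - 1 df
    t_crit   = qt(1 - (1 - conf_level) / 2, df = n_obs - 1),
    lower_ci = smean - t_crit * se,
    upper_ci = smean + t_crit * se
  )

ci_summary
```

With 10 runs in every group the degrees of freedom are always 9, so no NaN is produced. For a group whose values vary, the bounds should agree with `t.test(x)$conf.int`. If the `n` column really is the intended sample size, the original `mutate()` is the right one and the `n = 1` group simply has no defined t interval.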