This works, but it seems cumbersome to create a temporary tibble with the number of observations and then join it with the test estimates. Is there a better way?
Given the data d:
library('tidyverse')
library('broom')
set.seed(354654)
d = tibble(value = rnorm(100),
           category = sample(1:5, replace = TRUE, 100),
           group = sample(c('A', 'B'), replace = TRUE, 100)) %>%
  arrange(category)
I.e.
> d
# A tibble: 100 x 3
     value category group
     <dbl>    <int> <chr>
 1  0.596         1 B
 2  0.0992        1 B
 3 -1.17          1 B
 4 -0.341         1 B
 5  0.222         1 A
 6  0.479         1 B
 7 -0.155         1 A
 8  0.921         1 B
 9  0.795         1 B
10  0.215         1 B
# … with 90 more rows
I want to perform 5 t.test calls, one for each category, comparing group, and also get the number of observations in each group. I can do this like so:
est = d %>% group_by(category) %>% do(tidy(t.test(value ~ group, data = .)))
ns = d %>% count(category, group) %>% spread(group, n)
est %>% full_join(ns, by = 'category')
# A tibble: 5 x 13
# Groups: category [?]
  category estimate estimate1 estimate2 statistic p.value parameter conf.low conf.high method                  alternative     A     B
     <int>    <dbl>     <dbl>     <dbl>     <dbl>   <dbl>     <dbl>    <dbl>     <dbl> <chr>                   <chr>       <int> <int>
1        1    0.296     0.290  -0.00634     0.889   0.385      18.8   -0.402     0.994 Welch Two Sample t-test two.sided       9    13
2        2   -0.698    -0.668    0.0299     -1.18   0.298      4.23    -2.30     0.903 Welch Two Sample t-test two.sided       5     7
3        3    0.359     0.388    0.0292     0.801   0.435      15.6   -0.592      1.31 Welch Two Sample t-test two.sided      14    10
4        4    0.387    0.0910    -0.296     0.791   0.442      13.9   -0.664      1.44 Welch Two Sample t-test two.sided       8    13
5        5    0.271     0.232   -0.0388     0.713   0.485      18.5   -0.526      1.07 Welch Two Sample t-test two.sided       7    14
But I'd prefer not to have to create a temporary tibble and then join it.
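One possible direction, offered as a sketch and not a confirmed answer: compute the counts inside the same per-category call that runs the test, so no second tibble or join is needed. The helper name test_with_n is hypothetical (it is not part of broom or dplyr), and the sketch assumes every category contains at least two observations of both 'A' and 'B', as in the data above. Because random number generation differs between R versions, the simulated values may not match the output shown earlier.

```r
library('dplyr')
library('broom')

set.seed(354654)
d = tibble(value = rnorm(100),
           category = sample(1:5, replace = TRUE, 100),
           group = sample(c('A', 'B'), replace = TRUE, 100)) %>%
  arrange(category)

# Hypothetical helper: tidy the t-test for one category and
# append the per-group observation counts as columns A and B.
test_with_n = function(df) {
  est = tidy(t.test(value ~ group, data = df))
  est$A = sum(df$group == 'A')  # observations in group A
  est$B = sum(df$group == 'B')  # observations in group B
  est
}

# One pipeline: estimates and counts come back together, one row per category.
res = d %>% group_by(category) %>% do(test_with_n(.))
res
```

This keeps the do() pattern already used above; the only change is that the function applied to each category returns the counts alongside the estimates.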