Dear all,
I am trying to run t-tests over several measurements as a way to explore the power of tibbles with list-columns, which I find elegant.
In base R (without a complex tibble or the like), I would do something like this:
library(broom)
traits <- colnames(iris)[colnames(iris) != "Species"]
list_ttests <- lapply(traits, function(trait) {
  tidy(t.test(iris[iris$Species == "setosa", trait],
              iris[iris$Species == "versicolor", trait]))
})
res1 <- cbind(Traits = traits, do.call("rbind", list_ttests))
colnames(res1)[colnames(res1) %in% c("estimate1", "estimate2")] <- c("mean_setosa", "mean_versicolor")
View(res1)
Trying to embrace the tidyverse, I could only come up with the following:
library(tidyr) ## devel version tidyr_0.8.3.9000!
library(dplyr)
library(purrr)
iris %>%
  filter(Species %in% c("setosa", "versicolor")) %>%
  pivot_longer(cols = -Species, names_to = "variable", values_to = "value") %>%
  group_by(Species, variable) %>%
  summarise(values = list(value)) %>%
  pivot_wider(names_from = Species, values_from = values) %>% ## tibble of lists ready
  bind_cols(pmap_df(., ~ tidy(t.test(unlist(..2), unlist(..3))))) %>%
  rename(mean_setosa = estimate1, mean_versicolor = estimate2) -> res2
View(res2)
It is not too horrendous, but it is still complicated for beginners.
I wonder if something more elegant could be done to arrive at the tibble of lists faster (i.e. without pivoting twice), and whether the call to pmap could use column names (setosa and versicolor) instead of the ugly positional ..2 and ..3.
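To make the question concrete, here is a rough sketch of the kind of thing I have in mind. It is only one possible direction, not a confirmed best practice: it assumes that both list-columns can be built in a single summarise() (so the second pivot goes away) and that map2() inside mutate() can then refer to the columns by name. The names res3 and test are made up for this illustration.

```r
library(dplyr)
library(tidyr)
library(purrr)
library(broom)

res3 <- iris %>%
  filter(Species %in% c("setosa", "versicolor")) %>%
  pivot_longer(cols = -Species, names_to = "variable", values_to = "value") %>%
  group_by(variable) %>%
  # one list-column per species, built directly (no pivot_wider needed)
  summarise(
    setosa     = list(value[Species == "setosa"]),
    versicolor = list(value[Species == "versicolor"])
  ) %>%
  # map2() walks the two list-columns in parallel, referring to them by name
  mutate(test = map2(setosa, versicolor, ~ tidy(t.test(.x, .y)))) %>%
  # expand the one-row tibbles returned by tidy() into regular columns
  unnest(test) %>%
  rename(mean_setosa = estimate1, mean_versicolor = estimate2)
```

This still pivots once to long format, and the setosa and versicolor list-columns stay in the result next to the test statistics, so it may or may not count as simpler for beginners.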
Any tips welcome! @mishabalyasin?
Thanks