Hi tidyverse community, I am wondering whether there is a recommended tidyverse workflow for summarising multiple columns of a tibble with multiple arbitrary summary functions, with the results returned in a 'tidy' tibble.
For example, I can summarise one column multiple ways (e.g. using min() and anyNA()):
library(tidyverse)
iris %>% summarise_at("Petal.Width", funs(min, anyNA))
#> min anyNA
#> 1 0.1 FALSE
I can then extend the previous example to summarise multiple columns:
iris %>% summarise_at(vars(Sepal.Length:Petal.Width), funs(min, anyNA))
#> Sepal.Length_min Sepal.Width_min Petal.Length_min Petal.Width_min
#> 1 4.3 2 1 0.1
#> Sepal.Length_anyNA Sepal.Width_anyNA Petal.Length_anyNA
#> 1 FALSE FALSE FALSE
#> Petal.Width_anyNA
#> 1 FALSE
Unfortunately, the above result isn't tidy. Perhaps I should use gather and spread to get the desired output:
iris %>%
  summarise_at(vars(Sepal.Length:Petal.Width), funs(min, anyNA)) %>%
  gather(key = "key", value = "value") %>%
  separate(key, c("variable", "stat"), sep = "_") %>%
  spread(stat, value)
#> variable anyNA min
#> 1 Petal.Length 0 1.0
#> 2 Petal.Width 0 0.1
#> 3 Sepal.Length 0 4.3
#> 4 Sepal.Width 0 2.0
This is where I wonder if I'm heading in the wrong direction. Is there another tidyverse way I should do this? One downside of this approach is that the logical result of anyNA() is coerced to numeric, because gather() stacks every summary value into a single column.
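To make the coercion concrete, here is a minimal base R sketch. The vector `values` is a made-up stand-in for the single value column that gather() creates; it is not part of the pipeline above.

```r
# A single atomic vector can hold only one type, so combining a double
# and a logical promotes the logical to double.
values <- c(min = 0.1, anyNA = FALSE)  # hypothetical stand-in for the gathered column

typeof(values)                   # "double"
values[["anyNA"]]                # 0, no longer FALSE
is.logical(values[["anyNA"]])    # FALSE
```

The same promotion rule applies to a data frame column, which is why anyNA shows up as 0 rather than FALSE after spread().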
My current workaround is to ditch summarise_at() completely and define a function that returns a one-row tibble. The first column holds the name of the original tibble column; each remaining column holds the result of one arbitrary summary function.
# Summarise one column: x is the column's values, name is its column name
report <- function(x, name) {
  tibble(
    name = name,
    min = min(x),
    anyNA = anyNA(x)
  )
}
Then I use purrr::imap_dfr() to get the result:
iris %>% select(Sepal.Length:Petal.Width) %>% imap_dfr(report)
#> # A tibble: 4 x 3
#> name min anyNA
#> <chr> <dbl> <lgl>
#> 1 Sepal.Length 4.3 FALSE
#> 2 Sepal.Width 2 FALSE
#> 3 Petal.Length 1 FALSE
#> 4 Petal.Width 0.1 FALSE
This seems OK to me and was the approach I suggested as an answer to a recent question. But I'm not sure whether the workaround is necessary, or whether I've missed an easier step somewhere.
Is the workaround a good way to go or am I in danger of getting into some bad habits?
Are there other tidyverse approaches recommended in this situation?