Iterating over multiple fields/vectors

tbradley · October 19, 2018, 8:36pm

Here is a way to do this using dplyr and purrr that utilizes group_by, summarize_at, and purrr::partial.

Putting aside the purrr::partial portion for now, I had to change your my_means function so that it works with the group_by/summarize workflow instead of split/map. To see my thoughts on the differences you can see this thread. The function (my_mean in the reprex below) now takes a vector rather than a dataframe and returns only the mean. That single value is what a function passed to summarize has to return.
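To make the "vector in, single value out" point concrete, here is a minimal sketch using a made-up vector (the values in x are purely illustrative and not from the original question):

```r
# Equivalent to my_mean in the reprex: average only the values below the cutoff
my_mean <- function(x, less_than) {
  mean(x[x < less_than])
}

x <- c(10, 20, 30, 40)  # illustrative values only

# Only 10, 20 and 30 are below 35, so their mean is returned
my_mean(x, less_than = 35)
#> [1] 20

# The result has length 1, which is what summarize needs per group
length(my_mean(x, less_than = 35))
#> [1] 1
```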

So, now the fun part. purrr::partial takes a function and pre-fills some of its arguments, returning a new function that only needs the remaining ones. If you call partial inside of a map call, you get one pre-filled function per input value, conveniently saved in a list. Now the tricky part: how do we run a list of functions on a specific subset of columns of our dataframe? Luckily, with rlang (here using functions re-exported by dplyr) we can splice our function list into funs() inside summarize_at with !!!. The output has one row per group and one column per function in the list.
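As a rough sketch of what partial does on its own, outside of any dplyr verb (the names mean_lt_25 and fns are just illustrative):

```r
library(purrr)

my_mean <- function(x, less_than) {
  mean(x[x < less_than])
}

# Pre-fill less_than; the result is a function that only needs x
mean_lt_25 <- partial(my_mean, less_than = 25)

# These two calls compute the same thing
mean_lt_25(mtcars$mpg)
my_mean(mtcars$mpg, less_than = 25)

# One pre-filled function per cutoff, collected in a list by map
my_cutoffs <- c(25, 30, 35)
fns <- map(my_cutoffs, ~ partial(my_mean, less_than = .x))

length(fns)
#> [1] 3

# The second element behaves like my_mean(mtcars$mpg, less_than = 30)
fns[[2]](mtcars$mpg)
```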

One other important thing to note: if you want to splice the list of functions into summarize_at as shown, the list has to be named. That is the reason for building the names dynamically and applying them with purrr::set_names.
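A small sketch of the naming step in isolation. The list here just holds the cutoffs as stand-ins for the functions, and fn_names and named are illustrative names:

```r
library(purrr)

my_cutoffs <- c(25, 30, 35)

# paste0 is vectorized, so this builds all three names at once
fn_names <- paste0("mean_lt_", my_cutoffs)
fn_names
#> [1] "mean_lt_25" "mean_lt_30" "mean_lt_35"

# set_names attaches the names to the list elements
named <- set_names(as.list(my_cutoffs), nm = fn_names)
names(named)
#> [1] "mean_lt_25" "mean_lt_30" "mean_lt_35"

# Elements can now be looked up by name
named$mean_lt_30
#> [1] 30
```

These names are what end up as the column names in the summarized output.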

Here is the reprex:

library(dplyr)
#> 
#> Attaching package: 'dplyr'
#> The following objects are masked from 'package:stats':
#> 
#>     filter, lag
#> The following objects are masked from 'package:base':
#> 
#>     intersect, setdiff, setequal, union
library(purrr)


# Mean of the values in x that are below the cutoff less_than
my_mean <- function(x, less_than) {
  mean(x[x < less_than])
}

my_cutoffs <- c(25, 30, 35)

# map_chr returns a character vector of names rather than a list
my_means_names <- purrr::map_chr(my_cutoffs, ~paste0("mean_lt_", .x))

my_partial_mean <- purrr::map(my_cutoffs, ~purrr::partial(my_mean, less_than = .x)) %>% 
  purrr::set_names(nm = my_means_names)


mtcars %>% 
  group_by(cyl) %>% 
  summarize_at(vars(mpg), funs(!!!my_partial_mean))
#> # A tibble: 3 x 4
#>     cyl mean_lt_25 mean_lt_30 mean_lt_35
#>   <dbl>      <dbl>      <dbl>      <dbl>
#> 1     4       22.6       23.7       26.7
#> 2     6       19.7       19.7       19.7
#> 3     8       15.1       15.1       15.1

Created on 2018-10-19 by the reprex package (v0.2.0).
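A note for readers on newer package versions: funs() was deprecated in dplyr 0.8.0, and dplyr 1.0.0 introduced across(). The following is a sketch of how the same summary might be written there, assuming dplyr >= 1.0.0 and a current purrr. It is not part of the original reprex, and res is just an illustrative name.

```r
library(dplyr)
library(purrr)

my_mean <- function(x, less_than) {
  mean(x[x < less_than])
}

my_cutoffs <- c(25, 30, 35)

# Named list of pre-filled functions, built the same way as in the reprex
my_partial_mean <- map(my_cutoffs, ~ partial(my_mean, less_than = .x)) %>%
  set_names(paste0("mean_lt_", my_cutoffs))

# across() accepts a named list of functions directly, so neither funs()
# nor !!! is needed; .names = "{.fn}" makes the column names match the
# list names instead of the default "{.col}_{.fn}"
res <- mtcars %>%
  group_by(cyl) %>%
  summarize(across(mpg, my_partial_mean, .names = "{.fn}"))

names(res)
#> [1] "cyl"        "mean_lt_25" "mean_lt_30" "mean_lt_35"
```

The list still has to be named here, since across() uses the list names to label the output columns.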

I recently wrote a blog post that uses this same workflow to calculate multiple quantiles for different groups with dplyr.