My question is whether it is possible to do the above for a set of levels of less_than. I could, of course, just use for over the levels and merge the resulting data frames on cyl, but where is the fun in that?! An example of what I am looking for using just two levels is at the end of the following:
Finally, I should also mention that the calculations eventually will make their way into a bootstrapping set-up per group and level, but I am trying to do one step at the time.
That meets your requirements, and I don't know if you need exactly this shape, but I am worried you'll be bitten by an "untidy" bug. Embedding information in column names is a bad practice and can make the data a pain to use. It might be better to store those limits in an extra column.
Here is a way to do this using dplyr and purrr that utilizes group_by, summarize_at, and purrr::partial.
Putting aside the purrr::partial portion for now, I had to make changes to your my_means function to work with the group_by/summarize workflow instead of split/map. To see my thoughts on the differences you can see this thread. The function now takes a vector rather than a dataframe and returns only the mean (which meets the requirements of a function passed to summarize).
So, now the fun part. purrr::partial allows you to pass a function to it while setting different variables to change with each iteration of the function. If you call partial inside of a map call then these preset functions are conveniently saved to a list. Now the tricky part.. How do we run a list of functions on a specific subset of columns of our dataframe. Luckily, with rlang (here using functions reexported with dplyr) we can call our function list inside of the funs argument/function in summarize_at with !!!. This will output the results for each of the functions in the list as its own column and each row will contain a different group.
One other important thing to note is that if you want to call the list of functions from summarize_at as shown, the list has to be named. Hence, the reason for creating a dynamic list of names and using purrr::set_names to apply them.
Here is the reprex:
library(dplyr)
#>
#> Attaching package: 'dplyr'
#> The following objects are masked from 'package:stats':
#>
#> filter, lag
#> The following objects are masked from 'package:base':
#>
#> intersect, setdiff, setequal, union
library(purrr)
my_mean <- function(x, less_than) {
mean <- mean(x[x < less_than])
mean
}
my_cutoffs <- c(25, 30, 35)
my_means_names <- purrr::map(my_cutoffs, ~paste0("mean_lt_", .x))
my_partial_mean <- purrr::map(my_cutoffs, ~purrr::partial(my_mean, less_than = .x)) %>%
purrr::set_names(nm = my_means_names)
mtcars %>%
group_by(cyl) %>%
summarize_at(vars(mpg), funs(!!!my_partial_mean))
#> # A tibble: 3 x 4
#> cyl mean_lt_25 mean_lt_30 mean_lt_35
#> <dbl> <dbl> <dbl> <dbl>
#> 1 4 22.6 23.7 26.7
#> 2 6 19.7 19.7 19.7
#> 3 8 15.1 15.1 15.1
Thank you so much @nwerth and @tbradley for the suggestions. Both of those would do the trick, but it looks like I can only pick one as the solution and I am slightly partial to the last one (no pun intended!) because it is easier to read. I need to get better at using purrr.
I have not thought about the tidy aspects of the data. That is a really good point, especially since they will be used later. I sometimes create tables like that, but I tend to only use them as the last stop before I have R output them to LaTeX.