mutate using fn that returns a df or list

cawthm · November 22, 2018, 7:01pm

Suppose we have a function that returns a named df/list containing more than one output variable.

Question: How can I run that function against a df and create more than one new variable at a time, and what is the most natural way to do it with dplyr/purrr?

library(dplyr)
library(purrr)

toy_fun <- function(chr) {
  data.frame(let_rank = which(letters == chr),
             rando = rnorm(n = length(chr)))
}

# for example, given

df <- data.frame(lets = sample(letters, 10))

# what is the correct dplyr/purrr way to map the function to 
# capture the output of the function?

# this of course fails

df %>% mutate(toy_fun(lets), .id = names(toy_fun(lets)))
#> Error in mutate_impl(.data, dots): Evaluation error: argument "chr" is missing, with no default.

# here's one way which works, but which seems unwieldy and ugly:

bind_cols(df, map_df(df$lets, toy_fun))
#>    lets let_rank       rando
#> 1     d        4 -0.30910301
#> 2     j       10  1.54324912
#> 3     i        9 -0.57664505
#> 4     u       21  1.15671969
#> 5     b        2 -0.03828406
#> 6     f        6 -0.64202232
#> 7     l       12  0.50793796
#> 8     y       25  0.98867755
#> 9     x       24  0.02617367
#> 10    h        8 -0.75882107

# this doesn't work like I thought it would:
map_dfc(df$lets, toy_fun)
#>   let_rank     rando let_rank1   rando1 let_rank2   rando2 let_rank3
#> 1        4 0.8284374        10 1.342487         9 2.982913        21
#>     rando3 let_rank4    rando4 let_rank5    rando5 let_rank6   rando6
#> 1 0.568951         2 0.7849361         6 -0.180958        12 2.808249
#>   let_rank7    rando7 let_rank8       rando8 let_rank9     rando9
#> 1        25 0.4196204        24 -0.008707815         8 -0.5353852

Created on 2018-11-22 by the reprex package (v0.2.1)
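A note on why the last attempt looks the way it does: toy_fun returns a one-row data frame per letter, and map_dfr() (which is what map_df() does) stacks those rows, while map_dfc() glues them side by side, giving one row with 2 x 10 columns. The sketch below just compares the shapes; set.seed(1) and stringsAsFactors = FALSE are additions for reproducibility and are not part of the original post.

```r
library(dplyr)
library(purrr)

set.seed(1)
df <- data.frame(lets = sample(letters, 10), stringsAsFactors = FALSE)

toy_fun <- function(chr) {
  data.frame(let_rank = which(letters == chr),
             rando = rnorm(n = length(chr)))
}

# map_dfr() row-binds the ten one-row results: 10 rows x 2 columns,
# so the rows line up with df and can be column-bound back onto it.
by_row <- map_dfr(df$lets, toy_fun)
combined <- bind_cols(df, by_row)

# map_dfc() column-binds them instead: 1 row x 20 columns
# (newer dplyr versions may print a message while repairing duplicate names).
by_col <- map_dfc(df$lets, toy_fun)

dim(by_row)
dim(by_col)
```

In purrr 1.0.0 and later, map_dfr() and map_dfc() are marked as superseded (list_rbind()/list_cbind() are the suggested replacements), but they still work.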

mishabalyasin · November 22, 2018, 8:10pm

I would say that the common way to do it is the map -> unnest combo. It's how you would use it with, for example, broom.


suppressPackageStartupMessages(library(tidyverse))

toy_fun <- function(chr) {
  data.frame(let_rank = which(letters == chr),
             rando = rnorm(n = length(chr)))
}
df <- data.frame(lets = sample(letters, 10))
df %>% 
  mutate(res = purrr::map(lets, toy_fun)) %>% 
  tidyr::unnest(res)
#>    lets let_rank      rando
#> 1     l       12  0.5670311
#> 2     s       19  0.6504350
#> 3     k       11 -1.0761233
#> 4     n       14 -0.8507612
#> 5     x       24  0.2007877
#> 6     y       25 -0.3233344
#> 7     t       20  0.5016116
#> 8     h        8  0.8095949
#> 9     q       17 -1.4610645
#> 10    p       16  0.7157577

Created on 2018-11-22 by the reprex package (v0.2.1)
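For anyone reading this later: starting with dplyr 1.0.0 (released after this thread), mutate() splices an unnamed data frame returned by an expression into separate columns, so a vectorized function can add several variables in one call without map/unnest. toy_fun itself can't be used that way, because which(letters == chr) only works for a single letter. The sketch below assumes dplyr >= 1.0.0 and uses toy_fun_vec, a hypothetical vectorized rewrite based on match(); set.seed(1) is only there for reproducibility.

```r
library(dplyr)  # assumes dplyr >= 1.0.0

set.seed(1)
df <- data.frame(lets = sample(letters, 10), stringsAsFactors = FALSE)

# Hypothetical vectorized rewrite of toy_fun: match() finds the position
# of every element of chr in letters at once, so the returned data frame
# has one row per input value.
toy_fun_vec <- function(chr) {
  data.frame(let_rank = match(chr, letters),
             rando = rnorm(n = length(chr)))
}

# The expression is unnamed, so mutate() splices the returned data frame's
# columns (let_rank, rando) into the result instead of nesting them.
res <- df %>% mutate(toy_fun_vec(lets))
res
```

If the function can only handle one value at a time, the map -> unnest approach above is still the way to go.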

jcblum · November 22, 2018, 9:31pm

These links from a couple of related discussions might be of interest:

system · November 29, 2018, 9:39pm

This topic was automatically closed 7 days after the last reply. New replies are no longer allowed.