What would you recommend for using `pmap()` inside `mutate()`? I am looking for code that is both legible and efficient.
In my real application the data are more complex (nested data frames), and new columns are added through a series of piped `mutate()` statements in which `pmap()` invokes some custom functions.
`pmap()` is easy to use with data frames because the column names are preserved: each column is matched to the function argument of the same name, so the column order does not matter.
Here I am reusing an example from Advanced R.
```r
library(tidyverse)
params <- tibble::tribble(
  ~n, ~min, ~max,
  1L, 2, 10,
  2L, 4, 100,
  3L, 8, 1000
)
params |> pmap(runif)
#> [[1]]
#> [1] 4.022187
#>
#> [[2]]
#> [1] 8.699843 89.863384
#>
#> [[3]]
#> [1] 278.3641 781.0018 194.5025
params |> select(max, min, n) |> pmap(runif)
#> [[1]]
#> [1] 6.974362
#>
#> [[2]]
#> [1] 65.57562 90.35880
#>
#> [[3]]
#> [1] 143.0269 628.9549 297.8434
params |> select(min, n, max) |> pmap(runif)
#> [[1]]
#> [1] 7.684047
#>
#> [[2]]
#> [1] 33.58746 24.00720
#>
#> [[3]]
#> [1] 592.7216 171.2659 581.2717
```
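The same name matching applies to custom functions. A minimal, deterministic sketch, assuming a hypothetical custom function `f()` that is not part of the original example: the list elements are supplied in the order `b`, `a`, yet each one is bound to the argument with the same name.

```r
library(purrr)

# Hypothetical custom function, used only for illustration
f <- function(a, b) a - b

# Elements are listed as b, a; pmap_dbl() passes them as named
# arguments, so the calls are f(b = 1, a = 10), f(b = 2, a = 20), ...
out <- pmap_dbl(list(b = 1:3, a = c(10, 20, 30)), f)
out  # returns c(9, 18, 27)
```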
A new column can be added in the following way, but this solution cannot easily be repeated in a series of piped `mutate()` statements, because the initial data frame (`params`) appears on both the left and the right side of the pipe operator.
```r
params |> mutate(result = pmap(params, runif))
#> # A tibble: 3 x 4
#> n min max result
#> <int> <dbl> <dbl> <list>
#> 1 1 2 10 <dbl [1]>
#> 2 2 4 100 <dbl [2]>
#> 3 3 8 1000 <dbl [3]>
```
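A sketch of what happens when this step is repeated, assuming a hypothetical custom function `g()` that accepts only `n`, `min` and `max`: the intermediate result has to be given its own name (`step1`) before it can be passed to `pmap()`, and passing it whole also forwards the newly added `result` column, for which `g()` has no argument.

```r
library(dplyr)
library(purrr)

params <- tibble::tribble(
  ~n, ~min, ~max,
  1L, 2, 10,
  2L, 4, 100,
  3L, 8, 1000
)

# Hypothetical custom function that accepts only n, min and max
g <- function(n, min, max) n * (max - min)

# First step works: params has exactly the columns g() expects
step1 <- params |> mutate(result = pmap_dbl(params, g))

# Second step: the intermediate data frame needs a name, and
# pmap_dbl(step1, g) also passes `result = ...` to g(), which
# fails with an "unused argument" error
err <- tryCatch(
  step1 |> mutate(result2 = pmap_dbl(step1, g)),
  error = function(e) e
)
inherits(err, "error")
```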
Something like the following would be ideal, but the `.data` pronoun does not work here (it is meant for a different purpose).
```r
params |> mutate(result = pmap(.data, runif))
#> Error in `mutate()`:
#> ! Problem while computing `result = pmap(.data, runif)`.
#> Caused by error in `stop_bad_type()`:
#> ! Element 1 of `.l` must be a vector, not an environment
```
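For context, `.data` is a pronoun for looking up individual columns (`.data$n`, `.data[["n"]]`), not a data frame, which is why `pmap()` rejects it. A data-frame counterpart does exist: assuming dplyr 1.1.0 or later, `pick()` returns the selected columns as a tibble, so `pmap()` still matches them by name and each step can be chained without naming the intermediate data frame. A sketch, where `g()` and `h()` are hypothetical stand-ins for custom functions:

```r
library(dplyr)
library(purrr)

params <- tibble::tribble(
  ~n, ~min, ~max,
  1L, 2, 10,
  2L, 4, 100,
  3L, 8, 1000
)

# .data looks up one column at a time; it is not a data frame
doubled <- params |> mutate(twice = .data$n * 2L)

# Hypothetical custom functions standing in for real ones
g <- function(n, min, max) n * (max - min)
h <- function(result, n) result / n

# pick() (dplyr >= 1.1.0) returns only the selected columns as a tibble,
# so columns added by earlier mutate() steps are not passed along
res <- params |>
  mutate(draws = pmap(pick(n, min, max), runif)) |>
  mutate(result = pmap_dbl(pick(n, min, max), g)) |>
  mutate(per_n = pmap_dbl(pick(result, n), h))
```

Selecting the columns explicitly in `pick()` keeps each step independent of whatever columns earlier steps have added.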
My alternative solution so far is the following. Its main drawback is that the names are not preserved: the arguments are matched by position instead, so reordering the elements of `list()` changes the result (in the second call below, `min` is used as `n`). I am also not sure whether this code is efficient, considering possible additional copying.
```r
params |> mutate(result = pmap(list(n, min, max), runif))
#> # A tibble: 3 x 4
#> n min max result
#> <int> <dbl> <dbl> <list>
#> 1 1 2 10 <dbl [1]>
#> 2 2 4 100 <dbl [2]>
#> 3 3 8 1000 <dbl [3]>
params |> mutate(result = pmap(list(min, n, max), runif))
#> # A tibble: 3 x 4
#> n min max result
#> <int> <dbl> <dbl> <list>
#> 1 1 2 10 <dbl [2]>
#> 2 2 4 100 <dbl [4]>
#> 3 3 8 1000 <dbl [8]>
```
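The positional matching comes from the list being unnamed. A minimal sketch of the same approach with a named list, so that `pmap()` matches by argument name again and the order inside `list()` no longer matters:

```r
library(dplyr)
library(purrr)

params <- tibble::tribble(
  ~n, ~min, ~max,
  1L, 2, 10,
  2L, 4, 100,
  3L, 8, 1000
)

# Naming the list elements restores matching by argument name,
# so both calls below evaluate runif(n = n, min = min, max = max)
a <- params |> mutate(result = pmap(list(n = n, min = min, max = max), runif))
b <- params |> mutate(result = pmap(list(min = min, n = n, max = max), runif))

lengths(a$result)  # 1 2 3
lengths(b$result)  # 1 2 3
```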
Can you suggest any better solution? Thank you.