map function: question on the relation between input .x and input into .f (function arguments)

zoowalk · March 16, 2023, 5:10pm

Hello,

I love purrr and use it more or less daily. But there's one thing which keeps bugging me, and I was wondering whether someone can help clarify the following for me.

My question pertains to how to address (?) function arguments in .f

The documentation states that .f is

A function, specified in one of the following ways:

A named function, e.g. mean.
An anonymous function, e.g. \(x) x + 1 or function(x) x + 1.
A formula, e.g. ~ .x + 1. You must use .x to refer to the first argument. Only recommended if you require backward compatibility with older versions of R.

Below my examples

library(tidyverse)
my_df <- tibble(col_num=1:3)

fn_square <- function(z) {z^2}

#NAMED FUNCTION
my_df %>% 
  mutate(num_2=map_dbl(.x=col_num, .f=fn_square)) #named function
#> # A tibble: 3 × 2
#>   col_num num_2
#>     <int> <dbl>
#> 1       1     1
#> 2       2     4
#> 3       3     9

#ANONYMOUS FUNCTION
my_df %>% 
  mutate(num_2=map_dbl(.x=col_num, .f=\(x) x^2))
#> # A tibble: 3 × 2
#>   col_num num_2
#>     <int> <dbl>
#> 1       1     1
#> 2       2     4
#> 3       3     9

#FORMULA
my_df %>% 
  mutate(num_2=map_dbl(.x=col_num, .f=~fn_square(z=.x))) 
#> # A tibble: 3 × 2
#>   col_num num_2
#>     <int> <dbl>
#> 1       1     1
#> 2       2     4
#> 3       3     9

What I love about the formula approach is that it is very clear regarding the relation between the input and where it's fed into the function: .x goes into the function argument z. The other two variants don't have this. There is no .x mentioned as the input to .f, which I always find somewhat obscure. This issue is still somewhat negligible when using a simple map function, but becomes more pertinent when using map2 or pmap functions.

Now my question/issue:
At least with the latest purrr release (maybe I didn't notice it earlier), it is now stated that the formula (my preferred approach) is "Only recommended if you require backward compatibility with older versions of R." Why is this? Or, to put it differently, is there any recommended approach which lets me clearly state which input goes into which function argument?

I would have hoped that the approach below works, but unfortunately - and surprisingly - no.

my_df %>% 
  mutate(num_2=map_dbl(.x=col_num, .f=fn_square(z=.x))) 

#> Error in `mutate()`:
#> ℹ In argument: `num_2 = map_dbl(.x = col_num, .f = fn_square(z = .x))`.
#> Caused by error in `fn_square()`:
#> ! object '.x' not found

#> Backtrace:
#>      ▆
#>   1. ├─my_df %>% ...
#>   2. ├─dplyr::mutate(., num_2 = map_dbl(.x = col_num, .f = fn_square(z = .x)))
#>   3. ├─dplyr:::mutate.data.frame(., num_2 = map_dbl(.x = col_num, .f = fn_square(z = .x)))
#>   4. │ └─dplyr:::mutate_cols(.data, dplyr_quosures(...), by)
#>   5. │   ├─base::withCallingHandlers(...)
#>   6. │   └─dplyr:::mutate_col(dots[[i]], data, mask, new_columns)
#>   7. │     └─mask$eval_all_mutate(quo)
#>   8. │       └─dplyr (local) eval()
#>   9. ├─purrr::map_dbl(.x = col_num, .f = fn_square(z = .x))
#>  10. │ └─purrr:::map_("double", .x, .f, ..., .progress = .progress)
#>  11. │   └─purrr::as_mapper(.f, ...)
#>  12. ├─global fn_square(z = .x)
#>  13. └─base::.handleSimpleError(`<fn>`, "object '.x' not found", base::quote(fn_square(z = .x)))
#>  14.   └─dplyr (local) h(simpleError(msg, call))
#>  15.     └─rlang::abort(message, class = error_class, parent = parent, call = error_call)

I am sure the tidyverse team has good reason for this behavior. I personally would have found it the most accessible formulation.

Grateful if anyone has an idea about this.

Many thanks

nirgrahamuk · March 16, 2023, 6:03pm

You can directly translate the formula 'look' into an anonymous function like so:

my_df |>
  mutate(num_2=map_dbl(.x=col_num, .f=\(x_)fn_square(z=x_)))

i.e. \(x_) replaces ~,
and because the name x_ was chosen, x_ replaces .x
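As a rough sketch of why the bare call .f=fn_square(z=.x) errors while the formula and the anonymous function work: the formula and \(x_) both hand map_dbl() a function, whereas a bare call is evaluated straight away, before purrr can supply anything. The names f_formula, f_lambda and res_* below are made up for illustration.

```r
library(purrr)

fn_square <- function(z) z^2

# A formula is turned into a function by as_mapper(); nothing is evaluated yet.
f_formula <- as_mapper(~ fn_square(z = .x))

# An anonymous function is already a function.
f_lambda <- \(x_) fn_square(z = x_)

res_formula <- map_dbl(1:3, f_formula)
res_lambda  <- map_dbl(1:3, f_lambda)

# A bare call is evaluated as soon as purrr touches .f, so R goes looking
# for an object called `.x` in the calling environment and fails.
# (This assumes no object named `.x` exists in your session.)
res_bare <- tryCatch(
  map_dbl(1:3, fn_square(z = .x)),
  error = function(e) conditionMessage(e)
)
```

Here res_formula and res_lambda hold the same squared values, while res_bare holds the error message instead of a result.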

zoowalk · March 17, 2023, 7:52am

Many thanks for your reply! I get your point, and indeed, the "look" is more straightforward.
But it's unfortunately only the "look" and it can be - at least for me - somewhat deceptive.

E.g. the example below:

I have a tibble with two columns, and a function which combines them.

library(tidyverse)

my_df <- tibble(col_num=1:3, col_a=rep("a", 3))

fn_comb <- function(h, i) {
  paste(h, i, sep="-")
}

Let's take the approach suggested and use an anonymous function which takes the arguments .x and .y, to mirror the names of the inputs .x and .y in map2. The approach works nicely; we get the intended result.

my_df %>% 
  mutate(comb=map2_chr(.x=col_num, .y=col_a, .f=\(.x, .y) fn_comb(h=.x, i=.y)))
#> # A tibble: 3 × 3
#>   col_num col_a comb 
#>     <int> <chr> <chr>
#> 1       1 a     1-a  
#> 2       2 a     2-a  
#> 3       3 a     3-a

However, if I swap the arguments, I still get the same result. The reason is that the argument names in the anonymous function have no relation to the .x and .y inputs. .x is always fed into the first argument of the anonymous function, and .y is always fed into the second. In fact, the argument names in the anonymous function could be anything. Hence, there is no 'substantive link' between the names of the arguments in the anonymous function and the inputs .x and .y (in map2, but the same applies to map and pmap). In other words, the naming in the anonymous function is purely cosmetic.

my_df %>% 
  mutate(comb=map2_chr(.x=col_num, .y=col_a, .f=\(.y, .x) fn_comb(h=.y, i=.x)))
#> # A tibble: 3 × 3
#>   col_num col_a comb 
#>     <int> <chr> <chr>
#> 1       1 a     1-a  
#> 2       2 a     2-a  
#> 3       3 a     3-a

my_df %>% 
  mutate(comb=map2_chr(.x=col_num, .y=col_a, .f=\(this, that) fn_comb(h=this, i=that)))
#> # A tibble: 3 × 3
#>   col_num col_a comb 
#>     <int> <chr> <chr>
#> 1       1 a     1-a  
#> 2       2 a     2-a  
#> 3       3 a     3-a
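To make the positional behaviour concrete, here is a deliberately simplified stand-in for map2_chr(). It is a hypothetical toy (toy_map2_chr is not a purrr function and this is not how purrr is actually implemented), but it shows the key point: the elements are handed to .f without names, so only their order matters.

```r
# Hypothetical, simplified stand-in for map2_chr(); for illustration only.
toy_map2_chr <- function(.x, .y, .f) {
  out <- character(length(.x))
  for (idx in seq_along(.x)) {
    # Both elements are passed unnamed, i.e. matched by position.
    out[[idx]] <- .f(.x[[idx]], .y[[idx]])
  }
  out
}

fn_comb <- function(h, i) paste(h, i, sep = "-")

# The first parameter always receives the element of .x, whatever it is called.
straight <- toy_map2_chr(1:3, rep("a", 3), \(.x, .y) fn_comb(h = .x, i = .y))
renamed  <- toy_map2_chr(1:3, rep("a", 3), \(.y, .x) fn_comb(h = .y, i = .x))
```

Both calls give the same vector, which mirrors the tibble outputs above.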

My preferred version would have been the one stated below. But it doesn't work.

I know it's really a small detail, and as mentioned, I love purrr. But this detail is a constant itch to me.
I was wondering whether others see it similarly. Anyway, thanks again for your reply.

my_df %>% 
  mutate(comb=map2_chr(.x=col_num, .y=col_a, .f=fn_comb(h=.x, i=.y)))
#> Error in `mutate()`:
#> ℹ In argument: `comb = map2_chr(.x = col_num, .y = col_a, .f = fn_comb(h
#>   = .x, i = .y))`.
#> Caused by error in `paste()`:
#> ! object '.x' not found

#> Backtrace:
#>      ▆
#>   1. ├─my_df %>% ...
#>   2. ├─dplyr::mutate(...)
#>   3. ├─dplyr:::mutate.data.frame(...)
#>   4. │ └─dplyr:::mutate_cols(.data, dplyr_quosures(...), by)
#>   5. │   ├─base::withCallingHandlers(...)
#>   6. │   └─dplyr:::mutate_col(dots[[i]], data, mask, new_columns)
#>   7. │     └─mask$eval_all_mutate(quo)
#>   8. │       └─dplyr (local) eval()
#>   9. ├─purrr::map2_chr(...)
#>  10. │ └─purrr:::map2_("character", .x, .y, .f, ..., .progress = .progress)
#>  11. │   └─purrr::as_mapper(.f, ...)
#>  12. ├─global fn_comb(h = .x, i = .y)
#>  13. │ └─base::paste(h, i, sep = "-")
#>  14. └─base::.handleSimpleError(...)
#>  15.   └─dplyr (local) h(simpleError(msg, call))
#>  16.     └─rlang::abort(message, class = error_class, parent = parent, call = error_call)

Created on 2023-03-17 with reprex v2.0.2

nirgrahamuk · March 17, 2023, 9:22am

Hi, I took a further look at your issue; I believe it's a complication that might be caused by your choice of .x, .y as the names for parameterising the anonymous function in map (likely because these are in some way 'special' to purrr). I think you can cleanly sidestep the problem by using any other parameter names.

i.e. plain x,y

library(tidyverse)

my_df <- tibble(col_num=1:3, col_a=rep("a", 3))

fn_comb <- function(h, i) {
  paste(h, i, sep="-")
}

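my_df %>%
  mutate(comb=map2_chr(.x=col_num, .y=col_a, .f=\(x, y) fn_comb(h=y, i=x)))
# x receives col_num and y receives col_a (by position), so comb is "a-1", "a-2", "a-3"

my_df %>%
  mutate(comb=map2_chr(.x=col_num, .y=col_a, .f=\(y, x) fn_comb(h=y, i=x)))
# here y receives col_num and x receives col_a, so comb is "1-a", "2-a", "3-a"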

I think you were close to that yourself, but you only looked at this/that in the conventionally aligned way?


my_df %>% 
  mutate(comb=map2_chr(.x=col_num, .y=col_a, .f=\(this, that) fn_comb(h=this, i=that)))


my_df %>%
  mutate(comb=map2_chr(.x=col_num, .y=col_a, .f=\(that, this) fn_comb(h=this, i=that)))
# that receives col_num and this receives col_a, so comb is "a-1", "a-2", "a-3"

My personal preference in parameter naming has gone from using ~.x for purrr to \(x_).
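If the goal is to state explicitly which input goes into which argument, one option worth trying is pmap with a named list: as far as I know, pmap matches the names of the list elements against the argument names of .f, so no wrapper function is needed. A minimal sketch, reusing fn_comb from above with plain vectors instead of tibble columns (by_position, by_name and by_name_swapped are just illustrative names):

```r
library(purrr)

fn_comb <- function(h, i) paste(h, i, sep = "-")

col_num <- 1:3
col_a   <- rep("a", 3)

# map2_*() passes .x and .y by position; parameter names are only labels.
by_position <- map2_chr(col_num, col_a, \(first, second) fn_comb(h = first, i = second))

# pmap_*() with a named list matches list names to argument names,
# so the order of the list elements does not matter.
by_name         <- pmap_chr(list(h = col_num, i = col_a), fn_comb)
by_name_swapped <- pmap_chr(list(i = col_a, h = col_num), fn_comb)
```

Inside mutate the same idea would read mutate(comb=pmap_chr(list(h=col_num, i=col_a), fn_comb)).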
