piping inside `map*()`: pipes stripped from input

dromano · March 4, 2020, 3:16pm

If I try to run something like table %>% map(. %>% pull(column), function), the inside pipe is apparently stripped out, so that map()'s first argument is the table itself, and an attempt is made to make pull(column) the second argument. Why is this?

Since it was hard to tell what was going on with map(), I used debug(map2) so I could inspect the .x and .y arguments passed to map2() in the following code:

library(tidyverse)
tibble(a = 1:3) %>%
  map2(. %>% pull(a), . %>% pull(a),  ~ 'yes')
#> Error: `.y` must be a vector, not a `fseq/function` object

Created on 2020-03-04 by the reprex package (v0.3.0)

(It would have been nice to be able to reprex walking through the code with debug(), but I couldn't figure that out either! A question for another post.)

nirgrahamuk · March 4, 2020, 3:42pm

Hi,
when you pass a tibble or df to map, it wants to iterate over that columnwise for you.
Below I switch from map to walk, just because it's enough to see the output from cat() without storing results and returning them.
Consider this:

library(tidyverse)
input_df<- tibble(a = 1:3,
                  b = letters[1:3]) 
walk(input_df,~cat("x",.,"y\n"))
#> x 1 2 3 y
#> x a b c y

you can certainly use pipes inside map and walk.
little demo:

library(tidyverse)

list_of_tb <- list(tibble(a = 1:3,
                          b = letters[1:3]))

walk(list_of_tb, ~(.) %>% head(1) %>% print())
walk(list_of_tb, ~(.) %>% tail(1) %>% print())

Is there something particular you wish to do that I could help with?

dromano · March 4, 2020, 6:33pm

Thanks @nirgrahamuk, I've been wondering what walk() does. I'm still not exactly sure, but your example helps.

The question I'm trying to sort out comes from the difference in behavior below, which now looks like an interaction that occurs when pipes are used both outside and inside map():

library(tidyverse)
input_df<- tibble(a = 1:3,
                  b = letters[1:3]) 
walk(input_df,~cat("x",.,"y\n"))
#> x 1 2 3 y
#> x a b c y

walk(input_df %>% filter(a != 2),~cat("x",.,"y\n"))
#> x 1 3 y
#> x a c y

input_df %>% 
  walk(. %>% filter(a != 2), ~cat("x",.,"y\n"))
#> Error in .f(.x[[i]], ...): unused argument (~cat("x", ., "y\n"))

Created on 2020-03-04 by the reprex package (v0.3.0)

It seems that because '.' is used internally by walk(), there's a clash with its use by the preceding pipe; what I was hoping for is to be able to pipe inside map(), too.

This came up for me in a puzzle that arose while trying to explain to my students what the effect of weighting data is; specifically, if you have a table like this:

library(tidyverse)
input_df <- 
  tibble(a = letters[1:3],
         b = 3:5)
input_df
#> # A tibble: 3 x 2
#>   a         b
#>   <chr> <int>
#> 1 a         3
#> 2 b         4
#> 3 c         5

Created on 2020-03-04 by the reprex package (v0.3.0)
how would you create a new table by repeating each row according to the value in column b? (So a table with three copies of row 1, four of row 2, etc.)

nirgrahamuk · March 4, 2020, 7:22pm

There is a more fundamental issue with this example: it passes walk() three arguments, whereas walk() expects only two, .x and .f (anything extra is forwarded to .f).

I think the proper analog to using pipe to pass the object on left as first argument to function on right (where we want walk to be such a function) is

> input_df %>% filter(a != 2) %>% walk(~cat("x",.,"y\n"))
x 1 3 y
x a c y

The specific challenge you gave to your students has a tidy solution:

library(tidyverse)
input_df <- 
  tibble(a = letters[1:3],
         b = 3:5)

uncount(input_df,b,.remove = FALSE)
# A tibble: 12 x 2
   a         b
   <chr> <int>
 1 a         3
 2 a         3
 3 a         3
 4 b         4
 5 b         4
 6 b         4
 7 b         4
 8 c         5
 9 c         5
10 c         5
11 c         5
12 c         5

joels · March 5, 2020, 12:04am

In base R you could do:

d <- tibble(a = letters[1:3], b = 3:5)

d[rep(1:nrow(d), d$b), ]

I wasn't aware of uncount until I read Nir's answer. By analogy with the base R solution, slice can also be used to repeat rows:

d %>% slice(rep(1:nrow(.), b))

Regarding walk: walk iterates just like map, but instead of returning a list of results it returns its input invisibly. This can be useful when you want to perform some action, but don't need anything returned. For example, the code below writes a data frame to an Excel file and conditionally formats some of the columns.

library(openxlsx)
library(purrr)  # for map() and walk()

wb=createWorkbook()
sht=addWorksheet(wb, "Data")

writeData(wb, sht, mtcars)
map(c(1,3,7), 
    ~conditionalFormatting(wb, sht, cols=.x, rows=1:nrow(mtcars) + 1, 
                           rule=sprintf(">%s", median(mtcars[,.x])))
    )

saveWorkbook(wb, "myfile.xlsx")

But note that the map step also returns a list, one element per column, which gets printed:

[[1]]
[1] 0

[[2]]
[1] 0

[[3]]
[1] 0

If you use walk instead

walk(c(1,3,7), 
     ~conditionalFormatting(wb, sht, cols=.x, rows=1:nrow(mtcars) + 1, 
                            rule=sprintf(">%s", median(mtcars[,.x])))
    )

then the "side effect"--the conditional formatting of the Excel file--is still implemented, but without the list being returned.
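To make the difference in return values concrete without needing openxlsx, here is a minimal sketch. The `cols` vector and the `message()` call are stand-ins chosen for illustration; they are not part of the Excel example above.

```r
library(purrr)

cols <- c(1, 3, 7)

# map() collects whatever .f returns into a list, one element per input
res_map <- map(cols, ~ median(mtcars[, .x]))

# walk() calls .f only for its side effect (here, a message)
# and hands back its input, invisibly
res_walk <- walk(cols, ~ message("column ", .x, ": ", names(mtcars)[.x]))

str(res_map)
str(res_walk)
```

Because walk() returns its input, it can sit in the middle of a pipeline without changing what flows through it.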

I don't use walk very often and I don't know if this is a particularly good example, but it was on my mind, as I was doing some conditional formatting today.

dromano · March 5, 2020, 2:41am

I'm not sure I understand why this would imply there are three arguments, since in other contexts, . %>% filter(a != 2) is treated as a single object. Could you say more about this?

The uncount() function is very handy! It wasn't an exercise for my students; I was just trying to figure out how to modify the midwest dataset that Hadley Wickham uses in this example to illustrate weighting:

library(tidyverse)
# Unweighted
ggplot(midwest, aes(percwhite, percbelowpoverty)) + 
  geom_point() + 
  geom_smooth(method = lm, size = 1)


# Weighted by population
ggplot(midwest, aes(percwhite, percbelowpoverty)) + 
  geom_point(aes(size = poptotal / 1e6)) + 
  geom_smooth(aes(weight = poptotal), method = lm, size = 1) +
  scale_size_area(guide = "none")

Created on 2020-03-04 by the reprex package (v0.3.0)
and I thought I'd modify the dataset -- by 'uncounting' poptotal, now that I know the term -- so that students could have a more concrete way of getting a handle on what weighting does:

library(tidyverse)
# Weighting by uncounting
midwest %>%
  mutate(pop_in_kilo = round(poptotal / 1000)) %>% 
  uncount(pop_in_kilo) %>% 
  ggplot(aes(percwhite, percbelowpoverty)) +
  geom_point() +
  geom_smooth(method = lm, size = 1)

Created on 2020-03-04 by the reprex package (v0.3.0)

I'd still like to figure out how to use map* to mimic uncount(), though.
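One possible way to do this with map2(), offered as a sketch rather than a recommended replacement for uncount(): build the repeated row numbers first, then index the table with them. The names `row_idx` and `repeated` are made up for this example.

```r
library(tidyverse)

input_df <- tibble(a = letters[1:3], b = 3:5)

# For each row number, repeat it as many times as that row's b value,
# then flatten the list of integer vectors into one vector of row indices.
row_idx <- map2(seq_len(nrow(input_df)), input_df$b, ~ rep(.x, times = .y)) %>%
  unlist()

# Index the original table with the repeated row numbers.
repeated <- input_df[row_idx, ]
repeated
```

This is the same idea as the base R and slice() versions earlier in the thread, with map2() doing the job of rep()'s vectorised `times` argument.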

andresrcs · March 5, 2020, 3:04am

If we take the pipe out, it becomes evident that you are passing 3 arguments. Remember that the pipe takes the object on the left and passes it as the first argument to the function on the right.

walk(input_df, filter(., a != 2), ~cat("x",.,"y\n"))
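To see where a third argument ends up, here is a minimal sketch. The `one_arg` helper and the "extra" value are made up for illustration. As far as I know, anything supplied after .f is forwarded to .f through `...`, and a function built with `. %>% ...` accepts exactly one argument, so it rejects the extra one.

```r
library(purrr)
library(magrittr)

# A bare dot on the left of %>% builds a function of one argument
one_arg <- . %>% toupper()

# With no extra arguments it works fine as .f
ok <- map_chr(c("a", "b"), one_arg)

# An extra positional argument is forwarded to .f, which cannot accept it;
# the error is caught here so the script keeps running
res <- tryCatch(
  map_chr(c("a", "b"), one_arg, "extra"),
  error = function(e) e
)

ok
class(res)
```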

If you want to override the default behavior you have to name the arguments, but I could only make it work with base R subsetting; I don't know why.

library(tidyverse)
input_df<- tibble(a = 1:3,
                  b = letters[1:3]) 

input_df %>% 
    walk(.x = .[.$a != 2,], .f = ~cat("x",.x,"y\n"))
#> x 1 3 y
#> x a c y

andresrcs · March 5, 2020, 3:31am

I found a syntax that works with the pipe inside the arguments

library(tidyverse)
input_df<- tibble(a = 1:3,
                  b = letters[1:3]) 

input_df %>% 
    walk(.x = (.) %>% filter(a != 2), .f = ~cat("x",.x,"y\n"))
#> x 1 3 y
#> x a c y
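A possible explanation, based on my reading of magrittr's documented rules rather than anything confirmed in this thread: when the left-hand side of %>% is the bare placeholder `.`, magrittr builds a function (a "functional sequence") instead of evaluating anything, whereas `(.)` is an ordinary expression that is evaluated and piped as usual. Separately, a dot that appears only inside a nested call does not stop the outer pipe from inserting its left-hand side as the first argument. The sketch below uses list() as a stand-in for walk() so the arguments can be inspected.

```r
library(magrittr)

# Bare dot on the left: a one-argument function, nothing is evaluated yet
total <- . %>% sum()
class(total)

# Inside an outer pipe, the bare-dot form is still a function,
# and the outer left-hand side is inserted as the first argument
with_bare_dot <- 1:3 %>% list(. %>% sum())

# Wrapping the dot in parentheses makes it an ordinary value to pipe
with_parens <- 1:3 %>% list((.) %>% sum())

str(with_bare_dot)
str(with_parens)
```

If this reading is right, naming .x and .f in the walk() call matters because the outer pipe still supplies the table as an unnamed first argument.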

nirgrahamuk · March 5, 2020, 11:07am

andresrcs, that's really interesting, that () has that effect.
I got curious and just experimented with random things and actually I found this, which eliminates the . !!!

library(tidyverse)
input_df<- tibble(a = 1:3,
                  b = letters[1:3]) 

input_df %>% 
  walk({} %>% filter(a != 2), .f = ~cat("x",.x,"y\n"))

wacky huh ?
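One thing worth checking before relying on this: `{}` evaluates to NULL, and in the piped call the `{} %>% filter(a != 2)` expression is no longer in the .x position, since the outer pipe has already supplied input_df there. My assumption, not verified in this thread, is that it ends up among the extra arguments forwarded to .f and may never be evaluated, in which case no filtering happens. The sketch below compares the printed output with and without the braces expression so you can see which it is on your installation.

```r
library(tidyverse)

input_df <- tibble(a = 1:3, b = letters[1:3])

# An empty pair of braces evaluates to NULL
is.null({})

# Capture what each call prints, then compare the two
out_plain <- capture.output(
  input_df %>% walk(~ cat("x", .x, "y\n"))
)
out_braces <- capture.output(
  input_df %>% walk({} %>% filter(a != 2), .f = ~ cat("x", .x, "y\n"))
)

# TRUE here would mean the braces expression had no effect on the result
identical(out_plain, out_braces)
```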

dromano · March 5, 2020, 11:49am

Thanks, @andresrcs and @nirgrahamuk! Very curious -- I'm guessing the mystery must be buried in rlang or something like that?

And thanks @joels for the walk() illustration -- very helpful! It made me realize that piping to walk() is like %T>%-piping to map(), which helps me better understand what it does:

library(tidyverse)
1:3 %T>% 
  map(
    .,
    ~ write_csv(tibble(.x), 'test.csv', append = TRUE)
  ) %>% 
  head()
#> [1] 1 2 3
  
1:3 %>% 
  walk(
    ., 
    ~ write_csv(tibble(.x), 'test.csv', append = TRUE)
  ) %>% 
  head()
#> [1] 1 2 3

Created on 2020-03-05 by the reprex package (v0.3.0)

dromano · March 5, 2020, 2:10pm

Another related mystery: How can the (.) and {} tricks be replicated with .f, too?

system · March 12, 2020, 2:22pm

This topic was automatically closed 7 days after the last reply. New replies are no longer allowed.