The map family is super powerful, but I still find myself getting turned around by it—especially when you start using it in pipes.
I think, before we look at your examples, it's worth contrasting them with a simpler one: just making a new column.
toy_df0 <- toy_df %>% mutate(new_var = toy_func())
# Error in mutate_impl(.data, dots) : Column `new_var` must be length 2 (the number of rows) or one, not 3
toy_df0 <- toy_df %>% mutate(new_var = toy_func(2))
toy_df0
# A tibble: 2 x 2
# runs new_var
# <int> <dbl>
# 1 1 -1.21
# 2 2 -1.30
toy_df0 <- toy_df %>% mutate(new_var = toy_func(1))
toy_df0
# A tibble: 2 x 2
# runs new_var
# <int> <dbl>
# 1 1 -1.68
# 2 2 -1.68
In these examples, toy_func runs only once per mutate call. The first time, it runs with the default argument, k = 3, and that throws an error because three values don't fit in a two-row data frame. With k = 2 it fits perfectly. With k = 1 the single value is "recycled": repeated until it fills every row.
Part of what makes map tricky in pipes is recognising the context in which the pipe operator works. When you use map, toy_func runs once for each element of map's first argument, .x, and in your first example .x is missing (that's what the error means). The pipe passes toy_df as the first argument to mutate, but not to map. So what you're really running is:
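Since your definitions aren't shown here, this is a minimal sketch that assumes toy_func is just rnorm(k) with k = 3 by default, and that toy_df is a two-row tibble with a runs column (both are my guesses, so treat them as hypothetical). It checks the length rule: mutate accepts a result whose length matches the number of rows, recycles a length-one result, and errors on anything else.

```r
library(dplyr)

# Hypothetical stand-ins: the question's real toy_func and toy_df may differ
toy_func <- function(k = 3) rnorm(k)
toy_df <- tibble(runs = 1:2)

set.seed(1)
# k = 2 matches nrow(toy_df), so each row gets its own draw
fits <- toy_df %>% mutate(new_var = toy_func(2))

# k = 1 gives a length-one result, which mutate recycles to every row
recycled <- toy_df %>% mutate(new_var = toy_func(1))

# k = 3 (the default) is neither 2 nor 1, so mutate throws an error
res <- tryCatch(
  toy_df %>% mutate(new_var = toy_func()),
  error = function(e) "error"
)
```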
toy_df1 <- mutate(toy_df, new_var = map(.f = toy_func))
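To see that the missing .x is a problem on its own, independent of mutate and the pipe, you can call map by itself. Again, toy_func here is my hypothetical stand-in (rnorm(k) with k = 3 by default), not necessarily your exact function.

```r
library(purrr)

# Hypothetical stand-in for the question's function
toy_func <- function(k = 3) rnorm(k)

# With no .x, map has nothing to iterate over, so it errors before toy_func ever runs
res <- tryCatch(map(.f = toy_func), error = function(e) e)
inherits(res, "error")
```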
The next two examples are equivalent, and are maybe what you intended to express:
toy_df1 <- mutate(toy_df, new_var = map(toy_df, .f = toy_func))
toy_df1 <- toy_df %>% mutate(new_var = map(., .f = toy_func))
In the second version, the pipe passes toy_df to mutate, but you can then refer to it again with ., as I have inside map.
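One way to convince yourself the two versions really are the same is to reset the random seed before each and compare the results. This sketch uses the same hypothetical toy_func and toy_df as before (my assumptions, not your originals). It also shows the nesting: new_var is a list column, and because map over a one-column data frame returns a single vector, mutate recycles that one vector into both rows.

```r
library(dplyr)
library(purrr)

# Hypothetical stand-ins for the question's objects
toy_func <- function(k = 3) rnorm(k)
toy_df <- tibble(runs = 1:2)

set.seed(42)
a <- mutate(toy_df, new_var = map(toy_df, .f = toy_func))

set.seed(42)
# The . nested inside map() doesn't stop the pipe from also filling mutate's first argument
b <- toy_df %>% mutate(new_var = map(., .f = toy_func))

identical(a, b)      # both versions make the same calls in the same order
lengths(a$new_var)   # each list element holds a length-2 vector
```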
If we look at the output here, we can see that it's definitely different to my examples:
toy_df1
# A tibble: 2 x 2
# runs new_var
# <int> <list>
# 1 1 <dbl [2]>
# 2 2 <dbl [2]>
toy_df1$new_var
# [[1]]
# [1] -0.02948677 -0.20796988
# [[2]]
# [1] -0.02948677 -0.20796988
It's nested: each call to toy_func produces a vector, and each vector becomes one element in the list column. This is different from vector recycling.
The question is, why is each vector of length 2 and not the default, 3?
When map is called, its first argument, .x, gets broken up element-by-element and toy_func is called on each element. In the previous examples, where you passed toy_df all the way into map using the . pronoun, toy_df is what gets broken up. Under the hood, data frames are really lists (with each column being a list element), so when you do one of these...
toy_df1 <- mutate(toy_df, new_var = map(toy_df, .f = toy_func))
toy_df1 <- toy_df %>% mutate(new_var = map(., .f = toy_func))
… you're actually kind of doing this:
toy_df[[1]]
# [1] 1 2
toy_func(toy_df[[1]])
# [1] -0.5275058 0.0256864
# if toy_df had more columns, it'd then be:
# toy_func(toy_df[[2]])
# toy_func(toy_df[[3]])
# etc.
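Here's a quick way to check the "data frames are lists" point with a made-up three-column data frame (df3 is purely hypothetical). map calls its function once per column, so you get three results, and rnorm receives a whole two-element column each time.

```r
library(purrr)

# Hypothetical three-column, two-row data frame
df3 <- data.frame(a = 1:2, b = 3:4, c = 5:6)

is.list(df3)                      # a data frame is a list of columns
col_lengths <- map(df3, length)   # .f runs once per column, not once per row

set.seed(1)
draws <- map(df3, rnorm)          # rnorm gets a length-2 vector each time, so 2 draws per column
```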
Since the data frame has two rows, each column is a two-element vector, and that's what gets passed to rnorm each time. And, unfortunately, rnorm is perhaps a little too happy to make do with that. From the rnorm documentation:
n: number of observations. If length(n) > 1, the length is taken to be the number required.
So by passing the data frame's columns to toy_func, you end up overriding the default argument, not with a constant k, but with a vector whose length rnorm takes to be k.
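You can see that documented behaviour directly in base R: the values inside n are ignored and only its length counts. Resetting the seed shows it produces exactly the same draws as asking for that many values outright.

```r
set.seed(7)
a <- rnorm(c(10, 20))   # length(n) is 2, so rnorm makes 2 draws, not 10 or 20

set.seed(7)
b <- rnorm(2)

length(a)
identical(a, b)         # same number of draws from the same seed
```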
I'm wondering if you wanted to pass the value in each row of the runs column in as k. So for the first row, runs is 1 and you get rnorm(1) (one random number); for the second, rnorm(2) (two random numbers), etc. And then you unnest that. Is that fair?
If that's the case, what you want is for map to break the runs column up element-by-element and pass each element to a separate toy_func run, rather than passing the whole data frame column-by-column. This would do the trick:
toy_df2 <- toy_df %>% mutate(new_var = map(.$runs, .f = toy_func))
toy_df2
# A tibble: 2 x 2
# runs new_var
# <int> <list>
# 1 1 <dbl [1]>
# 2 2 <dbl [2]>
toy_df2 %>% unnest(new_var)
# A tibble: 3 x 2
# runs new_var
# <int> <dbl>
# 1 1 -0.887
# 2 2 1.80
# 3 2 1.08
The magic here is .$runs. The pipe passes toy_df to mutate, and then you refer to it again inside map with ., but use the dollar sign to narrow it down to one column.
I think you have the right idea about default arguments, but the mechanics of map combined with the mechanics of the pipe make things complicated really quickly.
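Inside mutate you can also just name the column, because mutate looks up runs in the data for you. This sketch (again with my hypothetical toy_func and toy_df) checks that map(runs, toy_func) gives the same result as the .$runs version, and that unnesting produces one row per random draw.

```r
library(dplyr)
library(purrr)
library(tidyr)

# Hypothetical stand-ins for the question's objects
toy_func <- function(k = 3) rnorm(k)
toy_df <- tibble(runs = 1:2)

set.seed(3)
via_dot <- toy_df %>% mutate(new_var = map(.$runs, .f = toy_func))

set.seed(3)
# Data masking: runs refers to the column, so no . or $ is needed
via_mask <- toy_df %>% mutate(new_var = map(runs, toy_func))

lengths(via_mask$new_var)            # row 1 gets rnorm(1), row 2 gets rnorm(2)
long <- via_mask %>% unnest(new_var) # one row per draw: 1 + 2 = 3 rows
```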
I hope that helps!