How can I use map*() to eliminate repeated calls to mutate()?

dougfir · May 28, 2024, 2:44am

I would like to mutate new columns onto a dataframe within a pipeline of operations using the native pipe. Example:

bla <- 1:3
df <- data.frame(x = 1:3) |> 
  mutate(bla_1 = x + bla[1],
         bla_2 = x + bla[2],
         bla_3 = x + bla[3])

But rather than write out each line of mutate(), I'm seeking a way to do this more elegantly by mapping over bla.

I attempted using across() with map_dfc() but could not get this working. How can I loop over bla in a tidyverse-esque way to mutate new columns per this example?

dromano · May 28, 2024, 3:14am

Is it important that you preserve the column name x?

dougfir · May 28, 2024, 3:18am

It could be replaced with a constant, 'logins_'.

FJCC · May 28, 2024, 3:32am

Here is an inelegant solution.

library(tidyverse)
#> Warning: package 'ggplot2' was built under R version 4.3.3
bla <- 1:3
df <- data.frame(x = 1:3) 

# map over bla only, adding each offset to the whole x column
df2 <- map(bla, \(vec, DAT = df$x) as.data.frame(DAT + vec)) |> list_cbind()
#> New names:
#> • `DAT + vec` -> `DAT + vec...1`
#> • `DAT + vec` -> `DAT + vec...2`
#> • `DAT + vec` -> `DAT + vec...3`
colnames(df2) <- paste("bla",1:3, sep = "_")
cbind(df, df2)
#>   x bla_1 bla_2 bla_3
#> 1 1     2     3     4
#> 2 2     3     4     5
#> 3 3     4     5     6

Created on 2024-05-27 with reprex v2.0.2

dromano · May 28, 2024, 12:20pm

Here are a couple more that use across(), one a bit hacky since it depends on across() automatically incorporating indices into column names:

library(tidyverse)
bla <- 1:3
data.frame(x = 1:3) |> 
  mutate(
    across(
      x, 
      # create list of 'mutate()' functions
      bla |> 
        map(
          \(n) {
            \(col) col + n
          } 
        )
    )
  ) |> 
  rename_with(
    \(name) str_replace(name, 'x', 'bla'),
    contains('x_')
  )
#>   x bla_1 bla_2 bla_3
#> 1 1     2     3     4
#> 2 2     3     4     5
#> 3 3     4     5     6

Created on 2024-05-28 with reprex v2.0.2

And the other is more cumbersome but gives more direct control over column names:

library(tidyverse)
bla <- 1:3
data.frame(x = 1:3) |> 
  mutate(
    across(
      x, 
      (\(dummy) {
        # create vector of names for 'mutate()' functions
        nms <- 
          bla |> 
          map_chr(
            \(n) str_c('bla', n, sep = '_')
          )
        # create list of 'mutate()' functions
        fns <- 
          bla |> 
          map(
            \(n) {
              \(col) col + n
            }
          )
        # add names to list of 'mutate()' functions
        names(fns) <- nms
        fns
      })()
    )
  ) |> 
  rename_with(
    \(name) str_remove(name, 'x_'),
    contains('x_')
  )
#>   x bla_1 bla_2 bla_3
#> 1 1     2     3     4
#> 2 2     3     4     5
#> 3 3     4     5     6

Created on 2024-05-28 with reprex v2.0.2

nirgrahamuk · May 28, 2024, 1:30pm

I have encountered a good few R programmers who don't like the following approach, but in this context I don't see any real problem with using it.

library(tidyverse)
library(glue)
library(rlang)
bla <- 1:3

(nms <- map_chr(bla, \(x) glue("bla_{x}")))
(vls <- map_chr(bla, \(x) glue("x + {x}")))

names(vls) <- nms

(df <- data.frame(x = 1:3) |> 
  mutate(!!!parse_exprs(vls)))
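
For readers who are wary of parsing strings into code, a minimal sketch of the same idea that builds the expressions directly with rlang::expr() might look like the following. The name offset_exprs is only illustrative and does not come from the post above.

```r
library(dplyr)
library(rlang)

bla <- 1:3

# Build one unevaluated expression per element of bla: x + 1L, x + 2L, x + 3L
offset_exprs <- lapply(bla, \(n) expr(x + !!n))
names(offset_exprs) <- paste0("bla_", bla)

# Splice the named expressions into mutate() as if they had been typed out
res_exprs <- data.frame(x = 1:3) |>
  mutate(!!!offset_exprs)
res_exprs
```

The splice works the same way as with parse_exprs(); the only difference is that no code is ever stored as text.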

dromano · May 28, 2024, 6:02pm

Here's a hybrid of my two earlier solutions, which has raised a question that I'll post shortly:

library(tidyverse)
bla <- 1:3
data.frame(x = 1:3) |> 
  mutate(
    across(
      x, 
      bla |> 
        map(
          \(n) {
            # create name of 'mutate()' function
            nm <- str_c('bla', n, sep = '_')
            # create list that contains 'mutate()' function
            lf <- list(\(col) col + n)
            # add name to list
            names(lf) <- nm
            # return named list (with single element)
            lf
          } 
        ) |> 
        # undo one level of list to obtain a named list of 'mutate()' functions
        unlist(),
      # use only function names to create column names
      .names = '{.fn}'
    )
  )

#>   x bla_1 bla_2 bla_3
#> 1 1     2     3     4
#> 2 2     3     4     5
#> 3 3     4     5     6

Created on 2024-05-28 with reprex v2.0.2

joels · May 28, 2024, 10:21pm

Does either of the options below do what you're looking for?

library(tidyverse)
library(glue)

d = data.frame(x = 1:3)

bla = 1:3

bla %>% 
  set_names() %>% 
  imap(~ d %>% mutate(!!glue("bla_{.y}") := x + .x)) %>% 
  reduce(left_join)
#> Joining with `by = join_by(x)`
#> Joining with `by = join_by(x)`
#>   x bla_1 bla_2 bla_3
#> 1 1     2     3     4
#> 2 2     3     4     5
#> 3 3     4     5     6
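
One caveat with reduce(left_join): joining on x assumes the values of x are unique, and duplicated values would multiply rows. A possible variation, sketched here under the assumption that a reasonably recent dplyr with glue-style name injection is available, folds mutate() over bla directly and skips the join. The name res_reduce is only illustrative.

```r
library(dplyr)
library(purrr)

d <- data.frame(x = 1:3)
bla <- 1:3

# Start from d and apply one mutate() per element of bla;
# "bla_{n}" := ... uses glue-style name injection to build the column name
res_reduce <- reduce(
  bla,
  \(df, n) mutate(df, "bla_{n}" := x + n),
  .init = d
)
res_reduce
```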


map_dfr(bla, ~ d %>% 
           mutate(val = .x, 
                  bla = x + .x)) %>% 
  pivot_wider(names_from=val, values_from=bla, names_prefix="bla_")
#> # A tibble: 3 × 4
#>       x bla_1 bla_2 bla_3
#>   <int> <int> <int> <int>
#> 1     1     2     3     4
#> 2     2     3     4     5
#> 3     3     4     5     6

This might not generalize well, depending on your application, but you can also use outer() in this case:

d %>% mutate(bla = outer(x, bla, "+"))

  x bla.1 bla.2 bla.3
1 1     2     3     4
2 2     3     4     5
3 3     4     5     6

Or, if you want the column names to match the values in bla:

bla = 11:15
d %>% mutate(bla = {a=outer(x, bla, "+"); colnames(a)=bla; a})

  x bla.11 bla.12 bla.13 bla.14 bla.15
1 1     12     13     14     15     16
2 2     13     14     15     16     17
3 3     14     15     16     17     18
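
One reason this might not generalize: mutate() keeps the result of outer() as a single matrix-column called bla, and bla.1, bla.2, ... are only how the data frame prints it. The sketch below illustrates this and shows one possible way to get ordinary columns instead; the names res_outer, m and res_flat are illustrative.

```r
library(dplyr)

d <- data.frame(x = 1:3)
bla <- 1:3

# mutate() stores the matrix returned by outer() as a single matrix-column
res_outer <- d |> mutate(bla = outer(x, bla, "+"))
names(res_outer)     # "x" "bla"
dim(res_outer$bla)   # 3 3

# To get ordinary columns, name the matrix columns and bind them on
m <- outer(d$x, bla, "+")
colnames(m) <- paste0("bla_", bla)
res_flat <- cbind(d, as.data.frame(m))
names(res_flat)      # "x" "bla_1" "bla_2" "bla_3"
```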

dougfir · May 29, 2024, 1:03am

I really like the readability of these solutions. For me they just read more smoothly, which is what I love about the tidyverse.

dougfir · May 29, 2024, 1:11am

d = data.frame(x = 1:3)
bla = 1:3
d |> mutate(bla = outer(x, bla, "+"))

Beautiful. I read ?outer:

Outer Product of Arrays
Description
The outer product of the arrays X and Y is the array A with dimension c(dim(X), dim(Y)) where element A[c(arrayindex.x, arrayindex.y)] = FUN(X[arrayindex.x], Y[arrayindex.y], ...).

Not following. I can see what it's doing in my R console, but I can't put into words what outer() is doing here. Is it using vectorization to iterate over each corresponding column of d and bla?

dromano · May 29, 2024, 2:09am

Here's a tidyverse version of what outer() is doing, which is calculating all possible pairwise products (the default) or, as here, sums:

library(tidyverse)
# alternative to outer(1:3, 4:5, '+')
expand_grid(a = 1:3, b = 4:5) |> 
  mutate(c = a + b) |> 
  pivot_wider(names_from = b, values_from = c)
#> # A tibble: 3 × 3
#>       a   `4`   `5`
#>   <int> <int> <int>
#> 1     1     5     6
#> 2     2     6     7
#> 3     3     7     8

# alternative to outer(1:3, 1:3, '+')
expand_grid(a = 1:3, b = 1:3) |> 
  mutate(c = a + b) |> 
  pivot_wider(names_from = b, values_from = c)
#> # A tibble: 3 × 4
#>       a   `1`   `2`   `3`
#>   <int> <int> <int> <int>
#> 1     1     2     3     4
#> 2     2     3     4     5
#> 3     3     4     5     6

Created on 2024-05-28 with reprex v2.0.2

joels · May 29, 2024, 3:50am

The following examples show what outer is doing:

# Show the positions of the elements of the output matrix relative to the 
#  input vectors
A <- paste0("a", 1:3)
B <- paste0("b", 1:4)
outer(A, B, "paste", sep = " ")
#>      [,1]    [,2]    [,3]    [,4]   
#> [1,] "a1 b1" "a1 b2" "a1 b3" "a1 b4"
#> [2,] "a2 b1" "a2 b2" "a2 b3" "a2 b4"
#> [3,] "a3 b1" "a3 b2" "a3 b3" "a3 b4"

# Can apply arbitrary functions of two variables (as also shown above)
C <- 1:3
D <- 1:4
outer(C, D, \(x, y) cos(x) * sin(y))
#>            [,1]       [,2]        [,3]       [,4]
#> [1,]  0.4546487  0.4912955  0.07624747 -0.4089021
#> [2,] -0.3501755 -0.3784012 -0.05872664  0.3149410
#> [3,] -0.8330500 -0.9001976 -0.13970775  0.7492288

# Number of dimensions of the outer product is the sum of the numbers of
#  dimensions of the inputs: 1D (vector) and 2D (matrix) produce a 3D array
E <- paste0("e", 1:3)
F <- matrix(paste0("f", 1:8), nrow = 4)
outer(E, F, "paste")
#> , , 1
#> 
#>      [,1]    [,2]    [,3]    [,4]   
#> [1,] "e1 f1" "e1 f2" "e1 f3" "e1 f4"
#> [2,] "e2 f1" "e2 f2" "e2 f3" "e2 f4"
#> [3,] "e3 f1" "e3 f2" "e3 f3" "e3 f4"
#> 
#> , , 2
#> 
#>      [,1]    [,2]    [,3]    [,4]   
#> [1,] "e1 f5" "e1 f6" "e1 f7" "e1 f8"
#> [2,] "e2 f5" "e2 f6" "e2 f7" "e2 f8"
#> [3,] "e3 f5" "e3 f6" "e3 f7" "e3 f8"

Created on 2024-05-28 with reprex v2.1.0
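
On the vectorization question, a rough base R sketch of what outer(x, bla, "+") amounts to may help: every element of x is paired with every element of bla, the function is called once on the two expanded vectors, and the result is reshaped into a matrix. The names X, Y and M are illustrative.

```r
x <- 1:3
bla <- 1:3

# Pair every element of x with every element of bla
X <- rep(x, times = length(bla))   # 1 2 3 1 2 3 1 2 3
Y <- rep(bla, each = length(x))    # 1 1 1 2 2 2 3 3 3

# Call the function once on the expanded vectors, then reshape the result
M <- matrix(X + Y, nrow = length(x))

all(M == outer(x, bla, "+"))       # TRUE
```

This is also why the function passed to outer() needs to be vectorized: it receives whole vectors, not one pair of scalars at a time.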

dromano · May 29, 2024, 1:37pm

@joels, your first option seems to me to most naturally reflect the structure of the task: it uses reduce() to address the repeated application of mutate(). Very nice.

I tried to see if I could use similar !! syntax to insert names into the list elements collected by map(), but haven't been able to figure out how to do that.
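
For what it's worth, one possibility is rlang::list2(), which accepts the `!!name := value` form inside a list. The following is only a sketch of that idea, not a claim about the best way to do it; the names fns and res_list2 are illustrative.

```r
library(dplyr)
library(purrr)
library(rlang)

bla <- 1:3

fns <- bla |>
  map(\(n) {
    nm <- paste0("bla_", n)
    # list2() supports `!!name := value`, so the element name can be computed
    list2(!!nm := \(col) col + n)
  }) |>
  # drop one level of nesting to get a named list of functions
  unlist(recursive = FALSE)

res_list2 <- data.frame(x = 1:3) |>
  mutate(across(x, fns, .names = "{.fn}"))
res_list2
```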

dromano · May 29, 2024, 2:30pm

And here is a more streamlined version of my previous solution with across(), informed by the use of set_names() and glue() by @joels and @nirgrahamuk:

library(tidyverse)
library(glue)
bla <- 1:3
data.frame(x = 1:3) |> 
  mutate(
    across(
      x, 
      bla |> 
        set_names(\(n) glue('bla_{n}')) |> 
        map(\(n) \(col) col + n),
      .names = '{.fn}'
    )
  )
#>   x bla_1 bla_2 bla_3
#> 1 1     2     3     4
#> 2 2     3     4     5
#> 3 3     4     5     6

Created on 2024-05-29 with reprex v2.0.2
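
If this pattern is needed in more than one place, it could be wrapped in a small helper. The sketch below is one way that might look; the function add_offsets() and its prefix argument are hypothetical names introduced here, and 'logins' is the prefix mentioned earlier in the thread.

```r
library(dplyr)
library(purrr)

# Hypothetical helper: add one column per offset, named <prefix>_<offset>
add_offsets <- function(df, col, offsets, prefix = "bla") {
  fns <- offsets |>
    set_names(paste0(prefix, "_", offsets)) |>
    map(\(n) {
      force(n)        # make sure each function captures its own offset
      \(v) v + n
    })
  mutate(df, across({{ col }}, fns, .names = "{.fn}"))
}

res_helper <- add_offsets(data.frame(x = 1:3), x, 1:3, prefix = "logins")
res_helper
```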
