Packing dplyr operations into one object?

I would like to know whether there is a built-in function that lets us "pack" a sequence of dplyr operations into one object, so that the operations are reusable.

Suppose this is what I want to do:

library(dplyr)
#> 
#> Attaching package: 'dplyr'
#> The following objects are masked from 'package:stats':
#> 
#>     filter, lag
#> The following objects are masked from 'package:base':
#> 
#>     intersect, setdiff, setequal, union

head(cars)
#>   speed dist
#> 1     4    2
#> 2     4   10
#> 3     7    4
#> 4     7   22
#> 5     8   16
#> 6     9   10
# Create a data set with the same column names
n <- 100
set.seed(8701)
cars2 <- data.frame(speed = round(runif(n, 4, 50)),
                    dist = round(runif(n, 15, 100)))
head(cars2)
#>   speed dist
#> 1    33   23
#> 2    38   95
#> 3    16   59
#> 4    26   58
#> 5    19   32
#> 6    28   17

new1 <- cars %>% filter(speed > 10) %>%
                 mutate(dist2 = dist * 2)
new2 <- cars2 %>% filter(speed > 10) %>%
                  mutate(dist2 = dist * 2)
head(new1)
#>   speed dist dist2
#> 1    11   17    34
#> 2    11   28    56
#> 3    12   14    28
#> 4    12   20    40
#> 5    12   24    48
#> 6    12   28    56
head(new2)
#>   speed dist dist2
#> 1    33   23    46
#> 2    38   95   190
#> 3    16   59   118
#> 4    26   58   116
#> 5    19   32    64
#> 6    28   17    34

If I want to do filter(speed > 10) %>% mutate(dist2 = dist * 2) again and again, each time on a different data frame, can I do something like this?

to_do <-  some_function(filter(speed > 10) %>%
                        mutate(dist2 = dist * 2))
new1 <- cars %>% to_do
new2 <- cars2 %>% to_do

You are on the right track.

suppressMessages(library(dplyr))
my_function <- function(x) {
  x %>% filter(speed > 10) %>% mutate(dist2 = dist * 2)
}

cars0 <- cars[1:10,]
cars1 <- cars[11:20,]
cars2 <- cars[21:30,]
cars3 <- cars[31:40,]
cars4 <- cars[41:50,]
my_function(cars0)
#>   speed dist dist2
#> 1    11   17    34
my_function(cars1)
#>    speed dist dist2
#> 1     11   28    56
#> 2     12   14    28
#> 3     12   20    40
#> 4     12   24    48
#> 5     12   28    56
#> 6     13   26    52
#> 7     13   34    68
#> 8     13   34    68
#> 9     13   46    92
#> 10    14   26    52
my_function(cars2)
#>    speed dist dist2
#> 1     14   36    72
#> 2     14   60   120
#> 3     14   80   160
#> 4     15   20    40
#> 5     15   26    52
#> 6     15   54   108
#> 7     16   32    64
#> 8     16   40    80
#> 9     17   32    64
#> 10    17   40    80
my_function(cars3)
#>    speed dist dist2
#> 1     17   50   100
#> 2     18   42    84
#> 3     18   56   112
#> 4     18   76   152
#> 5     18   84   168
#> 6     19   36    72
#> 7     19   46    92
#> 8     19   68   136
#> 9     20   32    64
#> 10    20   48    96
my_function(cars4)
#>    speed dist dist2
#> 1     20   52   104
#> 2     20   56   112
#> 3     20   64   128
#> 4     22   66   132
#> 5     23   54   108
#> 6     24   70   140
#> 7     24   92   184
#> 8     24   93   186
#> 9     24  120   240
#> 10    25   85   170
my_function(cars)
#>    speed dist dist2
#> 1     11   17    34
#> 2     11   28    56
#> 3     12   14    28
#> 4     12   20    40
#> 5     12   24    48
#> 6     12   28    56
#> 7     13   26    52
#> 8     13   34    68
#> 9     13   34    68
#> 10    13   46    92
#> 11    14   26    52
#> 12    14   36    72
#> 13    14   60   120
#> 14    14   80   160
#> 15    15   20    40
#> 16    15   26    52
#> 17    15   54   108
#> 18    16   32    64
#> 19    16   40    80
#> 20    17   32    64
#> 21    17   40    80
#> 22    17   50   100
#> 23    18   42    84
#> 24    18   56   112
#> 25    18   76   152
#> 26    18   84   168
#> 27    19   36    72
#> 28    19   46    92
#> 29    19   68   136
#> 30    20   32    64
#> 31    20   48    96
#> 32    20   52   104
#> 33    20   56   112
#> 34    20   64   128
#> 35    22   66   132
#> 36    23   54   108
#> 37    24   70   140
#> 38    24   92   184
#> 39    24   93   186
#> 40    24  120   240
#> 41    25   85   170

Created on 2023-09-17 with reprex v2.0.2
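
If the threshold or the multiplier has to change between data frames, the same wrapper can take them as arguments. This is only a sketch: scale_dist(), min_speed and multiplier are names made up here for illustration, and it assumes every input has speed and dist columns.

```r
suppressMessages(library(dplyr))

# Hypothetical generalisation of my_function(): min_speed and multiplier
# are illustrative arguments, not anything built into dplyr
scale_dist <- function(data, min_speed = 10, multiplier = 2) {
  data %>%
    filter(speed > min_speed) %>%
    mutate(dist2 = dist * multiplier)
}

# Split cars into five blocks of ten rows and process each block
chunks <- split(cars, rep(1:5, each = 10))
results <- lapply(chunks, scale_dist)

# A different threshold only needs a different argument
slow_cut <- scale_dist(cars, min_speed = 4)
```

With the defaults, scale_dist(cars) should give the same rows as my_function(cars) above.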

Thanks!

Is there a built-in function that can do this without writing a function ourselves?

This idea came to me while working with ggplot2. The + operator is very convenient for themes and similar elements (though not for some others). We can "add" several calls together and then add the result to different ggplot2 graphs. We can also combine them in any way we want to modify the themes.

I am wondering whether we can do something similar in dplyr, somehow adding operations together.

I quickly drafted a function to illustrate what I want. It is definitely not flexible enough and may fail in some cases, but I think it is enough to demonstrate what I would love to see in dplyr ... or maybe there is already such a function somewhere?

pack_dplyr <- function(...) {
    args <- match.call()
    tmpfct <- function(.data) {
        k <- length(args)
        data_new <- .data
        for (x in seq(2, k)) {
            callx <- args[[x]]
            callx$.data <- data_new
            data_new <- eval(callx)
          }
        data_new
      }
    tmpfct
  }

cars1 <- cars[1:10, ]
# Create a version with a different column order
# (10 ids, to match the 10 rows taken from cars)
cars2 <- data.frame(id = round(runif(10, 10, 20)),
                    dist = cars[11:20, "dist"],
                    speed = cars[11:20, "speed"])
head(cars1)
#>   speed dist
#> 1     4    2
#> 2     4   10
#> 3     7    4
#> 4     7   22
#> 5     8   16
#> 6     9   10
head(cars2)
#>   id dist speed
#> 1 11   28    11
#> 2 17   14    12
#> 3 14   20    12
#> 4 18   24    12
#> 5 16   28    12
#> 6 17   26    13

library(dplyr)
#> 
#> Attaching package: 'dplyr'
#> The following objects are masked from 'package:stats':
#> 
#>     filter, lag
#> The following objects are masked from 'package:base':
#> 
#>     intersect, setdiff, setequal, union

tmp1 <- pack_dplyr(filter(speed > 4),
                   mutate(dist2 = dist * 2),
                   select(-dist))

new1 <- cars1 %>% tmp1
new2 <- cars2 %>% tmp1

head(new1)
#>   speed dist2
#> 1     7     8
#> 2     7    44
#> 3     8    32
#> 4     9    20
#> 5    10    36
#> 6    10    52
head(new2)
#>   id speed dist2
#> 1 11    11    56
#> 2 17    12    28
#> 3 14    12    40
#> 4 18    12    48
#> 5 16    12    56
#> 6 17    13    52

# More operations

new11 <- cars1 %>% tmp1 %>% slice(1:5)
new22 <- cars2 %>% rename(new_id = id) %>% tmp1

head(new11)
#>   speed dist2
#> 1     7     8
#> 2     7    44
#> 3     8    32
#> 4     9    20
#> 5    10    36
head(new22)
#>   new_id speed dist2
#> 1     11    11    56
#> 2     17    12    28
#> 3     14    12    40
#> 4     18    12    48
#> 5     16    12    56
#> 6     17    13    52

# Two packaged operations

tmpa1 <- pack_dplyr(filter(speed > 4),
                    mutate(dist2 = dist * 2))
tmpa2 <- pack_dplyr(slice(1:5),
                    select(-dist))

newa1 <- cars1 %>% tmpa1 %>% tmpa2
head(newa1)
#>   speed dist2
#> 1     7     8
#> 2     7    44
#> 3     8    32
#> 4     9    20
#> 5    10    36
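
A variant of the same idea, sketched without match.call(): keep each step as an ordinary one-argument function in a list and run the list with base R's Reduce(). The names steps_a, steps_b and apply_steps() are made up for this illustration.

```r
suppressMessages(library(dplyr))

# Each step is an ordinary function of one data frame (names are illustrative)
steps_a <- list(
  function(d) filter(d, speed > 4),
  function(d) mutate(d, dist2 = dist * 2)
)
steps_b <- list(
  function(d) slice(d, 1:5),
  function(d) select(d, -dist)
)

# Hypothetical helper: feed the data through each step in order
apply_steps <- function(data, steps) {
  Reduce(function(d, f) f(d), steps, data)
}

# "Adding" two packs is just concatenating the two lists
cars1 <- cars[1:10, ]
newa1 <- apply_steps(cars1, c(steps_a, steps_b))
```

Because the packs are plain lists, they can be concatenated, reordered or subset with the usual list tools, which is roughly the "adding operations together" behaviour I was after.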

The + operator works in ggplot2 because the package is, in effect, a domain-specific language in which + adds new objects (ggproto objects) to a ggplot object. Although + can also substitute new data and aes mappings to override the originals, that is not really analogous to applying dplyr verbs to different datasets.
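
For comparison, here is a rough sketch of the ggplot2 side, assuming ggplot2 is installed. shared_style is a made-up name; the point is only that theme pieces are objects that can be stored once and added to any plot.

```r
library(ggplot2)

# Combine two theme objects once and keep the result (name is illustrative)
shared_style <- theme_minimal() + theme(legend.position = "none")

# The stored object is added to plots built from different data frames
p1 <- ggplot(cars, aes(speed, dist)) + geom_point() + shared_style
p2 <- ggplot(mtcars, aes(wt, mpg)) + geom_point() + shared_style
```

Here the reusable piece is something added to a plot, whereas a dplyr verb is a function call that needs its data as input.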

I am unaware of any tidyverse function to do this, aside from yours.


Oh ... I just found that we can already do this. The last example on the magrittr help page for %>% shows exactly what I want.

cars1 <- cars[1:10, ]
# Create a version with a different column order
# (10 ids, to match the 10 rows taken from cars)
cars2 <- data.frame(id = round(runif(10, 10, 20)),
                    dist = cars[11:20, "dist"],
                    speed = cars[11:20, "speed"])
head(cars1)
#>   speed dist
#> 1     4    2
#> 2     4   10
#> 3     7    4
#> 4     7   22
#> 5     8   16
#> 6     9   10
head(cars2)
#>   id dist speed
#> 1 15   28    11
#> 2 14   14    12
#> 3 11   20    12
#> 4 19   24    12
#> 5 10   28    12
#> 6 16   26    13

library(dplyr)
#> 
#> Attaching package: 'dplyr'
#> The following objects are masked from 'package:stats':
#> 
#>     filter, lag
#> The following objects are masked from 'package:base':
#> 
#>     intersect, setdiff, setequal, union

tmp1_dot <- . %>% filter(speed > 4) %>%
                  mutate(dist2 = dist * 2) %>%
                  select(-dist)
new1 <- cars1 %>% tmp1_dot
new2 <- cars2 %>% tmp1_dot

head(new1)
#>   speed dist2
#> 1     7     8
#> 2     7    44
#> 3     8    32
#> 4     9    20
#> 5    10    36
#> 6    10    52
head(new2)
#>   id speed dist2
#> 1 15    11    56
#> 2 14    12    28
#> 3 11    12    40
#> 4 19    12    48
#> 5 10    12    56
#> 6 16    13    52
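
The objects created with . %>% are ordinary functions, so they can be chained as well, which covers my "two packaged operations" case. A short sketch (the *_dot names are just mine):

```r
suppressMessages(library(dplyr))

# Two functional sequences, each starting from the dot placeholder
tmpa1_dot <- . %>% filter(speed > 4) %>% mutate(dist2 = dist * 2)
tmpa2_dot <- . %>% slice(1:5) %>% select(-dist)

# A functional sequence can itself be a step in another one
both_dot <- . %>% tmpa1_dot %>% tmpa2_dot

cars1 <- cars[1:10, ]
newa1 <- cars1 %>% both_dot

# magrittr::functions() lists the steps stored in a functional sequence
n_steps <- length(magrittr::functions(both_dot))
```

cars1 %>% tmpa1_dot %>% tmpa2_dot should give the same result as cars1 %>% both_dot.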

Thanks a lot for your explanation! I understand more about how ggplot2's + operator works now.
