Why is the native pipe `|>` not a function?

It seems like literally every operator in R is a function ... except for the new native pipe |>.

Why is this the case and what exactly is |>?
Does it have a definition somewhere that I can read?

library(magrittr)
1:2 |> sum()
#> [1] 3

is.function(`%>%`)
#> [1] TRUE
is.function(`<-`)
#> [1] TRUE
is.function(`+`)
#> [1] TRUE
is.function(`if`)
#> [1] TRUE
is.function(`|>`)
#> Error in eval(expr, envir, enclos): object '|>' not found

Created on 2023-07-11 with reprex v2.0.2

Despite the Official Received Wisdom

Everything in R is a function

syntax and its sugar, like the native pipe, seem to get a pass. Where we once wrote rhe(lhe) we can now write lhe |> rhe() (left-hand expression, right-hand expression). The pipe isn't unique in being a non-function. Check out everything on the {base} help page, down at the bottom in the link to {misc}.


The help page ?`|>` contains some more details:

Currently, pipe operations are implemented as syntax transformations. So an expression written as x |> f(y) is parsed as f(x, y) .

And one of the examples illustrates it directly:

quote(mtcars |> subset(cyl == 4) |> nrow())
#> nrow(subset(mtcars, cyl == 4))

Created on 2023-07-12 with reprex v2.0.2
So this replacement happens at parse time, before the code is actually evaluated.
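A quick way to check this in a session is sketched below (assuming R 4.1 or later, where |> is available; x, f and y are placeholder symbols that are never evaluated). The piped expression and the nested call come out of the parser as the same object, and no `|>` symbol survives parsing:

```r
# Requires R >= 4.1 for the native pipe.
piped  <- quote(x |> f(y))
nested <- quote(f(x, y))

# The parser has already rewritten the pipe, so the two objects are identical.
identical(piped, nested)
#> [1] TRUE

# The head of the call is `f`; there is no `|>` anywhere in the expression.
as.character(piped[[1]])
#> [1] "f"
"|>" %in% all.names(piped)
#> [1] FALSE
```

That is why is.function(`|>`) fails: there is no object of that name to look up.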

Another example:

is.function(`<-`)
#> [1] TRUE
is.function(`->`)
#> Error in eval(expr, envir, enclos): object '->' not found
quote( 5 -> a )
#> a <- 5

Created on 2023-07-12 with reprex v2.0.2

As to where to find its code definition, I'm not fully sure, as this is heavy C and Bison code, but I believe it's the function xxpipe defined here (simplified code):

static SEXP xxpipe(SEXP lhs, SEXP rhs, YYLTYPE *lloc_rhs)
{
    SEXP ans;

    SEXP fun = CAR(rhs);
    SEXP args = CDR(rhs);

    PRESERVE_SV(ans = lcons(fun, lcons(lhs, args)));
    return ans;
}

Here I think lcons() builds a language object (a call): the function is taken from rhs, and lhs is inserted as the first argument, ahead of the arguments rhs already had.
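The same rewrite can be imitated at the R level. This is only a sketch of the idea, not the actual implementation: pipe_rewrite is a hypothetical helper name, and it works on already-quoted expressions rather than inside the parser.

```r
# Hypothetical helper: mimics the simplified xxpipe() at the R level.
# `lhs` is any expression; `rhs` must be a call such as quote(f(y)).
pipe_rewrite <- function(lhs, rhs) {
  stopifnot(is.call(rhs))
  fun  <- rhs[[1]]           # like CAR(rhs): the function being called
  args <- as.list(rhs)[-1]   # like CDR(rhs): its existing arguments
  as.call(c(list(fun, lhs), args))  # f(lhs, <existing arguments>)
}

pipe_rewrite(quote(mtcars), quote(subset(cyl == 4)))
#> subset(mtcars, cyl == 4)
```

Applying it twice reproduces the help-page example: pipe_rewrite(pipe_rewrite(quote(mtcars), quote(subset(cyl == 4))), quote(nrow())) gives nrow(subset(mtcars, cyl == 4)).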

@technocrat

I don't understand this sentence, is a link missing?


You did well to find that xxpipe definition. I peeked at the commit history and there was this gem:

The pipe implementation as a syntax transformation was motivated by
suggestions from Jim Hester and Lionel Henry


Thank you for the explanation! Here is why I am asking. I'm experimenting with a way to extend dbplyr SQL translations. In particular, here is an example of adding a dateadd function that translates to the correct SQL on different database systems. This is a minimal example of what I'm thinking, just to get the idea across. There are issues with it, though (e.g. it only works with the magrittr pipe).

Anyway if you have an idea of how to do this I would very much appreciate any suggestions.

Thanks!

dateadd <- function(date, number, interval = "day") {

  dot <- get(".", envir = parent.frame())
  
  sql <- switch (class(dot$src$con)[1],
                 "duckdb_connection" = glue::glue("({date} + {number}*INTERVAL'1 {interval}')"),
                 "redshift" = glue::glue("DATEADD({interval}, {number}, {date})"),
                 "oracle" = glue::glue("({date} + NUMTODSINTERVAL({number}, 'day'))"),
                 "postgresql" = glue::glue("({date} + {number}*INTERVAL'1 {interval}')"),
                 "sql server" = glue::glue("DATEADD({interval}, {number}, {date})"),
                 "spark" = glue::glue("date_add({date}, {number})"),
                 "sqlite" = glue::glue("CAST(STRFTIME('%s', DATETIME({date}, 'unixepoch', ({number})||' {interval}s')) AS REAL)"),
                 "bigquery" = glue::glue("DATE_ADD({date}, INTERVAL {number} {toupper(interval)})"),
                 "snowflake" = glue::glue('DATEADD({interval}, {number}, {date})'),
                 stop(glue::glue("Connection type {class(dot$src$con)[1]} is not supported!"))
  )
  dbplyr::sql(as.character(sql))
}

con <- DBI::dbConnect(duckdb::duckdb())

date_tbl <- dplyr::copy_to(con, data.frame(date1 = as.Date("1999-01-01")),
                           name = "tmpdate", overwrite = TRUE, temporary = TRUE)

library(magrittr)
# works
date_tbl %>%  
 dplyr::mutate(date2 = !!dateadd("date1", 1, interval = "year")) 
#> # Source:   SQL [1 x 2]
#> # Database: DuckDB 0.8.2-dev77 [root@Darwin 21.6.0:R 4.2.2/:memory:]
#>   date1      date2     
#>   <date>     <date>    
#> 1 1999-01-01 2000-01-01

# fails
dplyr::mutate(date_tbl, date2 = !!dateadd("date1", 1, interval = "year")) 
#> Error in get(".", envir = parent.frame()): object '.' not found

# fails
date_tbl |>  
  dplyr::mutate(date2 = !!dateadd("date1", 1, interval = "year")) 
#> Error in get(".", envir = parent.frame()): object '.' not found

DBI::dbDisconnect(con, shutdown = TRUE)

Created on 2023-07-12 with reprex v2.0.2

There is almost certainly a better way to do this, but what I write here at least 'works':

dateadd <- function(date, number, interval = "day") {
  
  dot <- try(get(".", envir = parent.frame()),silent = TRUE)
  if(inherits(dot, "try-error")){
    con_object <- try(get("con", envir = parent.frame()))
  } else {
    con_object <- dot$src$con
  }
  sql <- switch (class(con_object)[1],
                 "duckdb_connection" = glue::glue("({date} + {number}*INTERVAL'1 {interval}')"),
                 "redshift" = glue::glue("DATEADD({interval}, {number}, {date})"),
                 "oracle" = glue::glue("({date} + NUMTODSINTERVAL({number}, 'day'))"),
                 "postgresql" = glue::glue("({date} + {number}*INTERVAL'1 {interval}')"),
                 "sql server" = glue::glue("DATEADD({interval}, {number}, {date})"),
                 "spark" = glue::glue("date_add({date}, {number})"),
                 "sqlite" = glue::glue("CAST(STRFTIME('%s', DATETIME({date}, 'unixepoch', ({number})||' {interval}s')) AS REAL)"),
                 "bigquery" = glue::glue("DATE_ADD({date}, INTERVAL {number} {toupper(interval)})"),
                 "snowflake" = glue::glue('DATEADD({interval}, {number}, {date})'),
                 stop(glue::glue("Connection type {class(con_object)[1]} is not supported!"))
  )
  dbplyr::sql(as.character(sql))
}

I think this will only work if the connection object is named con. I generally won't know what the user's connection object will be named, though. If it is conn, then it fails.

# Function that will be exported from a package ----
dateadd <- function(date, number, interval = "day") {

  dot <- try(get(".", envir = parent.frame()), silent = TRUE)
  
  if (inherits(dot, "try-error")) {
    con_object <- try(get("con", envir = parent.frame()))
  } else {
    con_object <- dot$src$con
  }
  
  sql <- switch (class(con_object)[1],
                 "duckdb_connection" = glue::glue("({date} + {number}*INTERVAL'1 {interval}')"),
                 "postgresql" = glue::glue("({date} + {number}*INTERVAL'1 {interval}')"),
                 "redshift" = glue::glue("DATEADD({interval}, {number}, {date})"),
                 stop(glue::glue("Connection type {class(dot$src$con)[1]} is not supported!"))
  )
  dbplyr::sql(as.character(sql))
}

# User's code ----
conn <- DBI::dbConnect(duckdb::duckdb())

date_tbl <- dplyr::copy_to(conn, data.frame(date1 = as.Date("1999-01-01")),
                           name = "tmpdate", overwrite = TRUE, temporary = TRUE)

library(magrittr)

date_tbl %>%  
 dplyr::mutate(date2 = !!dateadd("date1", 1, interval = "year")) 
#> # Source:   SQL [1 x 2]
#> # Database: DuckDB 0.8.2-dev77 [root@Darwin 21.6.0:R 4.2.2/:memory:]
#>   date1      date2     
#>   <date>     <date>    
#> 1 1999-01-01 2000-01-01

# fails
dplyr::mutate(date_tbl, date2 = !!dateadd("date1", 1, interval = "year")) 
#> Error in get("con", envir = parent.frame()) : object 'con' not found
#> Error in dot$src: $ operator is invalid for atomic vectors

# fails
date_tbl |>  
  dplyr::mutate(date2 = !!dateadd("date1", 1, interval = "year")) 
#> Error in get("con", envir = parent.frame()) : object 'con' not found
#> Error in dot$src: $ operator is invalid for atomic vectors

DBI::dbDisconnect(conn, shutdown = TRUE)

Created on 2023-07-12 with reprex v2.0.2

Good point. I thought the purpose of dbplyr was for you to run dplyr code on connections, so I would investigate how this is normally done. If I have time to dig around on this, I will, out of curiosity.

Sorry to be obscure. It's a reference, rather than a link. I meant the help page for base has a misc section at the very bottom that leads to the large list where all the stuff named with symbols is to be found.

I'm curious, if I just type ?base I get a very short description with no {misc} section. Which help page are you referring to?

With this dot, that can indeed only work when using a magrittr pipe (which defines .). Without magrittr, I tried playing a bit with parent.frame(), but I don't think mutate() allows us to access its .data argument.

Also, the idea of getting the value from the parent frame feels a bit... shoddy. Another solution, still far from ideal, would be to pass the connection as an argument:

date_tbl |> 
  dplyr::mutate(date2 = !!dateadd("date1", 1, interval = "year", con = date_tbl)) 

It's a bit unwieldy but explicit. Short of rewriting dbplyr internals, I can't really think of a better way.
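One way the explicit-argument idea could look is sketched here. The dateadd_sql name and its con parameter are hypothetical, and the sketch differs from the call above in that it takes the connection itself rather than the table (for a lazy table it could be obtained with dbplyr::remote_con()). To stay runnable without extra packages it uses base R's sprintf() in place of glue, covers only three of the backends listed earlier, and dispatches on a stand-in object instead of a live connection; in real use the result would be wrapped in dbplyr::sql().

```r
# Hypothetical variant: the connection is an explicit argument, so nothing is
# looked up in parent.frame() and the choice of pipe no longer matters.
dateadd_sql <- function(date, number, interval = "day", con) {
  backend <- class(con)[1]
  switch(backend,
    "duckdb_connection" = ,   # falls through to the postgresql branch
    "postgresql" = sprintf("(%s + %s*INTERVAL'1 %s')", date, number, interval),
    "redshift"   = sprintf("DATEADD(%s, %s, %s)", interval, number, date),
    stop("Connection type ", backend, " is not supported!")
  )
}

# Stand-in for a real DBI connection, used only to exercise the dispatch.
fake_con <- structure(list(), class = "duckdb_connection")

dateadd_sql("date1", 1, interval = "year", con = fake_con)
#> [1] "(date1 + 1*INTERVAL'1 year')"
```

Because the backend is derived from an argument, the same call behaves identically with %>%, with |>, and with no pipe at all.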

Go to the packages tab and click on the base package. Scroll just past the -- Z -- section to -- misc --.


I see, thank you. Although I would say the vast majority of them are still functions.

Sorry again. Go to the index part?

Just following up on this: I dug around a little, and it seems only hacky approaches can be offered as long as there is no proper extensibility mechanism for dbplyr; for reference see

One could make the parent.frame checking approach more elaborate: for example, you can get the names of all the objects in the parent.frame, find the first one that is a tbl_lazy, and get a connection from that.

library(dbplyr)
library(dplyr)
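myfunc <- function() {
  # Capture the caller's environment once; calling parent.frame() inside the
  # anonymous function below would refer to a different frame.
  env <- parent.frame()
  nms <- ls(envir = env, all.names = TRUE)
  # Flag the objects that are lazy (remote) tables
  is_tbl_lazy <- vapply(nms, \(x) inherits(get(x, envir = env), what = "tbl_lazy"), logical(1))
  # Use the first lazy table found and report the class of its connection
  get_tbl <- get(nms[which(is_tbl_lazy)[1]], envir = env)
  class(dbplyr::remote_con(get_tbl))[[1]]
}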
conn <- DBI::dbConnect(duckdb::duckdb())

date_tbl <- dplyr::copy_to(conn, data.frame(date1 = as.Date("1999-01-01")),
                           name = "tmpdate", overwrite = TRUE, temporary = TRUE)

date_tbl |>
  mutate(what_connection=!!myfunc())
