Error using dbplyr 1.4 with row_number()

martin.R · May 1, 2019, 4:21pm

Thanks, I should have done that. I managed to find a server still running dbplyr 1.3.0, so the differences become obvious via show_query() which should hopefully be enough as explain() is a much longer output.

Here is a simplified form of the code with names changed:

library(dplyr)
library(dbplyr)

con <- DBI::dbConnect(odbc::odbc(), "my_DSN")

my_query <- 
  tbl(con, in_schema("container_name", "table_name")) %>% 
  select(id_column, datestamp) %>% 
  filter(datestamp == "2019-04-30") %>% 
  group_by(id_column) %>% 
  filter(row_number() == 1L) %>% 
  ungroup()

This removes duplicate id values. (There may be better ways of doing this, but this is a simplified query).

dplyr 1.3.0:

> my_query %>% show_query()
<SQL>
SELECT `id_column`, `datestamp`
FROM (SELECT `id_column`, `datestamp`, row_number() OVER (PARTITION BY `id_column`) AS `zzz9`
FROM (SELECT *
FROM (SELECT `id_column`, `datestamp`
FROM container_name.table_name) `sbbdxdpnnx`
WHERE (`datestamp` = '2019-04-30')) `qcpxocjxmq`) `akxsynfdxk`
WHERE (`zzz9` = 1)

dplyr 1.4.0:

> my_query %>% show_query()
<SQL>
SELECT *
FROM (SELECT *
FROM (SELECT `id_column`, `datestamp`
FROM container_name.table_name) `dbplyr_030`
WHERE (`datestamp` = '2019-04-30')) `dbplyr_031`
WHERE (row_number() = 1)

dbplyr 1.4.0 therefore omits the ... row_number() OVER (PARTITION BYid_column) windowing and errors when it comes to WHERE (row_number() = 1).

I tried to upload mtcars as a sample dataset to our database, but failed and our admin is on holiday. I hope that this example provides enough detail to explain the cause of the error.