I'm trying to understand tbl_lazy and op_* behaviour. I expect that nrow() should return NA for a simulated source, but we get NULL:
library(dbplyr)
library(dplyr)
## example data
df <- tibble::tibble(apples = 1:3, oranges = c("a", "b", "c"))
## example SQLite db
db <- src_sqlite(tempfile(), create = TRUE)
## a real SQLite tbl
real_tbl <- copy_to(db, df)
## a simulated SQLite tbl
sim_tbl <- tbl_lazy(df, simulate_sqlite())
## NA, as expected
nrow(real_tbl)
## NULL, not expected
nrow(sim_tbl)
The problem comes with the print:
## this causes printing to fail in trunc_mat
print(sim_tbl)
#Error in if (is.na(rows) || rows > tibble_opt("print_max")) { :
# missing value where TRUE/FALSE needed
# In addition: Warning message:
# In is.na(rows) : is.na() applied to non-(list or vector) of type 'NULL'
## though other ops do work
str(op_base(real_tbl, "apples"))
str(op_base(sim_tbl, "apples"))
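The print error seems to come down to how the condition evaluates when rows is NULL rather than NA. Here is a minimal base R sketch of that, assuming the check in trunc_mat has the shape shown in the error message. `print_max` is a hypothetical stand-in for tibble_opt("print_max"), and `check_rows` is a made-up helper, not part of tibble:

```r
# Hypothetical stand-in for tibble_opt("print_max"); the real value may differ
print_max <- 20

# Made-up helper mirroring the shape of the condition in the error message
check_rows <- function(rows) {
  tryCatch(
    suppressWarnings(
      if (is.na(rows) || rows > print_max) "truncate" else "print all"
    ),
    error = function(e) e  # return the condition object instead of stopping
  )
}

res_na   <- check_rows(NA)    # what the real tbl reports via nrow()
res_null <- check_rows(NULL)  # what the simulated tbl reports via nrow()

res_na                       # "truncate"
inherits(res_null, "error")  # TRUE: is.na(NULL) is zero-length, so if() cannot decide
```

So an NA row count passes through the check cleanly, while a NULL row count cannot be tested at all.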
My question is, should nrow(real_tbl) return NA? How does that work, though? I'm confused about the ops_ control here.
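My current guess at the mechanism: base R defines nrow(x) as dim(x)[1L], so the result depends on whether the object's class has a dim() method. Here is a toy sketch of that dispatch using two made-up classes (`toy_no_dim` and `toy_with_dim` are hypothetical, not dbplyr classes). It assumes the simulated tbl in this dbplyr version has no dim() method, which I haven't confirmed:

```r
# nrow(x) is defined in base R as dim(x)[1L], so it relies on dim()

# Toy class with no dim() method, standing in for the simulated tbl
no_dim <- structure(list(vars = c("apples", "oranges")), class = "toy_no_dim")

# Toy class with a dim() method that reports an unknown row count
with_dim <- structure(list(vars = c("apples", "oranges")), class = "toy_with_dim")
dim.toy_with_dim <- function(x) c(NA_integer_, length(x$vars))

nrow(no_dim)    # NULL: no method and no dim attribute
nrow(with_dim)  # NA
ncol(with_dim)  # 2
```

If that is what is happening, a dim() method on the lazy tbl class that returns NA for the row count would presumably make nrow(sim_tbl) match nrow(real_tbl).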
Are there other examples that use lazy_ops but don't use a real database? I'm trying to wrap the src/tbl/collect idioms to provide lazy exploration as a proof of concept.