I've been using dbplyr to build a complex query against data in Snowflake. It works well, but I notice it takes a long time to build the query, even before I collect(). It seems that at every step dbplyr needs to check things like column names, column types, and column contents (in the case of pivot_wider()) with the database (Snowflake in this instance). Building the query is taking as long as actually executing it.
Are there any tips to reduce dbplyr's communication with the database prior to collect()? I noticed on GitHub someone said you can use pivot_wider_spec() to speed up pivot_wider() a bit, but this wouldn't save me much in this case, as I only call pivot_wider() twice (although I do know the new column names in advance).
Any tips?
Edit: I do have several local tables used for joining with tables in the database; I push them using copy_to(), and I wonder if this is what makes it slow.
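Roughly the pattern is below; con, stations, WEATHER, and STATION_ID are just placeholder names for this sketch, not my real objects.

library(dplyr)
library(dbplyr)

# stations is a small local tibble; con is the DBI connection to Snowflake
stations_db <- copy_to(con, stations, name = "stations_tmp",
                       temporary = TRUE, overwrite = TRUE)

# join the uploaded table against a table that lives in Snowflake
joined <- tbl(con, "WEATHER") %>%
  left_join(stations_db, by = "STATION_ID")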
Edit2: One thing I am doing is expanding on time. I have tables that include a datetime (whole hours) and I want to expand the table to the full time series. I am creating the series of hours locally, pushing it with copy_to(), and then left_joining the DB table. Is there a more efficient way?
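Schematically it looks like this (again with placeholder table/column names and dates, not my real ones):

library(dplyr)
library(dbplyr)

# build the full hourly series locally
hours <- tibble(
  DATETIME = seq(
    from = as.POSIXct("2022-01-01 00:00:00", tz = "UTC"),
    to   = as.POSIXct("2022-12-31 23:00:00", tz = "UTC"),
    by   = "hour"
  )
)

# push it to Snowflake, then left_join the database table onto it
hours_db <- copy_to(con, hours, name = "hours_tmp", temporary = TRUE)

expanded <- hours_db %>%
  left_join(tbl(con, "WEATHER_RAW"), by = "DATETIME")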
Edit3: My bottleneck is this code:
weathnew4 <- weathnew3 %>%
  pivot_wider(names_from = "NAME", values_from = "VALUEI")
pivot_wider() is not lazy, so it executes the (complex and slow) query behind weathnew3.
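If I understand it correctly (and I may not), that is because dbplyr first has to ask Snowflake which NAME values exist in order to know the output columns, i.e. something roughly like:

# a query along these lines runs before collect() just to discover
# the new column names
weathnew3 %>%
  distinct(NAME) %>%
  collect()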
I know the values in the NAME column; they are stored in a character vector called "variables". I tried using pivot_wider_spec(), which should be lazy, but I got an error.
spec <- tibble(NAME = variables, .name = variables, .value = "VALUEI")
weathnew4 <- weathnew3 %>%
  pivot_wider_spec(spec)
Error in `data[vec_unique(c(id_cols, names_from_cols, values_from_cols, unused_cols))]`:
! Can't subset columns with `vec_unique(c(id_cols, names_from_cols, values_from_cols, unused_cols))`.
✖ Subscript `vec_unique(c(id_cols, names_from_cols, values_from_cols, unused_cols))` can't contain missing values.
✖ It has a missing value at location 1.
Run `rlang::last_trace()` to see where the error occurred.