When using the dbplyr package to talk to a sqlite database found here:
It mentions that "All dplyr calls are evaluated lazily, generating SQL that is only sent to the database when you request the data."
So within the example that they gave available :
# lazily generates query
summary <- mtcars2 %>%
group_by(cyl) %>%
summarise(mpg = mean(mpg, na.rm = TRUE)) %>%
arrange(desc(mpg))
# see query
summary %>% show_query()
#> <SQL>
#> SELECT `cyl`, AVG(`mpg`) AS `mpg`
#> FROM `mtcars`
#> GROUP BY `cyl`
#> ORDER BY `mpg` DESC
# execute query and retrieve results
summary %>% collect()
#> # A tibble: 3 x 2
#> cyl mpg
#> <dbl> <dbl>
#> 1 4 26.7
#> 2 6 19.7
#> 3 8 15.1
I understand that the process is 1) summary object is the query() and 2) is only executed when you call the collect()
function because it works on lazy execution.
what i would like to know is - is it possible to not write a string of queries and then collect them at the end for example?
# lazily generates query
summary <- mtcars2 %>%
group_by(cyl) %>%
summarise(mpg = mean(mpg, na.rm = TRUE)) %>%
arrange(desc(mpg))
summary_2 <- summary %>% nrow()
# execute query and retrieve results
summary_2 %>% collect()
Is this possible?
Or is the only option to write code - collect the code - send that table back to the database - perform another dplyr query - collect that and write that back to the database ?
Is the above the only process?