I am trying to run some code that for sure worked last April-ish (2021). It does a summarize() without a group_by() using dbplyr. But it now gives an error. When I look at the SQL generated, it seems it automatically is looking for a grouping variable, but I don't want one - I just want an overall summary. Is there a way around this? Or should I just use SQL for this? Just trying to teach this to students, so it's not that big of a deal. Reprex is below.
library(tidyverse) # for reading in data, graphing, and cleaning
library(dbplyr) # for SQL query "cheating" - part of tidyverse but needs to be loaded separately
#>
#> Attaching package: 'dbplyr'
#> The following objects are masked from 'package:dplyr':
#>
#> ident, sql
library(mdsr) # for accessing some databases - goes with Modern Data Science with R textbook
library(RMySQL) # for accessing MySQL databases
#> Loading required package: DBI
library(RSQLite) # for accessing SQLite databases
#>
#> Attaching package: 'RSQLite'
#> The following object is masked from 'package:RMySQL':
#>
#> isIdCurrent
con_air <- dbConnect_scidb("airlines")
overall_smry <-
tbl(con_air, "flights") %>%
summarize(min_year = min(year),
max_year = max(year),
num_flights = n(),
num_carriers = n_distinct(carrier))
overall_smry
#> Warning: Missing values are always removed in SQL.
#> Use `MIN(x, na.rm = TRUE)` to silence this warning
#> This warning is displayed only once per session.
#> Warning: Missing values are always removed in SQL.
#> Use `MAX(x, na.rm = TRUE)` to silence this warning
#> This warning is displayed only once per session.
#> Error in .local(conn, statement, ...): could not run statement: Unknown column '' in 'field list'
overall_smry %>%
show_query()
#> <SQL>
#> SELECT ``, MIN(`year`) AS `min_year`, MAX(`year`) AS `max_year`, COUNT(*) AS `num_flights`, COUNT(DISTINCT `carrier`) AS `num_carriers`
#> FROM `flights`
#> GROUP BY ``
Hi, can you share what it returns when you run class(con_air)? I think it has to do with the type of database dbplyr think it's working with. The output of that command should give us a clue of what dbplyr is seeing.
It works when I use DBI instead of the helper function from mdsr:
Thank you! I think for now I can just skip that tiny chunk of code since the workaround might be a bit much for my students, but thanks for your suggestion. Sorry I didn't realize the problem comes from mdsr - I just assumed it was from dbplyr. Thanks again!