How to create a list column of codes using dbplyr?

Hi,
I have a method to create a list column of a categorical variable grouped by an id variable using dplyr. However this does not work in dbplyr. Does anyone know of a way to do this in dbplyr??
Thanks
David

library(tidyverse)
library(dplyr)
library(nycflights13)

con <- DBI::dbConnect(RSQLite::SQLite(), ":memory:")
copy_to(con, flights)
flights2 <- tbl(con, "flights")

flights |> 
  head(100) |> 
  select(carrier, flight) |> 
  nest_by(carrier)
#> # A tibble: 11 × 2
#> # Rowwise:  carrier
#>    carrier               data
#>    <chr>   <list<tibble[,1]>>
#>  1 AA                [17 × 1]
#>  2 AS                 [1 × 1]
#>  3 B6                [25 × 1]
#>  4 DL                [13 × 1]
#>  5 EV                 [3 × 1]
#>  6 FL                 [1 × 1]
#>  7 MQ                 [6 × 1]
#>  8 UA                [26 × 1]
#>  9 US                 [5 × 1]
#> 10 VX                 [2 × 1]
#> 11 WN                 [1 × 1]

flights2 |> 
  head(100) |> 
  select(carrier, flight) |> 
  nest_by(carrier)
#> Error in UseMethod("nest_by"): no applicable method for 'nest_by' applied to an object of class "c('tbl_SQLiteConnection', 'tbl_dbi', 'tbl_sql', 'tbl_lazy', 'tbl')"

Created on 2023-07-03 with reprex v2.0.2

Short answer: the documentation is misleading—it says that nest_by will work a lazy data frame in the arguments section, but takes it back in methods

The following methods are currently available in loaded packages: dplyr (data.frame , grouped_df )

which sort of makes sense if you remember that the mission of {dplyr} is to translate tidy-style data statements into SQL queries. Once you have results from the con, continue by converting to a data frame.

library(dbplyr)
library(dplyr)
#> 
#> Attaching package: 'dplyr'
#> The following objects are masked from 'package:dbplyr':
#> 
#>     ident, sql
#> The following objects are masked from 'package:stats':
#> 
#>     filter, lag
#> The following objects are masked from 'package:base':
#> 
#>     intersect, setdiff, setequal, union
library(nycflights13)

con <- DBI::dbConnect(RSQLite::SQLite(), ":memory:")
copy_to(con, flights)
flights2 <- tbl(con, "flights")
as.data.frame(flights2) |> 
  head(100) |> 
  select(carrier, flight) |> 
  nest_by(carrier)
#> # A tibble: 11 × 2
#> # Rowwise:  carrier
#>    carrier               data
#>    <chr>   <list<tibble[,1]>>
#>  1 AA                [17 × 1]
#>  2 AS                 [1 × 1]
#>  3 B6                [25 × 1]
#>  4 DL                [13 × 1]
#>  5 EV                 [3 × 1]
#>  6 FL                 [1 × 1]
#>  7 MQ                 [6 × 1]
#>  8 UA                [26 × 1]
#>  9 US                 [5 × 1]
#> 10 VX                 [2 × 1]
#> 11 WN                 [1 × 1]

Created on 2023-07-03 with reprex v2.0.2

1 Like

Thanks @technocrat

as.data.frame collects the data into your local environment?

I was hoping to do the nest_by query before collecting, as the data is really massive - and this step summarises it - is this possible??

nesting will not reduce your data, its a different structure, but contains the same info as if not nested.

2 Likes

That should probably be done within SQL using the con object. I'll see if my SQL skills are still up to it.

Guess it's like riding a bike

library(DBI)
library(nycflights13)
library(RSQLite)

con <- DBI::dbConnect(RSQLite::SQLite(), ":memory:")

dplyr::copy_to(con, flights)
query <- "
SELECT carrier, COUNT(*) AS total_flights
FROM flights
GROUP BY carrier;
"

result <- DBI::dbGetQuery(con, query)
result
#>    carrier total_flights
#> 1       9E         18460
#> 2       AA         32729
#> 3       AS           714
#> 4       B6         54635
#> 5       DL         48110
#> 6       EV         54173
#> 7       F9           685
#> 8       FL          3260
#> 9       HA           342
#> 10      MQ         26397
#> 11      OO            32
#> 12      UA         58665
#> 13      US         20536
#> 14      VX          5162
#> 15      WN         12275
#> 16      YV           601

Are you receiving any error messages or experiencing unexpected behavior when attempting to create the list column using dbplyr?

This was the error in the initial posting.

This topic was automatically closed 21 days after the last reply. New replies are no longer allowed.

If you have a query related to it or one of the replies, start a new topic and refer back with a link.