How to create a new table in a database?

In the following code, I show how to create a new table by sending a sql query to the database. What if I mutate the data in a pipeline? How do I write it into a new table if it is bigger than RAM such that collecting the data is not possible?

library("tidyverse")
library("duckdb")

#'
#' Create a new table with data in database
#'

con <- dbConnect(duckdb(), dbdir = "test.duckdb")
con |> dbWriteTable("mtcars", mtcars)

#'
#' Create a copy of the data in the databse
#' 

con |> dbExecute("create table mtcars2 as select * from mtcars")
mtcars2 <- con |> tbl("mtcars2")

#'
#' Modify a column and write a new table
#' 

mtcars |>
  mutate(mpg2 = mpg * 2)
# this step is missing
1 Like

I'm not quite sure if it's about writing a new table or overwriting / altering existing one, but when writing result to a new table with compute() you can set a name and create a persistent table with temporary = FALSE:

library(dplyr)
library(duckdb)
#> Loading required package: DBI

con <- dbConnect(duckdb(), dbdir = "test.duckdb")
dbWriteTable(con, "mtcars", mtcars)

tbl(con, "mtcars") |> 
  mutate(mpg2 = mpg * 2) |> 
  compute(name = "mtcars2",  temporary = FALSE) 
#> # Source:   table<mtcars2> [?? x 12]
#> # Database: DuckDB v1.1.3 [margus@Windows 10 x64:R 4.4.2/D:\rtmp\RtmpMVOMs8\reprex-a25837a470d2-irate-booby\test.duckdb]
#>      mpg   cyl  disp    hp  drat    wt  qsec    vs    am  gear  carb  mpg2
#>    <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
#>  1  21       6  160    110  3.9   2.62  16.5     0     1     4     4  42  
#>  2  21       6  160    110  3.9   2.88  17.0     0     1     4     4  42  
#>  3  22.8     4  108     93  3.85  2.32  18.6     1     1     4     1  45.6
#>  4  21.4     6  258    110  3.08  3.22  19.4     1     0     3     1  42.8
#>  5  18.7     8  360    175  3.15  3.44  17.0     0     0     3     2  37.4
#>  6  18.1     6  225    105  2.76  3.46  20.2     1     0     3     1  36.2
#>  7  14.3     8  360    245  3.21  3.57  15.8     0     0     3     4  28.6
#>  8  24.4     4  147.    62  3.69  3.19  20       1     0     4     2  48.8
#>  9  22.8     4  141.    95  3.92  3.15  22.9     1     0     4     2  45.6
#> 10  19.2     6  168.   123  3.92  3.44  18.3     1     0     4     4  38.4
#> # ℹ more rows

dbListTables(con)
#> [1] "mtcars"  "mtcars2"
tbl_function(con, "duckdb_tables()")
#> # Source:   SQL [?? x 16]
#> # Database: DuckDB v1.1.3 [margus@Windows 10 x64:R 4.4.2/D:\rtmp\RtmpMVOMs8\reprex-a25837a470d2-irate-booby\test.duckdb]
#>   database_name database_oid schema_name schema_oid table_name table_oid comment
#>   <chr>                <dbl> <chr>            <dbl> <chr>          <dbl> <chr>  
#> 1 test                  1146 main              1148 mtcars          1320 <NA>   
#> 2 test                  1146 main              1148 mtcars2         1329 <NA>   
#> # ℹ 9 more variables: tags <list>, internal <lgl>, temporary <lgl>,
#> #   has_primary_key <lgl>, estimated_size <dbl>, column_count <dbl>,
#> #   index_count <dbl>, check_constraint_count <dbl>, sql <chr>
1 Like

That works very well! Thank you very much!