I am interested in creating and populating a SQLite database via R. One challenge I've encountered concerns inserting duplicate rows.
As in the following reprex, it's easy to (advertently or not) add duplicate rows:
library(DBI)
library(RSQLite)
con <- dbConnect(RSQLite::SQLite(), ":memory:")
dbCreateTable(con, "iris", iris)
dbAppendTable(con, "iris", iris)
#> [1] 150
dbAppendTable(con, "iris", iris)
#> [1] 150
nrow(dbReadTable(con, "iris"))
#> [1] 300
dbDisconnect(con)
If you manually create a table using DBI::dbSendQuery()
, you can specify a primary key; my understanding is that this makes it so that duplicate rows cannot be added (by adding as a constraint that there are no duplicated primary keys):
library(DBI)
library(RSQLite)
library(tidyverse)
iris <- mutate(iris, row = row_number())
con <- dbConnect(RSQLite::SQLite(), ":memory:")
dbSendQuery(con,
"CREATE TABLE iris (
row INTEGER PRIMARY KEY,
`Sepal.Length` REAL,
`Sepal.Width` REAL,
`Petal.Length` REAL,
`Petal.Width` REAL,
Species TEXT);")
#> <SQLiteResult>
#> SQL CREATE TABLE iris (
#> row INTEGER PRIMARY KEY,
#> `Sepal.Length` REAL,
#> `Sepal.Width` REAL,
#> `Petal.Length` REAL,
#> `Petal.Width` REAL,
#> Species TEXT);
#> ROWS Fetched: 0 [complete]
#> Changed: 0
dbAppendTable(con, "iris", iris)
#> [1] 150
dbAppendTable(con, "iris", iris)
#> Error: UNIQUE constraint failed: iris.row
nrow(dbReadTable(con, "iris"))
#> [1] 150
dbDisconnect(con)
Here - correctly - none of the rows were inserted, because all were duplicates.
However, my question is: Is there a way to only add to a table rows that are unique?
It seems like there is a way to do this in SQLite using an INSERT IGNORE
command/part of a query:
if you use the INSERT IGNORE statement, the rows with invalid data that cause the error are ignored and the rows with valid data are inserted into the table
But, neither DBI::dbWriteTable()
and DBI::dbAppendTable()
(or any other function from the DBI package) appear to support it. In short, I'm looking for something like the following to add 100 new rows - and to ignore the first 50 with that are duplicates (because of the duplicate row ID):
library(DBI)
library(RSQLite)
library(tidyverse)
iris <- mutate(iris, row = row_number())
iris <- mutate(iris, row = 101:250)
con <- dbConnect(RSQLite::SQLite(), ":memory:")
dbSendQuery(con,
"CREATE TABLE iris (
row INTEGER PRIMARY KEY,
`Sepal.Length` REAL,
`Sepal.Width` REAL,
`Petal.Length` REAL,
`Petal.Width` REAL,
Species TEXT);")
#> <SQLiteResult>
#> SQL CREATE TABLE iris (
#> row INTEGER PRIMARY KEY,
#> `Sepal.Length` REAL,
#> `Sepal.Width` REAL,
#> `Petal.Length` REAL,
#> `Petal.Width` REAL,
#> Species TEXT);
#> ROWS Fetched: 0 [complete]
#> Changed: 0
dbAppendTable(con, "iris", iris)
#> [1] 150
dbAppendTable(con, "iris", iris)
#> Error: UNIQUE constraint failed: iris.row
nrow(dbReadTable(con, "iris"))
#> [1] 150
dbDisconnect(con)