See the FAQ: How to do a minimal reproducible example reprex
for beginners.
The idea behind an R
database connection is to allow bringing variables and records into memory selectively, perform some operation and then doing something with the result. That something may include writing it back to the database but doing so can be unwise.
The database libraries
- Open a connection
- Allow the database to be queried
- Close the connection
The {pool}
library deals with 1 & 3. It is primarily intended for use within shiny
apps and can be replaced with more direct approaches. For example, see the example in help(dbPool)
.
Other libraries deal with all three aspects. A script would look like
library(DBI)
library(RMariaDB)
con <- dbConnect(RMariaDB::MariaDB(),
username="the_user",
password="the_password",
dbname ="r")
dbListTables(con)
#> [1] "jgr"
dbListFields(con, "jgr")
#> [1] "recid" "id" "v1" "v2" "v3" "v4" "v5" "v6" "v7"
#> [10] "v8" "v9" "v10" "v11" "v12" "v13" "v14" "v15" "v16"
#> [19] "v17" "v18" "v19" "v20" "v21" "v22" "v23" "v24" "v25"
#> [28] "v26" "v27" "v28" "v29" "v30"
x <- dbGetQuery(con, "SELECT v4 FROM jgr WHERE id = 'NS_000'")
y <- dbGetQuery(con, "SELECT v6 FROM jgr WHERE id = 'NS_022'")
dbDisconnect(con)
dat <- cbind(x,y)
fit <- lm(v6 ~ v4, data = dat)
summary(fit)
#>
#> Call:
#> lm(formula = v6 ~ v4, data = dat)
#>
#> Residuals:
#> Min 1Q Median 3Q Max
#> -618.21 -228.26 -6.68 13.13 1950.54
#>
#> Coefficients:
#> Estimate Std. Error t value Pr(>|t|)
#> (Intercept) -3.18906 13.50270 -0.236 0.813
#> v4 0.59126 0.01733 34.125 <2e-16 ***
#> ---
#> Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
#>
#> Residual standard error: 391.6 on 999 degrees of freedom
#> Multiple R-squared: 0.5383, Adjusted R-squared: 0.5378
#> F-statistic: 1165 on 1 and 999 DF, p-value: < 2.2e-16
This does use two SQL statements, like SELECT v4 FROM jgr WHERE id = 'NS_000'
. An alternative is provided by {dbplyr}
, which relieves even that slight degree of complexity by allowing queries using {dplyr}
syntax.