Issue fetch data from SQL using RODBC (memory issue)

mrmoleje · September 16, 2019, 9:00am

Hi,

I'm trying to fetch some data from SQL into R using the following code:

library(RODBC)

conn3 <- RODBC::odbcDriverConnect( 'driver={SQL Server};server=SERVERNAME;database=DB_NAME;trusted_connection=TRUE' )

my_data <- RODBC::sqlFetch(channel = conn3 ,sqtable = "Table name" ,stringsAsFactors = FALSE ,na.strings = "")

The data set is 1,032,576 KB and so unsurprisingly when I try to fetch from SQL to R I get an error message saying the memory has been exceeded.

Is there an R package that pulls data in from SQL and stores it in a way that means that the memory isn't exceeded?

DaveH · September 16, 2019, 9:22am

The general best practice is to not pull unaggregated data from databases into R immediately, but rather to use R to direct the SQL engine to process the data in the DB, and pull the relevant results. The recommended packages for that are

library(odbc)
library(dplyr)
library(dbplyr)

You can read more at db.rstudio.com, and specifically https://db.rstudio.com/dplyr/

mrmoleje · September 16, 2019, 1:52pm

Hi,

Thanks for responding. I thought this might be best practice but I want to keep my data in tidy format as much as possible (e.g. one token per row) for various reasons.

Is the way you've suggested the only way? If so I'll have a look at your recommended libraries.

andresrcs · September 16, 2019, 2:58pm

By using dbplyr your data will remain in a tidy format in the sql server but you will be able to manipulate it using dplyr commands from your R session, only fetching results when needed. You can even make plots and models processing the data in the sql server and only fetching results.

system · October 7, 2019, 3:05pm

This topic was automatically closed 21 days after the last reply. New replies are no longer allowed.