The general best practice is to not pull unaggregated data from databases into R immediately, but rather to use R to direct the SQL engine to process the data in the DB, and pull the relevant results. The recommended packages for that are
Thanks for responding. I thought this might be best practice but I want to keep my data in tidy format as much as possible (e.g. one token per row) for various reasons.
Is the way you've suggested the only way? If so I'll have a look at your recommended libraries.
By using dbplyr your data will remain in a tidy format in the sql server but you will be able to manipulate it using dplyr commands from your R session, only fetching results when needed. You can even make plots and models processing the data in the sql server and only fetching results.