I've been using R with RStudio on a Mac for a while now, and I wonder whether there are better ways of storing data than just having CSV files lying around. I have a rather large dataset (3 million rows with 15 variables per quarter), spanning 20 years.
Now, with data.table, having everything in CSV isn't that slow, but I still need to import each file every time I want to query the data, which means that a temporal analysis can get messy memory-wise.
I thought of SQL, as it is the best-known and most widely used kind of database, but I wonder if there's something simpler to use and quicker to interact with from R. Perhaps something that uses Spark?
You can use sparklyr to handle out-of-memory data. For storage purposes, the feather and fst formats are quite efficient to import into R. If you really want to put your data in a SQL database, take a look at PostgreSQL.
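To illustrate the sparklyr route, here's a minimal sketch: a local Spark connection reads all the quarterly CSVs as one out-of-memory table (`memory = FALSE` keeps the data on disk until a query actually scans it), and dplyr verbs are translated to Spark SQL. The `data/quarter_*.csv` glob and the `year`/`amount` columns are placeholders for your own files and variables.

```r
library(sparklyr)
library(dplyr)

# Connect to a local Spark instance
sc <- spark_connect(master = "local")

# Read all quarterly CSVs as a single Spark table; memory = FALSE avoids
# caching the full dataset in memory up front
quarters <- spark_read_csv(
  sc,
  name   = "quarters",
  path   = "data/quarter_*.csv",  # placeholder glob for your 80 files
  memory = FALSE
)

# dplyr verbs run as Spark SQL on disk; only the aggregated result
# is pulled into R by collect()
quarters %>%
  filter(year >= 2008, year <= 2018) %>%          # placeholder columns
  group_by(year) %>%
  summarise(avg_amount = mean(amount, na.rm = TRUE)) %>%
  collect()

spark_disconnect(sc)
```

fst is handy for the storage side of this: `fst::read_fst()` supports reading a subset of columns and rows from a file without parsing the whole thing, which helps when a query only touches a few of your 15 variables.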
It depends on what you need to do, but when I have to deal with lots of smaller files and don't want to set up a database, I really like the feather package and format.
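For instance, a minimal sketch of a feather-based workflow, assuming one file per quarter (the file and column names here are placeholders):

```r
library(feather)

# Convert each quarterly CSV to feather once; reading it back later is
# much faster than re-parsing the CSV
df <- data.table::fread("data/quarter_2019Q4.csv")
write_feather(df, "data/quarter_2019Q4.feather")

# feather also lets you read only the columns you need
subset <- read_feather("data/quarter_2019Q4.feather",
                       columns = c("year", "amount"))  # placeholder names
```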
Right now I have 80 CSV files, organised by quarter. The idea would be to have the data in a format that would let me query it with Spark or even dplyr, without having to load each file into memory.
That works fine for a couple of quarters, but when looking at a 10-year period it starts to get messy.