I need a finance data set to demonstrate the features of sparklyr to my students. The data needs to be larger than my RAM, which is 16 GB. It also needs to be open source, so that all my students can download it themselves.

Do it in multiple steps. For example, if you have 8 GB of RAM and want more than 16 GB of data, simulate 6 GB of data and write it to disk, then repeat this two more times, calling write.table with append = TRUE on the second and third passes, to end up with an 18 GB file, which is above your RAM capacity. Sorry, I cannot give a concrete example, as I don't know exactly which type of data you are planning to use (time series, spatial data, etc.). You can use ordinary programming to make the data meaningful, changing some of the covariates at each simulation step so the result resembles a real-world dataset.
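
The chunked approach could look roughly like the sketch below. It is only an illustration under assumptions: the column names, the tickers, the distributions, and the helper names `simulate_chunk` and `write_in_chunks` are all made up here, and the chunk size is kept tiny so the example runs quickly. To exceed your RAM you would raise `rows_per_chunk` and `n_chunks` until the file on disk is large enough.

```r
# Hypothetical helper: simulate one chunk of price-like data.
# The covariates (mean return, volume level) change with chunk_id so that
# each chunk differs from the previous one.
simulate_chunk <- function(n_rows, chunk_id) {
  data.frame(
    chunk  = chunk_id,
    ticker = sample(c("AAA", "BBB", "CCC"), n_rows, replace = TRUE),
    date   = as.Date("2000-01-01") + sample.int(5000, n_rows, replace = TRUE),
    price  = round(100 * exp(rnorm(n_rows, mean = 0.01 * chunk_id, sd = 0.2)), 2),
    volume = rpois(n_rows, lambda = 1000 * chunk_id)
  )
}

# Hypothetical helper: write n_chunks simulated chunks to a single CSV file.
# Only one chunk is held in memory at a time.
write_in_chunks <- function(path, n_chunks, rows_per_chunk) {
  for (i in seq_len(n_chunks)) {
    chunk <- simulate_chunk(rows_per_chunk, i)
    write.table(
      chunk,
      file      = path,
      sep       = ",",
      row.names = FALSE,
      col.names = (i == 1),  # write the header only with the first chunk
      append    = (i > 1)    # first chunk creates the file, later ones append
    )
    rm(chunk)  # drop the chunk before simulating the next one
  }
  invisible(path)
}

# Small demonstration: 3 chunks of 1000 rows each, written to a temp file.
out_file <- file.path(tempdir(), "simulated_finance.csv")
write_in_chunks(out_file, n_chunks = 3, rows_per_chunk = 1000)
```

Once the file is big enough, it can be loaded into Spark from disk with sparklyr's `spark_read_csv()` instead of being read into R's memory.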