Apache Arrow R - more than sparklyr?

Trying to get the gist of how the Apache Arrow project can be used in R...

I see it has already been implemented in sparklyr, but we don't use Spark.

From everything I've read, it has many benefits, including these:

  • efficient memory representation. R is quite bloated in its memory usage - is Arrow something that can be used as the backend for any R object?
  • Gandiva - language-agnostic data querying, i.e. once the data is in memory in Arrow format, non-R libraries can be used to compile/execute the query
  • querying across larger-than-memory datasets

Does anyone know what work there is to integrate Arrow into R beyond sparklyr?


I'd be interested in this too. I currently use the arrow package for its Feather read and write functions (deserialise in memory) and its Parquet functions, but not much else. It seems like a lot to install and keep up to date for not much use, unless some packages are already using it without me knowing?
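
For what it's worth, here is a minimal sketch of that read/write usage. The temp file paths and the choice of the built-in mtcars data are assumptions for illustration only, not taken from a real project:

```r
library(arrow)

# Hypothetical temp files, used only for this illustration
feather_path <- tempfile(fileext = ".feather")
parquet_path <- tempfile(fileext = ".parquet")

# Write a built-in data frame to both formats
write_feather(mtcars, feather_path)
write_parquet(mtcars, parquet_path)

# Read the Feather file back as a regular data frame
from_feather <- read_feather(feather_path)

# Read only selected columns from the Parquet file
from_parquet <- read_parquet(parquet_path, col_select = c("mpg", "cyl"))
```

The col_select argument is the part I find most useful, since only the requested columns get loaded into R.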
