Using "Spark Connect" within R / tidyverse

Hi everyone,

I am just exploring the Spark functionality available within the tidyverse.

One question I have: there is a new technology called "Spark Connect", which lets you run your connection to Spark in client/server mode.

Looking at the documentation, I see that Python is supported, but not R.

Has anyone used this approach with R?

Best!

Hi, yes, a new sparklyr extension called pysparklyr does exactly that. It wraps the Python components in order to connect to and interact with Spark Connect. You will need a working installation of Python on your machine. But if you are connecting to an external cluster, you won't need Java (JVM)!

If you are testing Spark Connect on your laptop, pysparklyr has a function to start Spark Connect locally. You will need a JVM for that, though. Here's sample code for that:

install.packages("pysparklyr")
pysparklyr::install_pyspark("3.5") # Creates a Python environment with the necessary components
sparklyr::spark_install("3.5") # Installs Spark 3.5 locally so you can run Spark Connect

library(sparklyr)
pysparklyr::spark_connect_service_start("3.5") # Starts a local Spark Connect service (needs a JVM)
sc <- spark_connect("sc://localhost/", method = "spark_connect", version = "3.5")

# Interact with Spark Connect

spark_disconnect(sc)
pysparklyr::spark_connect_service_stop()
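
As a rough sketch of what the "Interact with Spark Connect" step could look like: this assumes `sc` is still the open connection from the code above (so it has to run before `spark_disconnect(sc)`), and that dplyr is installed. The `mtcars` example is purely illustrative and not part of the original sample.

```r
library(sparklyr)
library(dplyr)

# Assumption: `sc` is the open Spark Connect session created earlier
# Copy a small local data frame into Spark
tbl_mtcars <- copy_to(sc, mtcars)

# The dplyr verbs are translated and run on the Spark side;
# collect() brings the summarised result back into R
tbl_mtcars %>%
  group_by(cyl) %>%
  summarise(avg_mpg = mean(mpg, na.rm = TRUE)) %>%
  collect()
```

The same dplyr code should work whether `sc` points at the local service or at an external cluster, since only the connection step differs.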


Hello Edgard,

Thanks for this!

Rasmus

Hi again, I just published an article about Spark Connect: sparklyr - Spark Connect