Building Spark ML pipelines with sparklyr

Bill · March 8, 2018, 5:21am

This is a companion discussion topic for the original entry at:

https://www.rstudio.com/resources/videos/building-spark-ml-pipelines-with-sparklyr/

We provide an overview of the recently implemented Pipelines API in sparklyr, an R package for interfacing with Apache Spark. This new feature allows users to build and tune data transformation and machine learning pipelines that are interoperable with Scala and Python, simplifying handoffs between data science and data engineering. We go over the components of pipelines and walk through practical examples.

Kevin Kuo (@kevinykuo) - Software Engineer
Kevin is a software engineer focused on building R interfaces to big data and machine learning tools such as Spark and TensorFlow. He has experience applying data analytics in a variety of settings, from insurance claims analytics to predictive maintenance of industrial assets. Outside of data science, Kevin enjoys wine tasting and crafting fancy cocktails.