Hi All,
I am working with a colleague who has access to data on a Spark cluster, but access to that cluster is restricted to using Zeppelin notebooks (https://zeppelin.apache.org/).
In the past, R was one of the Zeppelin back-ends that was made available, but it was removed because of performance issues. This may have been before the advent of sparklyr.
I was wondering if anyone has had success using R + sparklyr in a Zeppelin environment, and if so, could you point me towards any how-tos that you may have come across?
Thanks!
Hi Ian! I played with Zeppelin briefly some time back, but I didn't test sparklyr with it. I do think that a simple install.packages("sparklyr") should work to get started. Are you looking for guidance beyond that?
Hi Edgar!
I am happy that you have been down this path (at least a little bit). I have spoken to the end-users (people who are using Zeppelin), but not to the administrators of the Spark cluster.
From what I had heard, part of the objection was that R was installed and running on all of the nodes of the Spark cluster.
Being a third-party to all of this, I have to ask forgiveness from all concerned (including you) for asking very basic questions.
Would sparklyr work if R/sparklyr is made available only as a part of the Zeppelin container, rather than installing R on all the nodes? I suspect that we would be restricted to doing things that sparklyr can translate to native Spark.
If this might possibly work, I think my next step would be to work with my end-user colleagues.
Thanks again!
Yes, you're correct. Unless you're using spark_apply(), there is no need to have R installed on all of the nodes.
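To illustrate the difference, here is a rough sketch. It assumes you already have a sparklyr connection `sc`, and that Spark can see a table called `flights` with `carrier` and `dep_delay` columns (the table and column names are only placeholders for this example). The first function uses dplyr verbs that sparklyr translates to Spark SQL, so it should run without R on the worker nodes. The second uses spark_apply(), which ships an R function to the workers and therefore needs R installed there.

```r
library(sparklyr)
library(dplyr)

# dplyr verbs on a Spark table are translated to Spark SQL by sparklyr,
# so the work happens inside Spark; no R is needed on the worker nodes.
# "flights", carrier and dep_delay are placeholder names for this sketch.
mean_delay_by_carrier <- function(sc, table_name = "flights") {
  tbl(sc, table_name) %>%
    group_by(carrier) %>%
    summarise(mean_delay = mean(dep_delay, na.rm = TRUE))
}

# spark_apply() runs an arbitrary R function on each partition,
# which is why it requires R on every worker node.
rows_per_partition <- function(sc, table_name = "flights") {
  tbl(sc, table_name) %>%
    spark_apply(function(df) data.frame(n = nrow(df)))
}
```

Calling `show_query()` on the result of `mean_delay_by_carrier(sc)` should print the SQL that is sent to Spark, which is a handy way to confirm that nothing is being run in R on the cluster side.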
I use sparklyr very heavily. In fact, I like the RStudio IDE more than Zeppelin, because I have become addicted to code autocompletion and it has better terminal integration.
sparklyr simply runs as a client, much like a MySQL client. Once you have configured the Spark conf files (hdfs-site.xml, hive-site.xml, yarn-site.xml; ask your IT staff for them), you can use yarn-client mode to explore Spark very easily.
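As a minimal sketch of what that might look like: the paths below are assumptions, not real values for your cluster, so ask the administrators where Spark and the *-site.xml files actually live. The function name `connect_yarn_client` is just something made up for this example.

```r
library(sparklyr)

# Both default paths are hypothetical; replace them with the locations
# your administrators give you.
connect_yarn_client <- function(spark_home = "/opt/spark",
                                hadoop_conf_dir = "/etc/hadoop/conf") {
  # Point the client at the directory holding hdfs-site.xml,
  # yarn-site.xml and (in many setups) hive-site.xml.
  Sys.setenv(
    SPARK_HOME = spark_home,
    HADOOP_CONF_DIR = hadoop_conf_dir,
    YARN_CONF_DIR = hadoop_conf_dir
  )
  # The driver runs where R runs; executors are requested from YARN.
  spark_connect(master = "yarn-client", config = spark_config())
}

# Usage (needs a reachable YARN cluster):
# sc <- connect_yarn_client()
# spark_disconnect(sc)
```

Newer sparklyr versions may prefer `master = "yarn"` with the deploy mode set in the config, so check the documentation for the version you have installed.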
Using sparklyr in Zeppelin is just like using DBI in Zeppelin. If you are looking for a lighter-weight approach, I recommend you persuade your IT staff to launch a Livy service for you. Once you are using sparklyr, you can forget the tedious spark-submit command and have fun with dplyr.
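For what it's worth, a Livy connection could look roughly like this. The URL is a placeholder (8998 is Livy's usual default port, but your IT staff would have to tell you the real host and port), and `connect_livy` is just a name invented for this sketch.

```r
library(sparklyr)

# The default URL is hypothetical; use the address of your own Livy service.
connect_livy <- function(livy_url = "http://livy.example.com:8998") {
  # method = "livy" tells sparklyr to talk to Spark over Livy's REST API,
  # so no local Spark installation or cluster config files are needed.
  spark_connect(master = livy_url, method = "livy")
}

# Usage (needs a running Livy service):
# sc <- connect_livy()
# dplyr::tbl(sc, "some_table")   # "some_table" is a placeholder name
# spark_disconnect(sc)
```

Depending on your sparklyr version and how Livy is secured, you may also need to pass `version` to spark_connect() or supply credentials through livy_config().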
However, most IT staff only know SparkR rather than sparklyr, and do not appreciate the convenience and importance of Livy and sparklyr.
Thanks @harryzhu!
To persuade IT staff to change what they make available is - as you know - a task that requires a large and unknown amount of effort.
It is useful to have a direction in mind, so I am grateful to you for suggesting a direction.