Sparklyr with Zeppelin?

spark

#1

Hi All,

I am working with a colleague who has access to data on a Spark cluster, but access to that cluster is restricted to Zeppelin notebooks (https://zeppelin.apache.org/).

In the past, R was one of the Zeppelin back-ends that was made available, but it was removed because of performance issues. This may have been before the advent of sparklyr.

I was wondering if anyone has had success using R + sparklyr in a Zeppelin environment, and if so, could you point me towards any how-tos that you may have come across?

Thanks!


#2

Hi Ian! I played with Zeppelin briefly some time back, but I didn't test sparklyr with it. I do think that a simple install.packages("sparklyr") should work to get started. Are you looking for guidance beyond that?


#3

Hi Edgar!

I am happy that you have been down this path (at least a little bit). I have spoken to the end-users (people who are using Zeppelin), but not to the administrators of the Spark cluster.

From what I had heard, part of the objection was that R was installed and running on all of the nodes of the Spark cluster.

Being a third-party to all of this, I have to ask forgiveness from all concerned (including you) for asking very basic questions.

Would sparklyr work if R/sparklyr is made available only as a part of the Zeppelin container, rather than installing R on all the nodes? I suspect that we would be restricted to doing things that sparklyr can translate to native Spark.

If this might possibly work, I think my next step would be to work with my end-user colleagues.

Thanks again!


#4

Yes, you’re correct. Unless you’re using spark_apply(), there is no need to have R installed in all of the nodes.
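
To illustrate the distinction, here is a minimal sketch. It assumes a local Spark installation (master = "local") purely for trying things out, and uses the built-in mtcars data under the hypothetical table name "mtcars_tbl"; on a real cluster the master would be whatever the administrators provide. The dplyr pipeline is translated to Spark SQL, so R is needed only where sparklyr runs, while spark_apply() ships an R function to the workers.

```r
library(sparklyr)
library(dplyr)

# Assumption: "local" is a stand-in; replace it with the cluster's master.
sc <- spark_connect(master = "local")

# Copy a small demo data set into Spark (table name is illustrative).
cars_tbl <- copy_to(sc, mtcars, "mtcars_tbl", overwrite = TRUE)

# Translated to Spark SQL: no R required on the worker nodes.
cars_tbl %>%
  group_by(cyl) %>%
  summarise(avg_mpg = mean(mpg, na.rm = TRUE)) %>%
  collect()

# Runs an R function on each partition: this is the case that
# needs R installed on every worker node.
spark_apply(cars_tbl, function(df) df[df$mpg > 25, ])

spark_disconnect(sc)
```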


#5

I use sparklyr heavily. I actually prefer the RStudio IDE to Zeppelin, because I am addicted to code autocompletion and it has better terminal integration.

sparklyr runs only as a client, much like a MySQL client. Once you have the Spark configuration in place (hdfs-site.xml, hive-site.xml, yarn-site.xml; ask your IT staff for these), you can use yarn-client mode to explore Spark very easily.
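
As a rough sketch of what that looks like from the R side: the paths below are placeholders, not real locations, and the environment variables are an assumption about how the client machine is set up. "yarn-client" is the mode named above; on newer Spark versions the master may need to be "yarn" instead.

```r
library(sparklyr)

# Placeholders: point these at the Spark installation and at the directory
# holding hdfs-site.xml, hive-site.xml and yarn-site.xml from your IT staff.
Sys.setenv(SPARK_HOME = "/path/to/spark")
Sys.setenv(HADOOP_CONF_DIR = "/path/to/hadoop/conf")

# Connect in yarn-client mode; the driver runs where this R session runs.
sc <- spark_connect(master = "yarn-client")

# ... explore with dplyr ...

spark_disconnect(sc)
```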

Using sparklyr in Zeppelin is just like using DBI in Zeppelin. If you are looking for a lighter-weight approach, I recommend persuading your IT staff to launch a Livy service for you. Once you are using sparklyr, you can forget the tedious spark-submit command and have fun with dplyr.
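
For reference, a Livy connection from sparklyr might look roughly like this. The host name and credentials are hypothetical placeholders; 8998 is Livy's default port, but your IT staff may expose a different one or require a different authentication setup.

```r
library(sparklyr)
library(dplyr)

# Hypothetical endpoint and credentials; substitute the values for your site.
sc <- spark_connect(
  master = "http://livy.example.com:8998",
  method = "livy",
  config = livy_config(username = "<username>", password = "<password>")
)

# List the tables visible through this connection, then work with dplyr as usual.
src_tbls(sc)

spark_disconnect(sc)
```

With this setup the R session only needs HTTP access to the Livy endpoint, so no Spark installation or cluster configuration files are needed on the client.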

However, most IT staff know only SparkR, not sparklyr, and do not appreciate the convenience and importance of Livy and sparklyr.


#6

Thanks @harryzhu!

To persuade IT staff to change what they make available is - as you know - a task that requires a large and unknown amount of effort.

It is useful to have a direction in mind, so I am grateful to you for suggesting a direction.