Need help getting started with Spark and sparklyr

sparklyr

#1

Hi,

I am having trouble getting started with sparklyr and a local install of Spark on Windows 10. Any help is appreciated; I'm new to Spark.

tl;dr: It looks like I am missing %SPARK_HOME%/launcher/target/scala-2.xx. Where does that come from?

All the details
I installed sparklyr 0.9.4 from CRAN, then installed Spark 2.4.0 using
sparklyr::spark_install(version = "2.4.0")
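(For reference, a quick way to see what spark_install() actually put on disk is with the sparklyr helpers below; I believe both are exported, though I'm going from memory:)

# list locally installed Spark versions and their directories
sparklyr::spark_installed_versions()
# root directory that sparklyr installs Spark under
sparklyr::spark_install_dir()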

When I try to start Spark with
sc <- spark_connect(master = "local", version = "2.4.0")
I get this error:

Error in force(code) : 
  Failed while connecting to sparklyr to port (8880) for sessionid (37723): Gateway in localhost:8880 did not respond.
    Path: C:\Users\kjohnson\AppData\Local\spark\spark-2.4.0-bin-hadoop2.7\bin\spark-submit2.cmd
    Parameters: --class, sparklyr.Shell, "C:\Program Files\R\Library\sparklyr\java\sparklyr-2.3-2.11.jar", 8880, 37723
    Log: C:\Users\KJOHNS~1.CAL\AppData\Local\Temp\RtmpuKTspW\file3fa013955648_spark.log


---- Output Log ----


---- Error Log ----
Calls: spark_connect ... tryCatchOne -> <Anonymous> -> abort_shell -> <Anonymous> -> force
In addition: Warning message:
In system2(spark_submit_path, args = shell_args, stdout = stdout_param,  :
  'CreateProcess' failed to run 'C:\Users\kjohnson\AppData\Local\spark\SPARK-~1.7\bin\SPARK-~1.CMD --class sparklyr.Shell "C:\Program Files\R\Library\sparklyr\java\sparklyr-2.3-2.11.jar" 8880 37723'
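(Side note: the log file listed in the error can be read directly with base R; the temp-file name below is the one from the error above and changes every session. In my case it was empty:)

# inspect the sparklyr gateway log referenced in the error message
readLines("C:/Users/KJOHNS~1.CAL/AppData/Local/Temp/RtmpuKTspW/file3fa013955648_spark.log")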

If I try the command from the command line, I get a somewhat more informative error:

C:\> C:\Users\kjohnson\AppData\Local\spark\spark-2.4.0-bin-hadoop2.7\bin\spark-submit2.cmd --class sparklyr.Shell "C:\Program Files\R\Library\sparklyr\java\sparklyr-2.3-2.11.jar" 8880 76708
Exception in thread "main" java.lang.IllegalStateException: Cannot find any build directories.
        at org.apache.spark.launcher.CommandBuilderUtils.checkState(CommandBuilderUtils.java:248)
        at org.apache.spark.launcher.AbstractCommandBuilder.getScalaVersion(AbstractCommandBuilder.java:242)
        at org.apache.spark.launcher.AbstractCommandBuilder.buildClassPath(AbstractCommandBuilder.java:196)
        at org.apache.spark.launcher.AbstractCommandBuilder.buildJavaCommand(AbstractCommandBuilder.java:117)
        at org.apache.spark.launcher.SparkSubmitCommandBuilder.buildSparkSubmitCommand(SparkSubmitCommandBuilder.java:261)
        at org.apache.spark.launcher.SparkSubmitCommandBuilder.buildCommand(SparkSubmitCommandBuilder.java:164)
        at org.apache.spark.launcher.Main.buildCommand(Main.java:110)
        at org.apache.spark.launcher.Main.main(Main.java:63)
Deleting C:\Users\KJOHNS~1.CAL\AppData\Local\Temp\spark-class-launcher-output-26749.txt
     1 file deleted

Looking at the Spark launcher source code (AbstractCommandBuilder.getScalaVersion, from the stack trace above), it appears that the command is looking for a directory at either %SPARK_HOME%/launcher/target/scala-2.12 or %SPARK_HOME%/launcher/target/scala-2.11.

My SPARK_HOME is C:\Users\kjohnson\AppData\Local\spark\spark-2.4.0-bin-hadoop2.7\bin\.. and there is no launcher directory there.
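For reference, this is roughly how I poked around from R (base R only; the paths mirror what the launcher source seems to check):

spark_home <- "C:/Users/kjohnson/AppData/Local/spark/spark-2.4.0-bin-hadoop2.7"
# a binary distribution should have a jars directory under SPARK_HOME...
dir.exists(file.path(spark_home, "jars"))
# ...while launcher/target/scala-2.xx would normally only exist in a build from source (I think)
dir.exists(file.path(spark_home, "launcher", "target", "scala-2.11"))
dir.exists(file.path(spark_home, "launcher", "target", "scala-2.12"))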

So, is there something else I need to install? Am I doing something wrong?

I retried with Spark version 2.3.2 for both spark_install() and spark_connect(), with the same result.
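In other words, roughly:

sparklyr::spark_install(version = "2.3.2")
sc <- spark_connect(master = "local", version = "2.3.2")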

Thanks for any help!!
Kent


#2

From the Spark sources, it looks like something might have set SPARK_PREPEND_CLASSES.

Can you make sure that neither SPARK_HOME nor SPARK_PREPEND_CLASSES is being set? Or explicitly run Sys.setenv(SPARK_HOME = "") and Sys.setenv(SPARK_PREPEND_CLASSES = "").
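Something like this in a fresh R session should show whether either is set (base R only):

# an empty string means the variable is not set
Sys.getenv("SPARK_HOME")
Sys.getenv("SPARK_PREPEND_CLASSES")
# clear both before calling spark_connect() again
Sys.setenv(SPARK_HOME = "")
Sys.setenv(SPARK_PREPEND_CLASSES = "")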


#3

Thanks, but that doesn't seem to be the problem; SPARK_PREPEND_CLASSES is not defined.


#4

I was not able to get sparklyr to work with Spark 2.4. Here is a link to an issue on the sparklyr site:

https://github.com/rstudio/sparklyr/issues/1749

Not sure what you did to clean up between the 2.4 and 2.3.2 Spark versions.

Phil


#5

Thanks @phil_hummel. I saw that ticket. The release notes for sparklyr 0.9.3 say it supports Spark 2.4.0, and spark_available_versions() listed it, so I thought it was worth a try. I did not do anything to clean up between versions; it looks like they are separate installs. Is there anything you recommend? What version of Spark worked for you?
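For reference, this is how I saw 2.4.0 offered (spark_available_versions() is the sparklyr helper I meant):

# versions of Spark that sparklyr knows how to download and install
sparklyr::spark_available_versions()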


#6

I haven't installed on Windows, but I have done a bunch of 2.2.x and 2.3.2 installs on Linux. The most common issue I have run into is Java versions newer than 1.8, but those error messages are usually very clear.
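(If it helps, a quick check of the Java that R will pick up; note that java -version writes to stderr, hence the capture below:)

# JAVA_HOME, if set, is what spark-submit will use
Sys.getenv("JAVA_HOME")
# java -version prints to stderr, so capture both streams
system2("java", "-version", stdout = TRUE, stderr = TRUE)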