Need help getting started with Spark and sparklyr



I am having trouble getting started with sparklyr and a local install of Spark on Windows 10. Any help appreciated, I'm just getting started with Spark.

tl;dr: It looks like I am missing %SPARK_HOME%/launcher/target/scala-2.xx. Where does that come from?

All the details
I installed sparklyr 0.9.4 from CRAN, then installed Spark 2.4.0 using
sparklyr::spark_install(version = "2.4.0")

When I try to start Spark with
sc <- spark_connect(master = "local", version = "2.4.0")
I get this error:

Error in force(code) : 
  Failed while connecting to sparklyr to port (8880) for sessionid (37723): Gateway in localhost:8880 did not respond.
    Path: C:\Users\kjohnson\AppData\Local\spark\spark-2.4.0-bin-hadoop2.7\bin\spark-submit2.cmd
    Parameters: --class, sparklyr.Shell, "C:\Program Files\R\Library\sparklyr\java\sparklyr-2.3-2.11.jar", 8880, 37723
    Log: C:\Users\KJOHNS~1.CAL\AppData\Local\Temp\RtmpuKTspW\file3fa013955648_spark.log

---- Output Log ----

---- Error Log ----
Calls: spark_connect ... tryCatchOne -> <Anonymous> -> abort_shell -> <Anonymous> -> force
In addition: Warning message:
In system2(spark_submit_path, args = shell_args, stdout = stdout_param,  :
  'CreateProcess' failed to run 'C:\Users\kjohnson\AppData\Local\spark\SPARK-~1.7\bin\SPARK-~1.CMD --class sparklyr.Shell "C:\Program Files\R\Library\sparklyr\java\sparklyr-2.3-2.11.jar" 8880 37723'

If I try the command from the command line, I get a somewhat more informative error:

C:\> C:\Users\kjohnson\AppData\Local\spark\spark-2.4.0-bin-hadoop2.7\bin\spark-submit2.cmd --class sparklyr.Shell "C:\Program Files\R\Library\sparklyr\java\sparklyr
-2.3-2.11.jar" 8880 76708
Exception in thread "main" java.lang.IllegalStateException: Cannot find any build directories.
        at org.apache.spark.launcher.CommandBuilderUtils.checkState(
        at org.apache.spark.launcher.AbstractCommandBuilder.getScalaVersion(
        at org.apache.spark.launcher.AbstractCommandBuilder.buildClassPath(
        at org.apache.spark.launcher.AbstractCommandBuilder.buildJavaCommand(
        at org.apache.spark.launcher.SparkSubmitCommandBuilder.buildSparkSubmitCommand(
        at org.apache.spark.launcher.SparkSubmitCommandBuilder.buildCommand(
        at org.apache.spark.launcher.Main.buildCommand(
        at org.apache.spark.launcher.Main.main(
Deleting C:\Users\KJOHNS~1.CAL\AppData\Local\Temp\spark-class-launcher-output-26749.txt
     1 file deleted

Looking at this source code, it appears that the command is looking for a directory at either %SPARK_HOME%/launcher/target/scala-2.12 or %SPARK_HOME%/launcher/target/scala-2.11.

My SPARK_HOME is C:\Users\kjohnson\AppData\Local\spark\spark-2.4.0-bin-hadoop2.7\bin\.. and there is no launcher directory there.
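As a quick sanity check, the directory the launcher is searching for can be tested from R. A minimal sketch (the scala-2.11 subdirectory name is taken from the exception above; as far as I know, launcher/target/scala-2.xx is a build output that exists only in a Spark source checkout, not in a binary distribution like spark-2.4.0-bin-hadoop2.7):

```r
# Check whether the build directory the Spark launcher is looking for exists.
# For a binary Spark distribution this is expected to return FALSE, since
# launcher/target/ is only produced by building Spark from source.
spark_home <- Sys.getenv("SPARK_HOME")
dir.exists(file.path(spark_home, "launcher", "target", "scala-2.11"))
```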

So, is there something else I need to install? Am I doing something wrong?

I retried specifying Spark version 2.3.2 for spark_install and spark_connect with the same result...

Thanks for any help!!



From the sources, it looks like something might have set SPARK_PREPEND_CLASSES.

Can you make sure that SPARK_HOME and SPARK_PREPEND_CLASSES are not being set? Or explicitly run Sys.setenv(SPARK_HOME = "") and Sys.setenv(SPARK_PREPEND_CLASSES = "").
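For example, in a fresh R session (a minimal sketch; note that Sys.getenv returns an empty string for variables that are not set):

```r
# See what is currently set; "" means the variable is not defined.
Sys.getenv(c("SPARK_HOME", "SPARK_PREPEND_CLASSES"))

# Clear both so sparklyr falls back to its own installed Spark.
Sys.setenv(SPARK_HOME = "", SPARK_PREPEND_CLASSES = "")
```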



Thanks, but that doesn't seem to be the problem: SPARK_PREPEND_CLASSES is not defined.



I was not able to get sparklyr to work with Spark 2.4. Here is the link to an issue on the sparklyr site.

I'm not sure what you did to clean up between the Spark 2.4.0 and 2.3.2 versions.




Thanks @phil_hummel. I saw that ticket. The release notes for sparklyr 0.9.3 say it supports Spark 2.4.0, and spark_available_versions() listed it, so I thought it was worth a try. I did not do anything to clean up between versions; it looks like they are separate installs. Is there something you recommend? What version of Spark worked for you?



I haven't installed on Windows but I have done a bunch of 2.2.x and 2.3.2 installs on Linux. The most common issue I have run into is Java versions newer than 1.8 but those error messages are really clear.


