Unable to connect to sparklyr through R on CDH

I am trying to connect to Spark using the sparklyr package in R. I tried changing configurations in spark_connect(), but it is not working at all. I have tried everything available on the internet.

library(sparklyr)

conf <- spark_config()
sc <- spark_connect(
  master = "yarn-client",
  config = conf,
  spark_home = "/opt/cloudera/parcels/SPARK2-2.2.0.cloudera1-1.cdh5.12.0.p0.142354/lib/spark2"
)

# Get the Spark context
ctx <- sparklyr::spark_context(sc)

# Use the line below to get the Java Spark context
jsc <- invoke_static(sc, "org.apache.spark.api.java.JavaSparkContext", "fromSparkContext", ctx)

Hi @jaiminee295, can you confirm that the /opt/cloudera/parcels/SPARK2-2.2.0.cloudera1-1.cdh5.12.0.p0.142354/lib/spark2/bin/spark-submit file is indeed in that location?

Yes, the spark-submit file exists under the bin directory.
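For anyone verifying a similar setup from the shell, a quick existence check like the following works (the path is the one from this thread; check_exec is just a hypothetical helper, not part of sparklyr):

```shell
#!/bin/sh
# Hypothetical helper: report whether a file exists and is executable
check_exec() {
  if [ -x "$1" ]; then
    echo "found: $1"
  else
    echo "missing: $1"
  fi
}

check_exec /opt/cloudera/parcels/SPARK2-2.2.0.cloudera1-1.cdh5.12.0.p0.142354/lib/spark2/bin/spark-submit
```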

attr(,"class")

[1] "spark_connection" "spark_shell_connection" "DBIConnection"

19/08/13 23:17:49 INFO sparklyr: Gateway (62737) is terminating backend

(19/08/13 23:17:49 INFO sparklyr: Gateway (62737) is shutting down with expected SocketException,java.net.SocketException: Socket closed)

19/08/13 23:17:49 INFO spark.SparkContext: Invoking stop() from shutdown hook

19/08/13 23:17:49 INFO server.AbstractConnector: Stopped Spark@51f8483f{HTTP/1.1,[http/1.1]}{0.0.0.0:4041}

19/08/13 23:17:49 INFO ui.SparkUI: Stopped Spark web UI at http://10.240.10.216:4041

19/08/13 23:17:49 INFO cluster.YarnClientSchedulerBackend: Interrupting monitor thread

19/08/13 23:17:49 INFO cluster.YarnClientSchedulerBackend: Shutting down all executors

19/08/13 23:17:49 INFO cluster.YarnSchedulerBackend$YarnDriverEndpoint: Asking each executor to shut down

19/08/13 23:17:49 INFO cluster.SchedulerExtensionServices: Stopping SchedulerExtensionServices

(serviceOption=None,

services=List(),

started=false)

19/08/13 23:17:49 INFO cluster.YarnClientSchedulerBackend: Stopped

19/08/13 23:17:49 INFO spark.MapOutputTrackerMasterEndpoint: MapOutputTrackerMasterEndpoint stopped!

19/08/13 23:17:49 INFO memory.MemoryStore: MemoryStore cleared

19/08/13 23:17:49 INFO storage.BlockManager: BlockManager stopped

19/08/13 23:17:49 INFO storage.BlockManagerMaster: BlockManagerMaster stopped

19/08/13 23:17:49 INFO rpc.b: Summary of connection to localhost:50505: RpcStats [name=localhost:50505, lifetimeStats=PeriodicStats [startTimeMillis=1565738262621, numMessages=3, numErrors=0, sumServiceTimeNanos=4960009, minNanos=906702, maxNanos=2150249, avg=1, adj=1], periodicStats=PeriodicStats [startTimeMillis=1565738262621, numMessages=3, numErrors=0, sumServiceTimeNanos=4960009, minNanos=906702, maxNanos=2150249, avg=1, adj=1], jsonSummaryStats=PeriodicStats [startTimeMillis=1565738262621, numMessages=3, numErrors=0, sumServiceTimeNanos=4960009, minNanos=906702, maxNanos=2150249, avg=1, adj=1], runningAverageMillis=0]

19/08/13 23:17:49 INFO scheduler.OutputCommitCoordinator$OutputCommitCoordinatorEndpoint: OutputCommitCoordinator stopped!

19/08/13 23:17:49 INFO spark.SparkContext: Successfully stopped SparkContext

19/08/13 23:17:49 INFO util.ShutdownHookManager: Shutdown hook called

19/08/13 23:17:49 INFO util.ShutdownHookManager: Deleting directory /tmp/spark-15226992-753c-4f1d-afed-83d47e55b5e9

The above output shows up while running the script.

Maybe try increasing the timeout? config[["sparklyr.gateway.start.timeout"]] <- 120
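Put together with the connection call from the original post, the config fragment might look like this (paths and the 120-second value are from this thread; a sketch, not a tested setup):

```r
library(sparklyr)

conf <- spark_config()
# Give the sparklyr gateway more time to start before giving up (seconds)
conf[["sparklyr.gateway.start.timeout"]] <- 120

sc <- spark_connect(
  master = "yarn-client",
  config = conf,
  spark_home = "/opt/cloudera/parcels/SPARK2-2.2.0.cloudera1-1.cdh5.12.0.p0.142354/lib/spark2"
)
```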

Hey @edgararuiz, it worked. Thank you so much, I appreciate your help.
