How to run .R files using spark-submit from CDSW terminal?

Hello Team,

I am unable to run .R files from the CDSW terminal. I get an error after running the command below. Please let me know how to resolve this.

!spark-submit -v --master yarn --deploy-mode client --num-executors 10 --executor-memory 4g --executor-cores 4 --driver-cores 4 --driver-memory 4g Sample.R

Thanks and Regards

Sankar Narayana

Support for spark-submit is implemented through spark_submit() in sparklyr; see https://github.com/rstudio/sparklyr/pull/1690. The batch.R file should open and close its own connection, as follows:

library(sparklyr)
sc <- spark_connect(master = "local")

# custom sparklyr code goes here...
sdf_len(sc, 10) %>% spark_write_csv("batch.csv")

spark_disconnect(sc)

Then, to submit the batch job, call spark_submit() with the parameters appropriate for your connection (e.g. master = "yarn"):

library(sparklyr)

config <- spark_config()
config[["sparklyr.shell.num-executors"]] <- 10
# additional configuration settings

spark_submit(master = "yarn", file = "batch.R", config = config)
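
As a rough sketch, the flags from the original spark-submit command could be carried over into the config like this. It assumes that each sparklyr.shell.<name> entry is forwarded to spark-submit as --<name>, which is how sparklyr.shell.num-executors behaves above. The driver-cores and deploy-mode entries in particular are worth checking against your sparklyr version. It also assumes Sample.R opens and closes its own connection the same way batch.R does.

```r
library(sparklyr)

# Start from the default sparklyr configuration.
config <- spark_config()

# Assumption: every sparklyr.shell.<name> entry is passed to
# spark-submit as the command-line flag --<name>.
config[["sparklyr.shell.deploy-mode"]]     <- "client"
config[["sparklyr.shell.num-executors"]]   <- 10
config[["sparklyr.shell.executor-memory"]] <- "4g"
config[["sparklyr.shell.executor-cores"]]  <- 4
config[["sparklyr.shell.driver-cores"]]    <- 4
config[["sparklyr.shell.driver-memory"]]   <- "4g"

# Submit the script from the question to YARN as a batch job.
spark_submit(master = "yarn", file = "Sample.R", config = config)
```

If a setting does not seem to take effect, the Spark logs for the submitted job should show the spark-submit command line that was actually built.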

Thank you. It worked for me.
