Support for spark-submit is implemented through spark_submit() in sparklyr; see https://github.com/rstudio/sparklyr/pull/1690. The batch.R file should define the connection and disconnection as follows:
library(sparklyr)
# Connect to Spark from within the batch script
sc <- spark_connect(master = "local")
# custom sparklyr code goes here...
sdf_len(sc, 10) %>% spark_write_csv("batch.csv")
# Disconnect once the batch work is done
spark_disconnect(sc)
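Before submitting the script to a cluster, it can be useful to run it end to end on the local machine to confirm it completes cleanly; a minimal sketch, assuming Rscript is on the PATH and batch.R sits in the current working directory:
# Validate batch.R locally before submitting it to a cluster
# (assumes Rscript is on the PATH and batch.R is in the working directory)
system2("Rscript", args = "batch.R")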
Then, to submit a batch job, use spark_submit() as follows, passing the appropriate parameters for your connection (e.g. master = "yarn", etc.):
library(sparklyr)
# Build the configuration passed to spark-submit
config <- spark_config()
config[["sparklyr.shell.num-executors"]] <- 10
# additional configuration settings
# Submit batch.R to the cluster as a batch job
spark_submit(master = "yarn", file = "batch.R", config = config)
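Once the batch job has finished, the output written by batch.R can be inspected from an interactive sparklyr session; a minimal sketch, assuming the "batch.csv" path is reachable from that session (e.g. on HDFS when running under YARN):
library(sparklyr)
# Connect interactively and read back the CSV written by the batch job
sc <- spark_connect(master = "yarn")
results <- spark_read_csv(sc, name = "batch_results", path = "batch.csv")
sdf_nrow(results)  # expect 10 rows, matching sdf_len(sc, 10) in batch.R
spark_disconnect(sc)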