Engine exhausted available memory, consider a larger engine size.
Engine exited with status 137.
I also tried using

spark_read_text(con, name = "Month1_IntlData", path = "/home/cdsw/SUMMARY_DETAIL_HIST 1.txt", overwrite = TRUE)

and got this error:
Error: org.apache.spark.sql.AnalysisException: Path does not exist: hdfs://<<server_name>>/home/cdsw/SUMMARY_DETAIL_HIST 1.txt;
at org.apache.spark.sql.execution.datasources.DataSource$$anonfun$14.apply(DataSource.scala:360)
at org.apache.spark.sql.execution.datasources.DataSource$$anonfun$14.apply(DataSource.scala:348)
at scala.collection.TraversableLike$$anonfun$flatMap$1.apply(TraversableLike.scala:241)
at scala.collection.TraversableLike$$anonfun$flatMap$1.apply(TraversableLike.scala:241)
at scala.collection.immutable.List.foreach(List.scala:381)
at scala.collection.TraversableLike$class.flatMap(TraversableLike.scala:241)
at scala.collection.immutable.List.flatMap(List.scala:344)
at org.apache.spark.sql.execution.datasources.DataSource.resolveRelation(DataSource.scala:348)
at org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:178)
at org.apache.spark.sql.DataFrameReader.text(DataFrameReader.scala:623)
at org.apache.spark.sql.DataFrameReader.text(DataFrameReader.scala:603)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
Can you tell me how we can copy a local text file into a Spark data frame or a Hive table?
It looks like you're out of memory. Are you by chance working in Docker?
In Docker, exit status 137 means the container was killed by the kernel's OOM killer (out of memory): there wasn't enough memory available in the container for the running process.
The OOM killer is a kernel mechanism that steps in to save the system when its free memory drops too low, killing the most resource-hungry processes to reclaim memory.
Even if not, it does expressly say that you'll need a larger engine.
Mara is correct: copy_to() needs additional memory on the driver machine, which you can increase to, say, 8 GB as follows:
config <- spark_config()
config["sparklyr.shell.driver-memory"] <- "8g"
# then pass the config to spark_connect(), for example:
sc <- spark_connect(master = "local", config = config)
The spark_read_*() functions only support loading data from local paths when connected in master = "local" mode. If you are running against a proper Spark cluster, you need to use an HDFS path instead of a local path.
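As a sketch of what that might look like (assuming the file has already been copied into an HDFS directory such as /user/cdsw/ — the HDFS location is an assumption, so adjust it to your cluster):

```r
library(sparklyr)

sc <- spark_connect(master = "yarn")  # or however you connect to your cluster

# Hypothetical example: read the text file from HDFS rather than the driver's
# local filesystem. Replace the path with the file's real location in HDFS.
month1 <- spark_read_text(
  sc,
  name      = "Month1_IntlData",
  path      = "hdfs:///user/cdsw/SUMMARY_DETAIL_HIST 1.txt",
  overwrite = TRUE
)
```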
If you are using HDFS, you can use the appropriate tools, such as running hadoop fs -ls from a terminal, to find the correct path to a file in HDFS. See the Hadoop FileSystem Shell documentation.
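For completeness, here is a sketch of copying the local file into HDFS first so Spark can see it (the HDFS directory name is an assumption, and these commands require a configured Hadoop client):

```shell
# Create a target directory in HDFS if it doesn't exist yet
hadoop fs -mkdir -p /user/cdsw

# Copy the local file into HDFS (quotes needed because of the space in the name)
hadoop fs -put "/home/cdsw/SUMMARY_DETAIL_HIST 1.txt" /user/cdsw/

# Verify the file arrived
hadoop fs -ls /user/cdsw
```

After that, spark_read_text() with the hdfs:// path should find the file, and you can persist it to Hive with something like DBI::dbWriteTable() or sdf_register() plus a CREATE TABLE, depending on your setup.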