Not able to load local text file into Hive table / Spark data frame

I have a text file on my local drive and want to load it as a Spark data frame. I used sdf_copy_to(), but got the error below.

df <- fread('/home/cdsw/HIST 1.txt')


sdf_copy_to(con, df, name = "sdf")

|=================================================================| 100% 1399 MB

Engine exhausted available memory, consider a larger engine size.

Engine exited with status 137.

I also tried

spark_read_text(con, name = "Month1_IntlData", path = "/home/cdsw/SUMMARY_DETAIL_HIST 1.txt", overwrite = TRUE)

and got this error:

Error: org.apache.spark.sql.AnalysisException: Path does not exist: hdfs://<<server_name>>/home/cdsw/SUMMARY_DETAIL_HIST 1.txt;
at org.apache.spark.sql.execution.datasources.DataSource$$anonfun$14.apply(DataSource.scala:360)
at org.apache.spark.sql.execution.datasources.DataSource$$anonfun$14.apply(DataSource.scala:348)
at scala.collection.TraversableLike$$anonfun$flatMap$1.apply(TraversableLike.scala:241)
at scala.collection.TraversableLike$$anonfun$flatMap$1.apply(TraversableLike.scala:241)
at scala.collection.immutable.List.foreach(List.scala:381)
at scala.collection.TraversableLike$class.flatMap(TraversableLike.scala:241)
at scala.collection.immutable.List.flatMap(List.scala:344)
at org.apache.spark.sql.execution.datasources.DataSource.resolveRelation(DataSource.scala:348)
at org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:178)
at org.apache.spark.sql.DataFrameReader.text(DataFrameReader.scala:623)
at org.apache.spark.sql.DataFrameReader.text(DataFrameReader.scala:603)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)

Can you tell me how to copy a local text file into a Spark data frame or a Hive table?

Thanks and Regards
Sankar Narayana

It looks like you're out of memory. Are you by chance working in Docker?

Error 137 in Docker means the container was killed by the 'oom-killer' (Out of Memory). This happens when there isn't enough memory in the container for the running process.

The OOM killer is a protective process that jumps in to save the system when its available memory gets too low, killing the most resource-hungry processes to free up memory.
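If you do have access to the Docker host and want to confirm that's what happened, one quick check from R (a sketch; <container_id> is a placeholder for your container):

# prints "true" if Docker's OOM killer terminated the container
system("docker inspect -f '{{.State.OOMKilled}}' <container_id>")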

Even if not, it does expressly say that you'll need a larger engine.

Text file size is 1.9 GB.

Do you mean we need to increase the driver memory settings?

It sounds like it, though, to be honest, I'm not 100% sure. Hopefully someone with a bit more expertise will chime in.

Mara is correct: copy_to() needs additional memory on the driver machine, which you can increase to, say, 8 GB as follows:

config <- spark_config()
config["sparklyr.shell.driver-memory"] <- "8g"

# then add the config parameter to spark_connect()
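For instance, a minimal sketch of the whole flow with that setting (master = "yarn" and the file path are taken from the posts above):

library(sparklyr)
library(data.table)

config <- spark_config()
config["sparklyr.shell.driver-memory"] <- "8g"   # extra driver memory for the copy

con <- spark_connect(master = "yarn", config = config)

# read the local file into R, then copy it into Spark through the driver
df <- fread("/home/cdsw/HIST 1.txt")
sdf <- sdf_copy_to(con, df, name = "sdf", overwrite = TRUE)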

The spark_read_*() functions only support loading data from local paths when connected in master = "local" mode. If you are running against a proper Spark cluster, you need to use an HDFS path instead of a local path, for instance:

spark_read_text(con,
                name = "Month1_IntlData",
                path = "hdfs://SUMMARY_DETAIL_HIST 1.txt",
                overwrite = TRUE)
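By contrast, a connection in local mode can read the local path directly. A minimal sketch, assuming no cluster is involved (the file:// prefix just makes the local filesystem explicit):

# in master = "local" mode, spark_read_text() can read from the local filesystem
con_local <- spark_connect(master = "local")
spark_read_text(con_local,
                name = "Month1_IntlData",
                path = "file:///home/cdsw/SUMMARY_DETAIL_HIST 1.txt",
                overwrite = TRUE)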

If you are using HDFS, you can use the appropriate tools, such as running hadoop fs -ls from the terminal, to find the correct path to a file in HDFS. See the Hadoop FileSystem Shell documentation.
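For example, a sketch of copying the local file into HDFS from within R and then reading it back. The /user/cdsw destination is an assumption, so adjust it to your cluster's layout; the rename just avoids the space in the filename:

# copy the local file into HDFS; the destination path is an assumption
system("hadoop fs -mkdir -p /user/cdsw")
system("hadoop fs -put '/home/cdsw/SUMMARY_DETAIL_HIST 1.txt' /user/cdsw/SUMMARY_DETAIL_HIST_1.txt")

spark_read_text(con,
                name = "Month1_IntlData",
                path = "hdfs:///user/cdsw/SUMMARY_DETAIL_HIST_1.txt",
                overwrite = TRUE)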


My text file is stored in my project workspace, '/home/cdsw', not in an HDFS path.

I ran with driver-memory set to 8g and then 10g, but the R engine still exits after exhausting memory.

Let me know if there is any other way to do it.

conf <- spark_config()
conf["sparklyr.shell.driver-memory"] <- "10g"
con <- spark_connect(master = "yarn", config = conf)

df <- fread(files[[1]])


sdf_copy_to(con, df, name = "sdf")

|=================================================================| 100% 1999 MB

Engine exhausted available memory, consider a larger engine size.
