I need to calculate a lagged value of an existing column, grouped by another column, using sparklyr. My code is as follows:
require(sparklyr)
require(tidyverse)
sc <- spark_connect(master = "local")                       # connect to a local Spark instance
USD2010_tbl <- spark_read_csv(sc, "USD2010", "USD2010.csv",
                              header = FALSE)               # read the data into Spark
src_tbls(sc)                                                # list the tables registered in Spark
USD2010_tbl %>%
  rename(pair = V1, timestamp = V2, bid = V3, ask = V4) %>% # add column names
  mutate(price = (ask + bid) / 2) %>%                       # price = midpoint of bid and ask
  group_by(pair) %>%
  arrange(timestamp) %>%                                    # order rows within each group before lagging
  mutate(price_lag = lag(price))                            # previous price within each pair
I get the following error message:
Error: org.apache.spark.SparkException: Job aborted due to stage failure: Task 3 in stage 15.0 failed 1 times, most recent failure: Lost task 3.0 in stage 15.0 (TID 444, localhost, executor driver): java.io.IOException: No space left on device
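For reference, here is the result I am trying to reproduce, sketched with plain dplyr on a small in-memory sample (no Spark involved; the currency pairs and quote values below are invented purely for illustration, not taken from the real USD2010.csv):

```r
library(dplyr)

# Toy stand-in for the real CSV (values invented for illustration)
quotes <- tibble(
  pair      = c("EURUSD", "EURUSD", "GBPUSD", "GBPUSD"),
  timestamp = c(1, 2, 1, 2),
  bid       = c(1.30, 1.31, 1.60, 1.61),
  ask       = c(1.32, 1.33, 1.62, 1.63)
)

quotes %>%
  mutate(price = (ask + bid) / 2) %>%   # midpoint of bid and ask
  group_by(pair) %>%
  arrange(timestamp, .by_group = TRUE) %>%
  mutate(price_lag = lag(price))        # first row of each pair gets NA
```

On a Spark backend the same pipeline is translated to a SQL window function, which triggers a shuffle, so I suspect the "No space left on device" failure comes from Spark spilling shuffle data to its temporary directory rather than from the dplyr logic itself.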