I am trying to run a Prophet model to forecast demand for each store-item pair, roughly 1.5 million pairs in total. This is the function I am trying to apply to each pair:
prophet_model <- function(df) {
  # Sort by date
  df <- df[order(df$ds), ]
  # Hold out the most recent ~6% of dates as a test set
  test  <- tail(df, ceiling(nrow(df) * 0.06))
  train <- df[!df$ds %in% test$ds, ]
  # Train model (holidays is defined in the global environment)
  model_prophet <- prophet(train[, c("ds", "y")], holidays = holidays,
                           daily.seasonality = FALSE)
  # Score the held-out test set
  test_forecast <- predict(model_prophet, test)
  test_forecast$ds <- as.Date(test_forecast$ds)
  # Forecast the next 7 days
  dates <- data.frame(ds = seq(Sys.Date() + 1, by = "day", length.out = 7))
  forecast <- predict(model_prophet, dates)
  forecast <- forecast[, c("ds", "yhat", "yhat_upper", "yhat_lower")]
  # Each chunk holds a single store-item pair, so these are length-1
  forecast$item  <- unique(df$item)
  forecast$store <- unique(df$store)
  # Test accuracy (accuracy_func is my own helper)
  testdata <- merge(test, test_forecast[, c("yhat", "ds")], by = "ds", all.x = TRUE)
  forecast$PredictionAccuracy <- accuracy_func(testdata)
  forecast
}
I need to split the 180 million rows into a list of data frames, one per unique pair of two columns, and then apply the function to each element with parLapply(). But the R session crashes, or just keeps running, when I try to split the data frame into a list. I have tried split() and group_split() so far:
data <- df %>% group_split(col1, col2)
data <- split(df, list(df$col1, df$col2))
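For reference, here is a minimal sketch of the kind of split I am after, on toy data (col1/col2 stand in for the real store/item columns). I suspect part of the blow-up is that split() on a list of two factors materialises every level combination, including empty ones; drop = TRUE avoids that:

```r
# Toy data frame standing in for the 180M-row table
df_small <- data.frame(
  col1 = c("A", "A", "B"),
  col2 = c("x", "y", "x"),
  y    = 1:3
)

# drop = TRUE keeps split() from creating empty col1/col2 combinations,
# which otherwise explode when both columns have many levels
parts <- split(df_small, interaction(df_small$col1, df_small$col2, drop = TRUE))

length(parts)  # 3 non-empty pairs
```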
I couldn't run parLapply() without first splitting the data frame into a list. Also, since I am working on Windows (PSOCK clusters), the data has to be copied to each worker, which is difficult at this size.
result <- parLapply(cl, data, prophet_model)
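For completeness, this is the shape of my cluster setup. I have sketched it here with a dummy fit_one in place of prophet_model so it runs stand-alone; on the real data I export holidays, accuracy_func, and prophet_model, and load prophet on each worker:

```r
library(parallel)

# Stand-in for prophet_model, so the sketch runs without prophet installed
fit_one <- function(d) sum(d$y)

# Stand-in for the list produced by splitting the big data frame
chunks <- list(data.frame(y = 1:3), data.frame(y = 4:5))

cl <- makeCluster(2)                  # PSOCK workers, as used on Windows
clusterExport(cl, "fit_one")          # ship the function to each worker
# clusterEvalQ(cl, library(prophet))  # on the real run: load prophet per worker
res <- parLapply(cl, chunks, fit_one)
stopCluster(cl)

res  # list(6L, 9L)
```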
I also tried applying the function directly with do(), but the estimated completion time was about 1,000 hours:
data <- df %>% group_by(col1, col2) %>% do(prophet_model(.))
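A minimal sketch of the grouped apply on toy data, using group_modify (which newer dplyr recommends over do()); the real call would be group_modify(~ prophet_model(.x)):

```r
library(dplyr)

# Toy stand-in for the 180M-row table; col1/col2 are assumed grouping columns
toy <- data.frame(col1 = c("A", "A", "B"), col2 = c("x", "x", "y"), y = 1:3)

out <- toy %>%
  group_by(col1, col2) %>%
  group_modify(~ data.frame(total = sum(.x$y))) %>%  # stand-in for prophet_model(.x)
  ungroup()

out  # one row per (col1, col2) pair
```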
The function itself works on a small dataset: I have tried both parallel processing and do() on a few pairs and they ran fine. Please let me know if there is another way of splitting this large dataset or applying the function to it.