Goal
I am currently running a semi-complicated workflow that is quickly taking up all my RAM/Memory. I have numerous time series type models and each model outputs unique dataframes for each timestep. Thus, each model will have atleast 20-70 data frames by the time they are completed (not my design, so I am stuck with it). To perform a model to model comparison I am trying to take each of those data frames, from there respective model, and select what I need to join them into a singular dataframe per model. My workflow works great until I am comparing 20+ models.
This is why I am trying to find a way to step around my RAM issue. Below I have provided a single models worth of data and the chunk of code that converts the multiple dataframes per time step into a singular dataframe. If I change the listed listed dataframes to disk.frames I think I can drastically reduce the strain on my computer, but I am struggling to figure out how to do this. If anyone has experience using disk.frames and is willing to help out I would be much obliged.
Data
I haven't seen any better way to share rds files, so until then here are the links to download my rds files. I know it's not clean so sorry, but unless you want some massive chunks of code to reproduce the data without having to download it, this is the best I got.
Current Workflow
Each rds here is a list of lists. pfl_data
is a list of each model and at each model is a list of each dataframe from each timestep within the model. col_ids
is a list of listed values corresponding to the timestep dataframes used to determine what columns I want to pull from each data frame. so essentially I boil the dataframes down to two columns each and then left join them by the first column with a final renaming at the end.
# insert the quoted path (with the filename) to where you downloaded the rds file into the here() function
col_ids <- readr::read_rds(file = here::here())
# insert the quoted path (with the filename) to where you downloaded the rds file into the here() function
pfl_data <- readr::read_rds(file = here::here())
new_pfl_data <- furrr::future_map2(pfl_data, col_ids, function(pfl, col){
purrr::map2(pfl[-1], col[-1], ~ .x[c(2, .y)])
}) %>%
furrr::future_map(function(x){
purrr::reduce(x, dplyr::left_join, by = "V2") %>%
stats::setNames(c("Y", paste0("Z", 2:ncol(.))))
})
I was thinking about going upstream and changing pfl_data
from a list of listed dataframes into a list of listed disk.frames with as.disk.frame()
but I don't think that would work because I believe I could isolate unique columns from each disk.frame if they are listed like this. Thanks for taking the time to look this over!