mclapply (library(parallel)) for reading and writing multiple files

I am referring to the following question as a way of reproducing my problem.
Steps of my problem:

  1. parent.csv contains a list of my station names (500 stations).
  2. Each station's file is read from folder-A and analyzed.
  3. The final output is written to another folder-B. This whole procedure takes hours.

A sample of my code:

library(readr)  # provides read_csv()/write_csv()

df_func <- function(j){
  
  # directory holding the station list file
  setwd("C:/Users/...../QC")
  obs_files <- read.csv('parent.csv', colClasses = c("Station" = "character"))
  obs_files <- paste0(obs_files$Station, ".csv")
  
  # read the input file for this station and analyze it
  df <- read_csv(paste0("C:/Users/...../QC/", obs_files[j]), col_names = TRUE)
  # ... analysis steps go here ...
  
  # read the corresponding file from the "biased" folder
  df_hour1 <- read_csv(paste0("C:/Users/...../biased/", obs_files[j]), col_names = TRUE)
  
  # write the output file to the destination folder
  setwd("C:/Users/....../trial")
  write_csv(df_hour1, obs_files[j])
}

Here, j is the station index.
I can run this sequentially with mapply(function(j) df_func(j), j = 1:500), but that takes a lot of time.

How can I do this with mclapply, since I want to run the computation in parallel?
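
Something like the following is what I am aiming for (a rough, untested sketch; 4 cores is an arbitrary choice, and since mclapply relies on forking it only supports mc.cores = 1 on Windows, where parLapply() on a cluster would be the alternative):

library(parallel)

# apply df_func to every station index on 4 forked worker processes
# (on Windows, mclapply() requires mc.cores = 1 and therefore runs serially;
#  use a cluster with parLapply() there instead)
results <- mclapply(1:500, df_func, mc.cores = 4)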

Consider this example:

library(tidyverse)
library(doParallel)
registerDoParallel(3)


# make example data in the present working directory
iwalk(map(1:33, ~ slice(iris, .:(. + 10))),
      ~ write.csv(.x, paste0("iris_", .y, ".csv")))

# get a list of the files we made
(filesnames_to_read_and_write <- list.files(pattern = "^iris_.*\\.csv$"))

#invent a destination
out_directory <- tempdir()

list.files(path=out_directory)


# run in parallel 
foreach(fn=filesnames_to_read_and_write) %dopar% {
 write.csv(x = read.csv(fn),
           file = file.path(out_directory,fn))
}

stopImplicitCluster()

#see what we made
list.files(path=out_directory)
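
Applied to the original problem, the same pattern would look roughly like this (a sketch that assumes df_func is defined as in the question; the .packages argument loads readr on each worker, which is needed when doParallel spins up separate worker processes, e.g. on Windows):

library(readr)
library(doParallel)
registerDoParallel(3)

# run df_func for all 500 stations in parallel;
# foreach exports df_func to the workers automatically
foreach(j = 1:500, .packages = "readr") %dopar% {
  df_func(j)
}

stopImplicitCluster()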
