How can I use foreach instead of a for loop?

I'm trying to run a for loop over about 6,000,000 cells. Each iteration is computed independently, so I wonder whether I can replace the loop with foreach.
P.S.: TIME_SERIES looks like data.frame(VALUE=c(1,2,3,4,5),DOY=c(13,25,36,54,66))

for (i in 1:ncell(cor_raster)) {
  ## Get the time series of the cell and the corresponding days of year; returns a data frame
  TIME_SERIES<-GET_CELL_VALUE(YEAR = 2015,BASE_PATH = "G:/LANDSAT/PATH146ROW036/MEAN",number = i,MAT = MAT_2015)
  if(any(is.na(TIME_SERIES))){
    FDD_MAT[i]<-NA
    TDD_MAT[i]<-NA
  }else{
    ## ... some calculation that returns a vector such as c(1,2), stored in RETURN_VECTOR
    FDD_MAT[i]<-RETURN_VECTOR[1]
    TDD_MAT[i]<-RETURN_VECTOR[2]
  }
}

However, I found very few examples where a custom function is called inside foreach, so I am confused about how to write the foreach code. I tried rewriting the loop with foreach, but it didn't work. My attempt is attached below. How should I modify it?

cl<-cl.cores-1
cl <- makeCluster(cl.cores)
registerDoParallel(cl <- makeCluster(cl)) 
result<-list()
ss<-foreach(i=1:ncell(cor_raster),.combine = "rbind",.inorder = TRUE)%do%{
  TIME_SERIES<-GET_CELL_VALUE(YEAR = 2015,BASE_PATH = "G:/LANDSAT/PATH146ROW036/MEAN",number = i,MAT = MAT_2015)
  if(any(is.na(TIME_SERIES))){
    FDD_MAT[i]<-NA
    TDD_MAT[i]<-NA
  }else{
    ## ... some calculation that returns a vector such as c(1,2), stored in RETURN_VECTOR
    FDD_MAT[i]<-RETURN_VECTOR[1]
    TDD_MAT[i]<-RETURN_VECTOR[2]
  }
  result[[1]]<-TDD_MAT
  result[[2]]<-FDD_MAT
  return(result)
}

It makes sense to use i inside your foreach to access the inputs for your function, but not to index a place where the output should be stored. You just want to return the output of each iteration and let foreach rbind the results together.

See the following code for a worked example. Note that it is a reprex, so you and other forum users can run it as is.


prompts <- 1:10
prompts[3] <- NA

# preallocate one output slot per input
FDD_MAT <- rep(NA_real_, length(prompts))
TDD_MAT <- rep(NA_real_, length(prompts))

# stand-in for the real calculation: returns a vector of length 2
myfunc <- function(x){
  return(c(2*x,x*x))
}

# loop over positions, not values, so that an NA input still has a valid index
for (i in seq_along(prompts)) {
  x <- prompts[i]
  if(is.na(x)){
    FDD_MAT[i]<-NA
    TDD_MAT[i]<-NA
  }else{
    RETURN_VECTOR <- myfunc(x)
    FDD_MAT[i]<-RETURN_VECTOR[1]
    TDD_MAT[i]<-RETURN_VECTOR[2]
  }
}
# review what you got in the non-parallel version
cbind(prompts,FDD_MAT,TDD_MAT)



library(foreach)
library(parallel)
library(doParallel)
cl.cores<-parallel::detectCores()-1
cl <- makeCluster(cl.cores)
registerDoParallel(cl)  


ss<-foreach(i=prompts,.combine = "rbind",.inorder = TRUE)%do%{
   if(any(is.na(i))){
    FDD_MAT_<-NA
    TDD_MAT_<-NA
  }else{
    RETURN_VECTOR <- myfunc(i)
    FDD_MAT_<-RETURN_VECTOR[1]
    TDD_MAT_<-RETURN_VECTOR[2]
  }

  return(c(FDD_MAT_,
           TDD_MAT_))
}

ss

Thank you for your help. In addition, I saw %do% and %dopar% in the help documentation. What is the difference between the two and which one is more efficient?

My bad, I wasn't paying attention. I should have written %dopar%, because:

%do% evaluates the expression sequentially, while %dopar% evaluates it in parallel.
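
To make the difference concrete, here is a minimal sketch that runs the same loop body with both operators. It assumes the foreach and doParallel packages are installed; slow_square and the two-worker cluster are illustrative choices, not part of the code earlier in this thread.

```r
library(foreach)
library(doParallel)

# Hypothetical stand-in for an expensive, independent per-cell calculation
slow_square <- function(x) {
  Sys.sleep(0.05)  # pretend this takes a while
  c(2 * x, x * x)
}

cl <- makeCluster(2)    # two workers; adjust to your machine
registerDoParallel(cl)

# %do% runs the iterations one after another in the current R session
seq_res <- foreach(i = 1:8, .combine = "rbind") %do% slow_square(i)

# %dopar% sends the iterations to the registered workers
par_res <- foreach(i = 1:8, .combine = "rbind") %dopar% slow_square(i)

stopCluster(cl)         # release the workers when done

# Both operators return the same values; only the execution strategy differs
all.equal(unname(seq_res), unname(par_res))
```

As for which is more efficient: %dopar% only pays off when each iteration does enough work to outweigh the cost of sending data to the workers and collecting the results. For very cheap iterations, %do% or a plain for loop can be just as fast.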

Thank you for your help. After modifying my code, though, I ran into some problems. How should nested loops be parallelized? And how should global variables be passed to foreach? The code below simulates the data I am using and shows my updated for loop, but I'm still confused about how to convert it to foreach for parallel computation. Can you help me?

####### Import packages ####
library(ff)
library(raster)
#### Here is the simulated data #########
## 100 x 100 = 10000 cells, matching the 1:10000 loop below
TDD_MAT<-matrix(1, nrow = 100, ncol = 100)
FDD_MAT<-matrix(1, nrow = 100, ncol = 100)
raster_stack<-stack() ## start from an empty stack so that addLayer has something to add to
for(k in 1:24){
 m<-matrix(runif(10000,min = -15+k,max = -14+k), nrow = 100, ncol = 100)
 r<-raster(m)
 extent(r)<-extent(1,100,1,100)
 raster_stack<-addLayer(raster_stack,r)
}
## one row per cell, one column per layer (date)
MAT_2015 <- ff(vmode="double",dim=c(ncell(raster_stack),nlayers(raster_stack)),filename=paste0(getwd(),"/stack.ffdata"))
for(i in 1:nlayers(raster_stack)){
  MAT_2015[,i] <- raster_stack[[i]][]
}
doy<-c(2,4,6,35,53,65,72,105,125,132,145,165,178,184,195,243,255,265,288,293,302,333,355,362)
######################### The above is the data simulation #############
########################## The following is the for loop that I want to run in parallel ###
for (i in 1:10000) {
  if(any(is.na(MAT_2015[i,]))){  
    FDD_MAT[i]<-NA ##NA cases do not appear in simulated data
    TDD_MAT[i]<-NA
  }else{
    day<-doy ##doy is a global variable I defined,just like c(2,65,160,243,325)
    x<-c(day*6.28/365)
    y<-MAT_2015[i,]
    fm <- nls(y ~ cbind(a = 1, b = sin(x + c)), start = list(c=30), alg = "plinear");  ##perform regression in the specified format,to get c(a,b,c)
    yy<-fm$m$getAllPars() ##get the coefficients 
    TDD<-0
    FDD<-0
    for(j in 1:365){
      y<-yy[2]+yy[3]*sin((j*6.28/365)+yy[1])
      if(y>0){
        TDD<-TDD+y}
      else{
        FDD<-FDD+y}
    }
    FDD_MAT[i]<-FDD 
    TDD_MAT[i]<-TDD
  }
}

There is documentation on converting nested loops:
Nesting foreach loops (r-project.org)
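
As a rough illustration of what that document describes, the sketch below nests two foreach calls with the %:% operator. The function sim and the vectors avec and bvec are hypothetical placeholders, and %do% is used so that the example runs without a parallel backend.

```r
library(foreach)

# Hypothetical function of two loop indices
sim <- function(a, b) 10 * a + b

avec <- 1:2
bvec <- 1:4

# %:% merges the two foreach objects into a single stream of tasks;
# the inner results are combined with c(), the outer ones with cbind()
x <- foreach(b = bvec, .combine = "cbind") %:%
  foreach(a = avec, .combine = "c") %do% {
    sim(a, b)
  }

dim(x)  # one row per value of avec, one column per value of bvec
```

With a backend registered, replacing %do% by %dopar% would distribute the combined tasks. Whether nesting is worth it depends on the inner loop: if the inner loop is cheap, parallelizing only the outer loop and leaving the inner one as ordinary R code may be simpler.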

In addition, if I modify your example above to use %dopar%, it fails with: Error in { : task 1 failed - Incorrect number of Dimensions. If I substitute %do% for %dopar% in the code, it runs correctly. Why is this? Also, when I time the %do% version, I find no significant improvement over the for loop; the difference between them is tiny.

TDD_MAT<-matrix(c(1),nrow = 100,ncol = 100)
FDD_MAT<-matrix(c(1),nrow = 100,ncol = 100)
ss <- foreach(i=1:10000,.combine = "rbind",.inorder = TRUE)%do%{
  if(any(is.na(MAT_2015[i,]))){
    FDD_MAT<-NA
    TDD_MAT<-NA
  }else{
    day<-doy
    x<-c(day*6.28/365)
    y<-MAT_2015[i,]
    fm <- nls(y ~ cbind(a = 1, b = sin(x + c)), start = list(c=30), alg = "plinear");  
    yy<-fm$m$getAllPars() 
    ##curve(yy[2]+yy[3]*sin((6.28*x/365)+yy[1]),0,365)
    TDD<-0
    FDD<-0
    for(j in 1:365){
      y<-yy[2]+yy[3]*sin((j*6.28/365)+yy[1])
      if(y>0){
        TDD<-TDD+y}
      else{
        FDD<-FDD+y}
    }
    FDD_MAT<-FDD
    TDD_MAT<-TDD
  }
  return(c(FDD_MAT,TDD_MAT))
}
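
One possible explanation for the error above, offered as an assumption rather than a diagnosis: %dopar% evaluates the loop body in separate R sessions, so a package that is attached in the main session (here ff, which supplies the indexing behaviour of MAT_2015) is not attached on the workers unless it is requested. foreach has .packages and .export arguments for that purpose. The sketch below shows the mechanism only; it uses a plain matrix (sim_mat) and a hypothetical helper (degree_days) instead of the ff-backed data.

```r
library(foreach)
library(doParallel)

# Simulated stand-ins for the real data (assumptions for illustration)
set.seed(1)
sim_mat <- matrix(rnorm(20 * 5), nrow = 20, ncol = 5)
sim_mat[3, 2] <- NA   # one cell with a missing observation

# Hypothetical helper: sum of the non-positive and of the positive values
degree_days <- function(y) {
  c(FDD = sum(y[y <= 0]), TDD = sum(y[y > 0]))
}

cl <- makeCluster(2)
registerDoParallel(cl)

# Variables referenced in the body (sim_mat, degree_days) are exported
# automatically when they live in the calling environment; .export is
# available for objects that are not detected. .packages names the packages
# to load on each worker ("stats" is only a placeholder here).
res <- foreach(i = seq_len(nrow(sim_mat)), .combine = "rbind",
               .packages = "stats") %dopar% {
  y <- sim_mat[i, ]
  if (anyNA(y)) c(FDD = NA_real_, TDD = NA_real_) else degree_days(y)
}

stopCluster(cl)

dim(res)  # one row per cell, two columns
```

For the code in this thread, the analogous change would be to add .packages = "ff" to the foreach call, assuming the workers can read the same ff file as the main session.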

Unfortunately I can't comment (and I expect it would be difficult for others too) because this is not a reprex.

It does seem, though, that you haven't used %:%, which is the main lesson in the document I linked for you to study.
