Make this code execute faster

Hello Everyone,

I am running a for loop over a dataframe with 100k+ rows and 50+ columns to apply a conditional formula to every element. Is there a way to make the code run faster?

for(i in 1:nrow(dataframe)){
  for(j in 4:column_number){
    if(j <= min(dataframe$index[i] + 6, column_number)){
      dataframe[i,j] <- round(dataframe[i,j] * dataframe[i,col_12], 0)
    }else{
      dataframe[i,j] <- 0
    }
  }
}

Thanks

Can you provide a reproducible example of what this dataframe might look like?

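I would suggest exploring the foreach package registered against a parallel backend (e.g. doMC on a multi-core PC or doMPI for distributed computing on an HPC cluster). Note that foreach run sequentially with %do% is generally not faster than an ordinary for loop; the gain comes from running iterations in parallel with %dopar%. For work that splits cleanly into independent pieces, the speed-up can approach linear in the number of cores (twice the cores, close to twice as fast), although the overhead of starting workers and copying data usually keeps it somewhat below that.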
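As a rough sketch of this suggestion, the loop can be split into row chunks that foreach hands to parallel workers. The data here is made up, doParallel is assumed as the backend (doMC or doMPI would be registered in a similar way), and col_12 is assumed to point at a column outside 4:column_number. Both packages need to be installed.

```r
library(foreach)
library(doParallel)  # assumed backend; loads the parallel package as well

# Hypothetical stand-in for the real data: 3 descriptive columns, then v4..v10
set.seed(1)
n_rows <- 200
column_number <- 10
col_12 <- 3  # assumed position of the multiplier column
values <- matrix(runif(n_rows * 7, 0, 10), nrow = n_rows,
                 dimnames = list(NULL, paste0("v", 4:column_number)))
dataframe <- cbind(
  data.frame(id = seq_len(n_rows),
             index = sample(0:5, n_rows, replace = TRUE),
             mult = 2),
  as.data.frame(values)
)

# Start two local workers and register them with foreach
cl <- parallel::makeCluster(2)
registerDoParallel(cl)

# One chunk of row numbers per worker
chunks <- parallel::splitIndices(nrow(dataframe), 2)

# Each worker runs the original loop on its own block of rows;
# the processed blocks are stacked back together in order
result <- foreach(rows = chunks, .combine = rbind) %dopar% {
  block <- dataframe[rows, ]
  for (i in seq_len(nrow(block))) {
    last_col <- min(block$index[i] + 6, column_number)
    multiplier <- block[i, col_12]
    for (j in 4:column_number) {
      block[i, j] <- if (j <= last_col) round(block[i, j] * multiplier, 0) else 0
    }
  }
  block
}

parallel::stopCluster(cl)
```

On a data set this small the cost of starting the workers outweighs any gain; the approach only pays off when each chunk carries enough work.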

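From a code perspective, I would make sure you are not overwriting values that the loop still needs. The assignment inside the if clause reads dataframe[i,col_12], and your input variable is also your output: if col_12 falls within 4:column_number, that cell is overwritten when j reaches it, and every later column in the same row is then multiplied by the new value instead of the original one.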
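One way to sidestep the overwriting concern is to read the multiplier column once, before any cell changes, and to write the result into a separate object. Doing so also allows the whole block to be computed in one vectorised step instead of cell by cell. The following is a minimal sketch on made-up data, assuming col_12 is a column position and the columns 4:column_number are numeric.

```r
# Hypothetical stand-in for the real data: 3 descriptive columns, then v4..v10
set.seed(42)
n_rows <- 6
column_number <- 10
col_12 <- 3  # assumed position of the multiplier column
values <- matrix(runif(n_rows * 7, 0, 10), nrow = n_rows,
                 dimnames = list(NULL, paste0("v", 4:column_number)))
dataframe <- cbind(
  data.frame(id = seq_len(n_rows), index = 0:5, mult = 2),
  as.data.frame(values)
)

value_cols <- 4:column_number

# Per row: the last column that keeps a value, and the multiplier,
# both read before anything is modified
last_col   <- pmin(dataframe$index + 6, column_number)
multiplier <- dataframe[[col_12]]

# keep[i, k] is TRUE where value_cols[k] <= last_col[i]
keep <- outer(last_col, value_cols, ">=")

# A length-nrow vector recycles down the columns of the matrix,
# so row i is multiplied by multiplier[i]
block <- round(as.matrix(dataframe[value_cols]) * multiplier)
block[!keep] <- 0

# Write to a copy so the input stays untouched
result <- dataframe
result[value_cols] <- as.data.frame(block)
```

Because the work happens on a numeric matrix rather than through repeated dataframe[i,j] assignments, this tends to be much faster than the double loop on large inputs, though the actual gain depends on the data.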
