 # Improve performance of for-loop

Hi All,
I have a large dataset which I need to run in a similar way to the code below.
Basically, I am producing 5 values for each df. The code runs 50 times for the given dataset and the final output is its mean of 50 loops of each row. Usually, it took me 24x7 hours to complete a dataset.
My problem is how can I enhance the speed of such loops? or any other suggestions.

``````df_final=rep(NA,length=25)
for (k in 1:50){
df=c(25, 100,  52,  11,  43)

dat=NULL
for (j in 1:length(df)){

df1=df[j]
u1=runif(1,0,ppois(df1,lambda = mean(df)))
d1=qpois(u1,lambda=mean(df))
d2=ifelse(d1<df1,
qpois(runif(1,0,ppois((df1-d1),lambda = mean(df))),lambda=mean(df)),0 )
d3=ifelse((d1+d2)<df1,
qpois(runif(1,0,ppois((df1-d1-d2),lambda = mean(df))),lambda=mean(df)),0)
d4=ifelse((d1+d2+d3)<df1,
qpois(runif(1,0,ppois((df1-d1-d2-d3),lambda = mean(df))),lambda=mean(df)),0)
d5=df1-(d1+d2+d3+d4)
dat=rbind(dat,d1,d2,d3,d4,d5)
}
df_final=cbind(df_final,dat)
}
df_final=as.data.frame(df_final)
final=rowMeans(df_final[,2:51])

``````

Hi @meitei,
One way to enhance the speed of loops or functions is to remove "growing" arrays by rbind, or cbind. From your dataset, I renamed df to mydf as df is an already named function in R, and within your j loop, mean(df) will always be the same, so I removed that outside the function as it only needs to be calculated once and renamed it mean_df.

I then pass each element of mydf, and mean_df in parallel via `mapply`.

``````    mydf <- c(25, 100,  52,  11,  43)

mean_df <- mean(mydf)

# Calculate the random values. Inputs are: x = each element of mydf, and
# y = mean of mydf
# Returns a named vector d1, d2, d3, d4, d5 for each element of mydf and mean_df

rand_fn <- function(x, y){
u1 <- runif(1, 0, ppois(x, lambda = y))
d1 <- qpois(u1, lambda = y)
d2 <- ifelse(d1 < x, qpois(runif(1,0, ppois((x-d1),lambda = y)),
lambda = y),0 )
d3 <- ifelse((d1+d2)<x, qpois(runif(1,0,ppois((x-d1-d2),lambda = y)),
lambda = y),0)
d4 <- ifelse((d1+d2+d3)<x,
qpois(runif(1,0,ppois((x-d1-d2-d3),lambda = y)),lambda = y),0)
d5 <- x - (d1 + d2 + d3 + d4)
l <- c(d1 = d1, d2 = d2, d3 = d3, d4 = d4, d5 = d5)
}

``````

Here, mapply will loop through each element of mydf and mean_df in parallel, and will by default, simplify the result to a matrix. Because mean_df is a single number, it will be recycled the same length as mydf. I transposed the resulting matrix (t) in order for d1 to d5 appear as columns, and each element of mydf or df as rows.

Replicate basically allows you to repeat the same function over and over
again, and combine the results. kreps is a 3D array with each element of mydf
as rows, each column corresponding to d1 to d5, and each slice as one of 50
replicates.

``````    kreps <- replicate(50, t(mapply(function(x, y) rand_fn(x, y),
mydf, mean_df)))

kreps[, , 1:2] # Two slices of the results as example
#> , , 1
#>
#>      d1 d2 d3 d4 d5
#> [1,] 25  0  0  0  0
#> [2,] 54 46  0  0  0
#> [3,] 39 12  1  0  0
#> [4,] 10  1  0  0  0
#> [5,] 42  1  0  0  0
#>
#> , , 2
#>
#>      d1 d2 d3 d4 d5
#> [1,] 24  1  0  0  0
#> [2,] 51 43  6  0  0
#> [3,] 48  4  0  0  0
#> [4,] 11  0  0  0  0
#> [5,] 41  2  0  0  0
``````

Because kreps is a 3D array, I can calculate the mean across the slices for
each element by specifying the margins.

``````    final_result <- apply(kreps, MARGIN = c(1, 2), FUN = mean)

final_result
#>         d1    d2   d3   d4   d5
#> [1,] 23.56  1.44 0.00 0.00 0.00
#> [2,] 47.52 43.28 8.86 0.32 0.02
#> [3,] 43.84  8.02 0.12 0.02 0.00
#> [4,] 10.66  0.34 0.00 0.00 0.00
#> [5,] 39.04  3.82 0.14 0.00 0.00
``````

Created on 2020-10-22 by the reprex package (v0.3.0)

2 Likes

This topic was automatically closed 7 days after the last reply. New replies are no longer allowed.

If you have a query related to it or one of the replies, start a new topic and refer back with a link.

@ jrmuirhead thank you so much.