<Sapply> instead of <for>

I have the following code and want to rewrite it with the "apply" family of functions to reduce its running time. Are there other things that could be changed to optimize the operation?

ord_db is a tibble with several million rows.

for (i in 1:nrow(ord_db)) {
  n <- ord_db[i, "Date"]
  if (n == 43465) {
    z <- 0
  } else if (n == 43466 | n == 43831 | n == 44196) {
    z <- ord_db$Value[i]
  } else {
    z <- ord_db$Value[i] - ord_db$Prev_Date_Value[i]
  }
  # index with lowercase i; R is case-sensitive and I is a different object
  ord_db$Value_change[i] <- z
}

Here is an approach using the tidyverse that doesn't involve for loops or apply-family functions:

library(tidyverse)
#use built in population dataset as example
(small_df <-  head(population,10))

(new_df <- mutate(small_df,
                 z= case_when(year==1999 ~ 0,
                              year %in% c(1995,1996) ~ as.numeric(population),#needs to be double not integer to match the rest
                              TRUE ~  sqrt(population/100)
                 )))
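
To connect the example above to the original question, here is a minimal sketch of the same case_when pattern applied to the column names from the question (Date, Value, Prev_Date_Value). The six-row table and its values are made up for illustration; I am assuming Value and Prev_Date_Value are stored as doubles, so no as.numeric conversion is needed.

```r
library(dplyr)

# Hypothetical stand-in for ord_db; the real table has millions of rows.
ord_db <- tibble(
  Date            = c(43465, 43466, 43500, 43831, 44000, 44196),
  Value           = c(10, 12, 15, 20, 26, 30),
  Prev_Date_Value = c(9, 10, 12, 18, 20, 27)
)

# Conditions are checked in order, and each one is evaluated on the whole column.
ord_db <- mutate(ord_db,
  Value_change = case_when(
    Date == 43465                    ~ 0,
    Date %in% c(43466, 43831, 44196) ~ Value,
    TRUE                             ~ Value - Prev_Date_Value
  )
)

ord_db$Value_change
```

The order of the conditions mirrors the if / else if / else chain in the loop, so the first matching rule wins for each row.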

Thanks. I wanted to use "sapply" because I read that it is faster and more efficient. Do you know whether your suggested method is faster than "sapply"? Could you write this with "sapply"?

I don't think sapply is a good choice here, because your conditions draw on several columns at once. If you need to access multiple columns, it is best to do so with standard base R vectorisation.
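
For reference, this is roughly what a sapply version would look like next to a vectorised one. It is only a sketch on made-up data using the column names from the question. Note that sapply still calls an R function once per row, so it is essentially the for loop in a different shape, whereas the nested ifelse evaluates each condition once for the whole column.

```r
# Hypothetical stand-in for ord_db (values invented for illustration).
ord_db <- data.frame(
  Date            = c(43465, 43466, 43500, 43831, 44000, 44196),
  Value           = c(10, 12, 15, 20, 26, 30),
  Prev_Date_Value = c(9, 10, 12, 18, 20, 27)
)

# sapply version: one function call per row.
change_sapply <- sapply(seq_len(nrow(ord_db)), function(i) {
  d <- ord_db$Date[i]
  if (d == 43465) {
    0
  } else if (d %in% c(43466, 43831, 44196)) {
    ord_db$Value[i]
  } else {
    ord_db$Value[i] - ord_db$Prev_Date_Value[i]
  }
})

# Vectorised version: whole columns in, whole column out.
change_vec <- ifelse(ord_db$Date == 43465, 0,
              ifelse(ord_db$Date %in% c(43466, 43831, 44196),
                     ord_db$Value,
                     ord_db$Value - ord_db$Prev_Date_Value))

# Check that both approaches agree.
all.equal(change_sapply, change_vec)
```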
Here is a comparison

library(tidyverse)
#use built in population dataset as example
(small_df <-  head(population,10))

library(microbenchmark)

microbenchmark(
    tidy = {
        (new_df <- mutate(small_df,
                          z= case_when(year==1999 ~ 0,
                                       year %in% c(1995,1996) ~ as.numeric(population),#needs to be double not integer to match the rest
                                       TRUE ~  sqrt(population/100)
                          )))
    },
    base = {
        new_df2<-small_df
        new_df2$z <- ifelse(new_df2$year==1999,0, # same condition as the tidy version
                            ifelse( new_df2$year %in% c(1995,1996) , new_df2$population,
                                    sqrt(new_df2$population/100)))
        new_df2
    }
)

Unit: microseconds
 expr     min       lq     mean   median       uq     max neval
 tidy 185.100 192.8005 221.0891 200.8005 214.2520 483.302   100
 base  92.801 103.9010 122.0940 115.8015 120.4515 334.702   100
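
Keep in mind that the timings above come from a 10-row example, so they may not carry over to a table with millions of rows; it is worth rerunning microbenchmark on a realistic sample. Another base R option that may be worth including in such a test is logical-index assignment: compute the general case for every row, then overwrite the special dates. This is a sketch on made-up data, assuming the column names from the question and no missing values in Date.

```r
# Hypothetical stand-in for ord_db (values invented for illustration).
ord_db <- data.frame(
  Date            = c(43465, 43466, 43500, 43831, 44000, 44196),
  Value           = c(10, 12, 15, 20, 26, 30),
  Prev_Date_Value = c(9, 10, 12, 18, 20, 27)
)

# Step 1: the general case for all rows.
ord_db$Value_change <- ord_db$Value - ord_db$Prev_Date_Value

# Step 2: rows where the change is simply the value itself.
keep_value <- ord_db$Date %in% c(43466, 43831, 44196)
ord_db$Value_change[keep_value] <- ord_db$Value[keep_value]

# Step 3: the single date where the change is defined as zero.
ord_db$Value_change[ord_db$Date == 43465] <- 0

ord_db$Value_change
```

Because the three date sets do not overlap, the order of steps 2 and 3 does not matter here.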
