How to use lapply with custom function

Uday · December 27, 2018, 10:37pm

All,
I have an excel template and I would like to edit the data in the template. I am able to do it with the loops construct, but I know loops are inefficient. So, I am trying to use the "apply" family functions and could use some help. Here is some sample code :

suppressPackageStartupMessages(library(readxl))
suppressPackageStartupMessages(library(openxlsx))

TemplateFilePath <- paste('C:', 'Test1.xlsx', sep = '\\')

# Get Handle to Template file 
wb <- loadWorkbook(TemplateFilePath)

SpecialRows <- list(30,31,32,33)
SummaryData <- c(sum(1:10), sum(2:20), sum(3:30), sum(4:40))
WbObjectList <- list(rep('wb', length(SummaryData)))
SheetNamesList <- list(rep('Test'), length(SummaryData))
ColsList <- list(rep(1), length(SummaryData))

Res3 <- lapply(1:length(SummaryData),
                function(i)
                {
                    writeData(WbObjectList[i], SheetNamesList[i], x = (SummaryData[[i]]), startCol = ColsList[i], startRow = SpecialRows[i])
                    addStyle(WbObjectList[i], SheetNamesList[i], styleT, cols = ColsList[i], rows = SpecialRows[i])
                }
              )

Please note that the functions writeData an addstyle are from the openxlsx package

This results in the following error:

Error in writeData(WbObjectList[i], SheetNamesList[i], x = (SummaryData[[i]]), :
(list) object cannot be coerced to type 'integer'

Best
uday

technocrat · December 27, 2018, 11:21pm

Can you provide

str(SummaryData[[i]])

for some value of i, please?

hoelk · December 28, 2018, 12:03am

From quickly looking at your code, shouldn't startCol be an integer vector, not a list? I think that is the issue for the error message.

For what you are doing lapply() has no advantage over a for loop. *apply functions are not more efficient than loops in R, their advantage is that their output is more predictable (if you are using them correctly).

Uday · December 28, 2018, 3:18am

@technocrat,

> str(SummaryData[[1]])
 int 55

technocrat · December 28, 2018, 4:03am

So, what you have there is an integer and, of course, it doesn't need to be coerced to an integer, because it already is one, your function is iterating over a list of integers, so SummaryData[[i] isn't responsible.

writeData 's sheet argument accepts either a tab name or number, so it doesn't have to be coerced.

That leaves WbObjectList as the suspect.

i <- 1
 WbObjectList[i]
[[1]]
[1] "wb" "wb" "wb" "wb"

str(WbObjectList[i])
List of 1
 $ : chr [1:4] "wb" "wb" "wb" "wb"
class(WbObjectList[i])
[1] "list"
writeData(WbObjectList[i] )
Error in writeData(WbObjectList[i]) : 
  argument "x" is missing, with no default

meaning that writeData was expecting a workbook object containing a data sheet and got a list, instead

Now we could try to get around this

WbObjectList[[1]][i]
[1] "wb"

but we get a character object, not a workbook object, which is because

WbObjectList <- list(rep('wb', length(SummaryData)))

repeats the string "wb" 4 times, not wb as defined above

What happens when we change the definition of WbObjectList?

WbObjectList <- list(rep(wb, length(SummaryData))) # without the single quotes around wb

I can't test that because I don't have any xlsx files, but why don't you try and report back?

Uday · December 28, 2018, 2:03pm

@technocrat,
Thank you for the kind and detailed breakdown. Here is an update:
The following code works. However, one thing I don't understand is when I run this code, there is a ton of numbers being printed to my screen, I wonder why that is happening. Also, I am confused as to why the apply function would not be any faster than the loop construct. Would definitely love to understand that

SpecialRows <- list(30,31,32,33)
SummaryData <- c(sum(1:10), sum(2:20), sum(3:30), sum(4:40))
ColVals<- rep(1,length(SummaryData))

Res3 <- lapply(1:length(SummaryData),
                function(i)
                {
                    writeData(wb, 'Year 1', x = (SummaryData[[i]]), startCol = ColVals[i], startRow = SpecialRows[i])
                    addStyle(wb, 'Year 1', styleT, cols = ColVals[i], rows = SpecialRows[i])
                }
              )

hoelk · December 28, 2018, 5:14pm

Loops in R come with a certain overhead (compared to more low level programming languages like C). lapply() and co just hide the loop and do some magic around it. There are functions that are truely vectorized that are much faster because the underlying loops written in C.

If you have a function like yours, it does not really matter which kind of loop you choose. The computations you perform inside the body (your writeData and addStyle) take MUCH more time than the looping overhead. If you are iterating over 10s of thousands of elements, you have to start thinking.

If you see a lapply(x, add_one) you instantly know "oh this line of code returns a list of the same length as x, probably it just adds 1 to each element", if you see a for loop you just know that something happens, and you have to read and understand the loop in detail. Keeping code easy to understand is usually much more valuable than to squeezing out every last millisecond.

Also, never trust people that tell you something about performance. Benchmark it yourself:

library(bench)

x <- 1:1000000

good_loop <- function(x){
  # we preallocate the result vector
  y <- vector("numeric", length(x))   
  for (i in x){
    y[[i]] <- x[[i]] + 1 
  }
  y
}


bad_loop <- function(x){
  # we create a vector of length 0, and grow it every iteration
  y <- numeric()
  for (i in x){
    y[[i]] <- x[[i]] + 1 
  }
  y
}


mark(
  native_vectorization = x + 1,
  good_loop = good_loop(x),
  bad_loop  = bad_loop(x),
  vapply    = vapply(x, function(.x) .x + 1, numeric(1)),
  lapply    = unlist(lapply(x, function(.x) .x + 1))
)

  expression                min     mean   median      max `itr/sec` mem_alloc
  <chr>                <bch:tm> <bch:tm> <bch:tm> <bch:tm>     <dbl> <bch:byt>
1 native_vectorization   1.55ms   5.52ms   3.92ms   97.8ms    181.      7.63MB
2 good_loop             98.89ms 101.42ms  100.5ms  105.8ms      9.86    7.63MB
3 bad_loop              295.6ms 309.45ms 309.45ms  323.3ms      3.23  164.05MB
4 vapply               584.52ms 584.52ms 584.52ms  584.5ms      1.71    7.63MB
5 lapply               790.83ms 790.83ms 790.83ms  790.8ms      1.26   15.26MB

I was surprised that even the bad_loop is faster than lapply()/vapply(). Usually, looping without preallocation sucks in R (and other languages). One advantage of *applys is that they take care of that for you. Maybe its because the code is to simple.

system · January 4, 2019, 5:14pm

This topic was automatically closed 7 days after the last reply. New replies are no longer allowed.