What is the tidyverse way of doing a conditional `while` or `for` loop?

I'm paginating through an API and looking for something like `map_while`, but I cannot find an appropriate function in purrr. `map_if` is a no-go because I need to know the result of the computation during the loop.

Here are the two patterns I'm curious to replace:

Pattern 1: for loop + break

```r
for (i in seq_along(x)) {
    ## do something ##
    if (something) break
}
```

Pattern 2: while loop

```r
cond <- TRUE
while (cond) {
    ## do something ##
    if (something) {
        cond <- FALSE
    }
}
```

Recursion would be a functional-style alternative. Search for "functional alternative while" to get some inspiration.
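A minimal sketch of what that could look like for pagination, assuming a hypothetical `fetch_page(page)` that returns a data frame and returns an empty one once the results run out. Here it is mocked with three pages of two rows each so the block runs on its own:

```r
# Hypothetical mock of a paginated API: pages 1-3 hold two rows each,
# every later page is empty.
fetch_page <- function(page) {
  if (page <= 3) {
    data.frame(page = page, id = c(2 * page - 1, 2 * page))
  } else {
    data.frame(page = numeric(0), id = numeric(0))
  }
}

# Recursion instead of a while loop: an empty page is the base case;
# otherwise combine this page with the result for all following pages.
fetch_all <- function(page = 1) {
  data <- fetch_page(page)
  if (nrow(data) == 0) {
    return(data)
  }
  rbind(data, fetch_all(page + 1))
}

all_rows <- fetch_all()
nrow(all_rows)  # 6
```

Each recursive call adds a frame to the call stack, so this is only comfortable when the number of pages is modest.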

Practically, I would use a small loop only to collect the paginated data (i.e. don't do more than absolutely necessary inside the loop's body), and then process it purrr- or dplyr-style.

The advantages of `*apply()`/`map*()` style approaches are semantic: when you see an `lapply()`, you know that

  • your code will return a list
  • it won't do anything else (no side effects)
  • each iteration is independent of the others

In a `for`/`while` loop, anything can happen, and you have to read and understand the loop to know what's really going on.

Now, you already know that the criteria I listed for `apply()` aren't true for your problem, so that approach is not the right one. Sometimes it's perfectly fine to use a `for` loop. With a `for` loop, you at least know the number of iterations in advance (or, with `break`, an upper bound on it). With a `while` loop, you don't even know that. I would reach for these constructs in this order:

  • use `*apply()`/`map*()` if the problem fits the criteria described above
  • use `for` if you know the number of iterations. You might use `for` with `break` if you know the maximum number of iterations but the code might exit early
  • use `while` if you don't even know that
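  • use recursion, as @klmr suggested, if the problem can be solved elegantly this way (but be aware that for large problems recursion has disadvantages compared with loops: every call adds a frame to the call stack, so very deep recursion in R can fail with an "evaluation nested too deeply" error)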
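As an illustration of the `for` + `break` case, here is a sketch assuming the API documents a maximum number of pages. The `max_pages` value and the `fetch_page()` helper are hypothetical; the helper is mocked so the block runs on its own:

```r
# Hypothetical mock of a paginated API: pages 1-3 hold two rows each,
# every later page is empty.
fetch_page <- function(page) {
  if (page <= 3) {
    data.frame(page = page, id = c(2 * page - 1, 2 * page))
  } else {
    data.frame(page = numeric(0), id = numeric(0))
  }
}

max_pages <- 10                      # hypothetical known upper bound
pages <- vector("list", max_pages)   # one preallocated slot per possible page

for (i in seq_len(max_pages)) {
  data <- fetch_page(i)
  if (nrow(data) == 0) break         # exit early: no more results
  pages[[i]] <- data
}

pages <- Filter(Negate(is.null), pages)  # drop the slots that were never filled
result <- do.call(rbind, pages)
```

Knowing the bound lets the loop preallocate its storage and guarantees it terminates even if the API never returns an empty page.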

A while back I had the same problem (paginating through an API), and I ended up using a `repeat {}` / `break` style loop.

The reason was that it was impossible to know in advance how many pages had meaningful results, so a `for` loop was not feasible, and I had to keep iterating until I hit an empty page.

This was the (simplified) construction:

```r
i <- 1
result <- data.frame()

repeat {
  data <- scrape_the_api(page = i)  # fetch one page of results

  if (nrow(data) == 0) break  # the end was reached...

  result <- rbind(result, data)
  i <- i + 1
}

some_fancy_processing(result)
```

This construction makes sure the API is called at least once, and breaks once an empty result is returned.
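One caveat: `rbind()` inside the loop copies everything collected so far on every iteration, so the cost grows quadratically with the number of pages. A sketch of the same `repeat` pattern that instead stores each page in a list and binds once at the end. The `scrape_the_api()` below is a hypothetical mock standing in for the real call:

```r
# Hypothetical mock of the paginated API: pages 1-3 hold two rows each,
# every later page is empty.
scrape_the_api <- function(page) {
  if (page <= 3) {
    data.frame(page = page, id = c(2 * page - 1, 2 * page))
  } else {
    data.frame(page = numeric(0), id = numeric(0))
  }
}

i <- 1
pages <- list()

repeat {
  data <- scrape_the_api(page = i)

  if (nrow(data) == 0) break  # the end was reached

  pages[[i]] <- data  # store the page; binding is deferred until after the loop
  i <- i + 1
}

result <- do.call(rbind, pages)  # bind all pages once
```

This also keeps the loop body down to collecting data. With the tidyverse loaded, `dplyr::bind_rows(pages)` or `purrr::list_rbind(pages)` could replace the `do.call(rbind, pages)` step.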
