I'm paginating through an API and looking for something like map_while, but I cannot find the appropriate function in purrr. map_if is a no-go because I need to know the result of the computation during the loop.
Here are the two patterns I'm curious to replace:
Pattern 1: for loop + break
for (i in seq_along(x)) {
  ## do something ##
  if (something) break
}
Pattern 2: while loop
cond <- TRUE
while (cond) {
  ## do something ##
  if (something) {
    cond <- FALSE
  }
}
Recursion would be a functional-style alternative. Search for "functional alternative while" to get some inspiration.
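As a rough illustration of the recursive idea, here is a minimal sketch. Note that fetch_page() is a made-up stand-in for a real API call (it pretends that only pages 1 to 3 contain data), and collect_pages() is a hypothetical helper name, not something from purrr or base R.

```r
# Hypothetical stand-in for a paginated API:
# pages 1-3 return two rows each, later pages are empty.
fetch_page <- function(page) {
  if (page > 3) return(data.frame(id = integer(0)))
  data.frame(id = (page - 1) * 2 + 1:2)
}

# Recursive collector: fetch one page, stop on an empty page,
# otherwise keep the page and recurse on the next one.
collect_pages <- function(page = 1, acc = list()) {
  data <- fetch_page(page)
  if (nrow(data) == 0) return(acc)             # base case: empty page
  collect_pages(page + 1, c(acc, list(data)))  # recursive case
}

pages  <- collect_pages()
result <- do.call(rbind, pages)  # bind all pages into one data frame
```

Each call adds a frame to the call stack, so for an API with very many pages this can run into R's nesting limit; a plain loop does not have that problem.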
Practically, I would use a small loop only to collect the paginated data (i.e. don't do more than absolutely necessary inside the loop's body), and then process it purrr- or dplyr-style.
The advantages of *apply()/map*()-style approaches are semantic: when you see an lapply(), you know that
your code will return a list
it won't do anything else (no side effects)
each iteration is independent of the others
In a for/while loop anything can happen, and you have to read and understand the loop to really know what's going on.
Now, you already know that the criteria I listed for apply() aren't true for your problem, so that approach is not the right one. Sometimes it's perfectly fine to use a for loop. With a for loop, you at least know the exact number of iterations; with a while loop you don't even know that. I would use these constructs in this order:
use *apply/map* if the problem fits the criteria described above
use for if you know the number of iterations. You might use for with break if you know the maximum number of iterations but the code might exit early
use while if you don't even know that
use recursion as @klmr suggested if the problem can be elegantly solved this way (but be aware that for large problems, recursion has disadvantages over loops)
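To make the "for with break" case concrete, here is a small sketch. It assumes you can put an upper bound on the number of pages; fetch_page() and max_pages are invented for the example and stand in for whatever the real API and limit would be.

```r
# Hypothetical stand-in for the API: only the first 3 pages contain rows.
fetch_page <- function(page) {
  if (page > 3) return(data.frame(id = integer(0)))
  data.frame(id = page)
}

max_pages <- 10                      # assumed upper bound on the page count
pages <- vector("list", max_pages)   # pre-allocate one slot per possible page

for (i in seq_len(max_pages)) {
  data <- fetch_page(i)
  if (nrow(data) == 0) break         # early exit: nothing left to fetch
  pages[[i]] <- data
}

# Slots that were never filled are still NULL; drop them before binding.
pages  <- Filter(Negate(is.null), pages)
result <- do.call(rbind, pages)
```

The reader can see at a glance that the loop runs at most max_pages times, which is the piece of information a while loop would not give them.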
A while back I had the same problem (paginating through an API) and I ended up using a repeat / break style loop.
The reason was that it was impossible to know in advance the number of pages with meaningful results, so a for() loop was not feasible and I had to keep iterating until I found an empty page.
This was the (simplified) construction:
i <- 1                       # page counter
result <- data.frame()       # accumulates the rows of all pages
repeat {
  data <- scrape_the_api(page = i)
  if (nrow(data) == 0) break  # the end was reached...
  result <- rbind(result, data)
  i <- i + 1
}
some_fancy_processing(result)
This construction makes sure the API is called at least once, and breaks once an empty result is returned.
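For anyone who wants to try the pattern without a real API, here is a self-contained variant. The scrape_the_api() below is only a mock (it pretends pages 1 to 4 have data), and collect_all() is a hypothetical wrapper. As a design choice that differs slightly from the loop above, it stores the pages in a list and binds them once at the end instead of growing the data frame on every iteration.

```r
# Mock of scrape_the_api(): pages 1-4 have one row each, page 5 onwards is empty.
scrape_the_api <- function(page) {
  if (page > 4) return(data.frame(page = integer(0), value = character(0)))
  data.frame(page = page, value = paste0("item_", page))
}

# Hypothetical wrapper: keep requesting pages until an empty one comes back.
collect_all <- function(fetch) {
  pages <- list()
  i <- 1
  repeat {
    data <- fetch(page = i)
    if (nrow(data) == 0) break          # empty page: the end was reached
    pages[[length(pages) + 1]] <- data  # store the page, bind later
    i <- i + 1
  }
  do.call(rbind, pages)                 # NULL if the first page was already empty
}

result <- collect_all(scrape_the_api)
```

With a real API you would probably also want a cap on the number of requests, so that a server that never returns an empty page cannot keep the loop running forever.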