Tidiest way to do recursion safely in R?

milesmcbain · September 28, 2017, 2:56am

Hey All,
I've had two problems in the last few months where the easiest solution was to write a recursive function. One was determining recursive dependencies of Github packages by following package "remotes", the other was downloading an entire folder (with directory structure intact) from Google Drive.

In both cases I implemented recursion unsafely without guarding against stack overflow. I justified this based on the nature of the data sources I was recursively exploring.

It sure would feel better to be able to use this pattern safely! But I find myself tripping at the first hurdles i.e: Suppose I move to a loop/stack implementation. Which package provides the best stack (or queue) data structure (if any!)?

Are there any functions hidden in the tidyverse that might help?

Who has done it, using what, and how did it go?

alandipert · September 28, 2017, 4:27pm

I'm an R neophyte, so I can't speak at all to "the tidyverse way" or even "the right way" to do this.

However, my background is in Lisp. In that world, one well-known technique for expressing recursion is the trampoline. It's a mechanism for expressing iteration as recursion. In R, it might look something like this:

trampoline <- function(f) {
  function(...) {
    ret <- do.call(f, list(...))
    while (is.list(ret) && attr(ret, "args")) {
      ret <- do.call(f, ret)
    }
    ret
  }
}

recur <- function(...) {
  args <- list(...)
  attr(args, "args") <- TRUE
  args
}

fact <- trampoline(function(n, prod = 1) {
  if (n == 0) {
    prod
  } else {
    recur(n-1, prod*n)
  }
})

fact(5)

As long as recur is called in "tail position" -- after any other code in the function body -- the recursion can convert to iteration without stack growth.

I'm aware of a Recall function in R, that on the surface looks something like recur, but I believe it consumes stack. It does seem useful as a way to refer anonymously to the enclosing function, though.

Like I said, I'm new to R, so as much as I enjoy this approach, there may very well be other ways. I look forward to reading other thoughts.

nick · September 28, 2017, 6:31pm

Can, but searches indicate that R won't. You'll still grow the stack. I'm not aware of a solution, but I certainly don't know the ins and outs of R like others around here.

milesmcbain · September 29, 2017, 5:20am

@alandipert This is very educational thankyou so much for taking the time to post. What's more is that the trampoline appears to work in R!

I fully expected it not to... since things like Recall and on.exit all consume stack. My tests proved me wrong though:

fib_tramp <- trampoline(function(num_seq = c(0,1), max_num = 1000){
  num_vals <- length(num_seq)
  if(num_vals == max_num){
    return(num_seq)
  }
  new_val = num_seq[num_vals] + num_seq[num_vals-1]
  recur(c(num_seq, new_val), max_num)
})

fib <- function(num_seq = c(0,1), max_num = 1000){
  num_vals <- length(num_seq)
  if(num_vals == max_num){
    return(num_seq)
  }
  new_val = num_seq[num_vals] + num_seq[num_vals-1]
  fib(c(num_seq, new_val), max_num)
}

fib(max_num = 10000)
# Error: C stack usage  7970164 is too close to the limit

fib_tramp(max_num = 100000)
#WORKS!

Also this seems like it could be packaged up nicely into a reusable tool!

edit: sorry @alandipert I originally attributed the trampoline to @nick, in my exictement

nick · September 29, 2017, 11:59am

Apologies, I read through the code too quickly when I looked at it and should have investigated more closely before spouting my mouth (fingers?) off . Very helpful code, @alandipert !

Now that I understand what's happening, I get why it works. It's not counting on tail recursion optimization -- in fact, if you try to use recur someplace other than as a return value, the problem won't be the stack, but that it won't work at all!

jimhester · October 2, 2017, 7:10pm

Alan, awesome! Here is an alternative implementation that avoids the do.call overhead and returns the recursive items in a simple S3 class, which makes the while() condition a little cleaner IMHO.

trampoline <- function(f) {
  function(...) {
    ret <- f(...)
    while (inherits(ret, "recursion")) {
      ret <- eval(as.call(c(f, unclass(ret))))
    }
    ret
  }
}

recur <- function(...) {
  structure(list(...), class = "recursion")
}

alandipert · October 2, 2017, 7:37pm

@milesmcbain My pleasure to share, reminds me of my own excitement when I first learned about this stuff! I can definitely imagine a library. If you're so moved, feel free to use my code.

@jimhester that's nicer, thank you!

I thought to also mention that the version I shared is somewhat specialized compared to the traditional trampoline, which usually looks something like this:

trampoline <- function(f, ...) {
  ret <- do.call(f, list(...))
  while (is.function(ret)) ret <- ret()
  ret
}

Here, "intent to recur" is indicated not by a list with a special attribute, but by a function. The function can contain any code to call on the next iteration of the loop, including code to call a different function. This allows for mutual recursion as demonstrated by even and odd ¹:

even <- function(n) {
  if (n == 0) TRUE else function() odd(n-1)
}

odd <- function(n) {
  if (n == 0) FALSE else function() even(n-1)
}

trampoline(odd, 1000001)

I tend not to do it this way myself because I don't run across mutual recursion often in workaday programming, and because the recur way is usually more concise. Parsers are a kind of program where it might be useful, but I don't write many of those.

Of course, maybe a cool library could provide both

alandipert · October 13, 2017, 5:30am

This discussion inspired the first post on my new R blog! I hope you like it: https://tailrecursion.com/wondr/posts/tail-recursion-in-r.html

tjmahr · October 13, 2017, 1:40pm

I had seen the word trampoline thrown around in functional programming context, but it never sunk in. Like, what the heck is Wikipedia talking about?

a trampoline is a loop that iteratively invokes thunk-returning functions (continuation-passing style).

It's great to have the concept expressed in R. Now, I get it. Awesome thread.

milesmcbain · October 21, 2017, 2:32am

Nice blog post @alandipert! This is the kind of thing I can seem myself using a reference link for others in the future.

Based on what I have gleaned from your solution and your blog post, I have arrived at a conclusion I want to check:

My original usecases that I mentioned way back are effectively graph exploration problems, e.g: https://github.com/MilesMcBain/mmmisc/blob/master/R/utils.R#L208

Instead of recursing multiple times within one invocation (e.g. on all child-nodes), I should create an accumulator to hold a node list. Therefore irrespective of whether I solve this with a loop or recursive calls, a stack-type data structure is probably going to be involved as the accumulator.

alandipert · October 24, 2017, 3:19pm

@milesmcbain that sounds good to me. I've heard this approach called "reify the stack".

I don't think it matters to you in this case, but you can also choose to use a queue instead of a stack as your accumulator. Doing so would make traversal breadth-first instead of depth-first.

tob · February 7, 2018, 1:26pm

Hello. I've just come across this thread. I should have joined this conversation at that time, but I hope my post on GitHub would help the understanding of the relationship between tail recursion and CPS transformation in R. https://github.com/TobCap/R/blob/master/tail-recursion-elimination.r
Thanks.

cole · March 17, 2018, 2:32am

Saw @milesmcbain tweet this in March. Wanted to add it to the discussion:

padames · May 15, 2019, 9:39pm

I wrote a tail-end recursive function to compute all the possible words that can be composed with a 10-digit phone number.

The mapping I used from digit to letter is:

digit_to_characters <- c("1" = "", "2" = "abc", "3" = "def", "4" = "ghi",
                         "5" = "jkl", "6" = "mno", "7" = "pqrs",
                         "8" = "tuv", "9" = "wxyz", "0" = "")

Without using any of these two versions of trampolin, using straight explicit recursion fills up the stack and I cannot do more than 5 effective digits (that is not counting 0).

Then I used the original version from alandipert in my function to_vector_of_strings_1 and the second version without the do.call in my function to_vector_of_strings_2.

I compared the execution time of the two versions of the trampoline and here are the results:

> s_to_parse <- "4039829222"
> microbenchmark(to_vector_of_strings_1(s_to_parse),times = 10L)
Unit: seconds
                               expr      min     lq     mean   median       uq     max neval
 to_vector_of_strings_1(s_to_parse) 10.55397 11.845 12.42362 11.89739 12.22428 15.9142    10
> microbenchmark(to_vector_of_strings_2(s_to_parse),times = 10L)
Unit: seconds
                               expr      min       lq     mean   median       uq      max neval
 to_vector_of_strings_2(s_to_parse) 11.90121 12.12616 12.38451 12.37848 12.56244 13.07632    10

The change to using an S3 class and avoiding the do.call does not seem to provide a meaningful improvement in execution time as reported by microbench.

Both versions work well in R, I am using RStudio Version 1.1.463 and R version 3.4.4:

> version
               _                           
platform       x86_64-pc-linux-gnu         
arch           x86_64                      
os             linux-gnu                   
system         x86_64, linux-gnu           
status                                     
major          3                           
minor          4.4                         
year           2018                        
month          03                          
day            15                          
svn rev        74408                       
language       R                           
version.string R version 3.4.4 (2018-03-15)
nickname       Someone to Lean On

system · September 17, 2019, 12:11am

This topic was automatically closed 21 days after the last reply. New replies are no longer allowed.