Tidiest way to do recursion safely in R?

purrr
tidyverse

#1

Hey All,
I’ve had two problems in the last few months where the easiest solution was to write a recursive function. One was determining recursive dependencies of Github packages by following package “remotes”, the other was downloading an entire folder (with directory structure intact) from Google Drive.

In both cases I implemented recursion unsafely without guarding against stack overflow. I justified this based on the nature of the data sources I was recursively exploring.

It sure would feel better to be able to use this pattern safely! But I find myself tripping at the first hurdles i.e: Suppose I move to a loop/stack implementation. Which package provides the best stack (or queue) data structure (if any!)?

Are there any functions hidden in the tidyverse that might help?

Who has done it, using what, and how did it go?


#2

I’m an R neophyte, so I can’t speak at all to “the tidyverse way” or even “the right way” to do this.

However, my background is in Lisp. In that world, one well-known technique for expressing recursion is the trampoline. It’s a mechanism for expressing iteration as recursion. In R, it might look something like this:

trampoline <- function(f) {
  function(...) {
    ret <- do.call(f, list(...))
    while (is.list(ret) && attr(ret, "args")) {
      ret <- do.call(f, ret)
    }
    ret
  }
}

recur <- function(...) {
  args <- list(...)
  attr(args, "args") <- TRUE
  args
}

fact <- trampoline(function(n, prod = 1) {
  if (n == 0) {
    prod
  } else {
    recur(n-1, prod*n)
  }
})

fact(5)

As long as recur is called in “tail position” – after any other code in the function body – the recursion can convert to iteration without stack growth.

I’m aware of a Recall function in R, that on the surface looks something like recur, but I believe it consumes stack. It does seem useful as a way to refer anonymously to the enclosing function, though.

Like I said, I’m new to R, so as much as I enjoy this approach, there may very well be other ways. I look forward to reading other thoughts.


#3

Can, but searches indicate that R won’t. You’ll still grow the stack. I’m not aware of a solution, but I certainly don’t know the ins and outs of R like others around here.


#4

@alandipert This is very educational thankyou so much for taking the time to post. What’s more is that the trampoline appears to work in R!

I fully expected it not to… since things like Recall and on.exit all consume stack. My tests proved me wrong though:

fib_tramp <- trampoline(function(num_seq = c(0,1), max_num = 1000){
  num_vals <- length(num_seq)
  if(num_vals == max_num){
    return(num_seq)
  }
  new_val = num_seq[num_vals] + num_seq[num_vals-1]
  recur(c(num_seq, new_val), max_num)
})

fib <- function(num_seq = c(0,1), max_num = 1000){
  num_vals <- length(num_seq)
  if(num_vals == max_num){
    return(num_seq)
  }
  new_val = num_seq[num_vals] + num_seq[num_vals-1]
  fib(c(num_seq, new_val), max_num)
}

fib(max_num = 10000)
# Error: C stack usage  7970164 is too close to the limit

fib_tramp(max_num = 100000)
#WORKS!

Also this seems like it could be packaged up nicely into a reusable tool!

edit: sorry @alandipert I originally attributed the trampoline to @nick, in my exictement


#5

Apologies, I read through the code too quickly when I looked at it and should have investigated more closely before spouting my mouth (fingers?) off :zipper_mouth_face:. Very helpful code, @alandipert !

Now that I understand what’s happening, I get why it works. It’s not counting on tail recursion optimization – in fact, if you try to use recur someplace other than as a return value, the problem won’t be the stack, but that it won’t work at all!


#6

Alan, awesome! Here is an alternative implementation that avoids the do.call overhead and returns the recursive items in a simple S3 class, which makes the while() condition a little cleaner IMHO.

trampoline <- function(f) {
  function(...) {
    ret <- f(...)
    while (inherits(ret, "recursion")) {
      ret <- eval(as.call(c(f, unclass(ret))))
    }
    ret
  }
}

recur <- function(...) {
  structure(list(...), class = "recursion")
}

#7

@milesmcbain My pleasure to share, reminds me of my own excitement when I first learned about this stuff! I can definitely imagine a library. If you’re so moved, feel free to use my code.

@jimhester that’s nicer, thank you!

I thought to also mention that the version I shared is somewhat specialized compared to the traditional trampoline, which usually looks something like this:

trampoline <- function(f, ...) {
  ret <- do.call(f, list(...))
  while (is.function(ret)) ret <- ret()
  ret
}

Here, “intent to recur” is indicated not by a list with a special attribute, but by a function. The function can contain any code to call on the next iteration of the loop, including code to call a different function. This allows for mutual recursion as demonstrated by even and odd 1:

even <- function(n) {
  if (n == 0) TRUE else function() odd(n-1)
}

odd <- function(n) {
  if (n == 0) FALSE else function() even(n-1)
}

trampoline(odd, 1000001)

I tend not to do it this way myself because I don’t run across mutual recursion often in workaday programming, and because the recur way is usually more concise. Parsers are a kind of program where it might be useful, but I don’t write many of those.

Of course, maybe a cool library could provide both :smile:


#8

This discussion inspired the first post on my new R blog! I hope you like it: https://tailrecursion.com/wondr/posts/tail-recursion-in-r.html


#9

I had seen the word trampoline thrown around in functional programming context, but it never sunk in. Like, what the heck is Wikipedia talking about?

a trampoline is a loop that iteratively invokes thunk-returning functions (continuation-passing style).

It’s great to have the concept expressed in R. Now, I get it. Awesome thread.


#10

Nice blog post @alandipert! This is the kind of thing I can seem myself using a reference link for others in the future.

Based on what I have gleaned from your solution and your blog post, I have arrived at a conclusion I want to check:

My original usecases that I mentioned way back are effectively graph exploration problems, e.g: https://github.com/MilesMcBain/mmmisc/blob/master/R/utils.R#L208

Instead of recursing multiple times within one invocation (e.g. on all child-nodes), I should create an accumulator to hold a node list. Therefore irrespective of whether I solve this with a loop or recursive calls, a stack-type data structure is probably going to be involved as the accumulator.


#11

@milesmcbain that sounds good to me. I’ve heard this approach called “reify the stack”.

I don’t think it matters to you in this case, but you can also choose to use a queue instead of a stack as your accumulator. Doing so would make traversal breadth-first instead of depth-first.


#12

Hello. I’ve just come across this thread. I should have joined this conversation at that time, but I hope my post on GitHub would help the understanding of the relationship between tail recursion and CPS transformation in R. https://github.com/TobCap/R/blob/master/tail-recursion-elimination.r
Thanks.


#13

Saw @milesmcbain tweet this in March. Wanted to add it to the discussion: