Hey All,
I've had two problems in the last few months where the easiest solution was to write a recursive function. One was determining recursive dependencies of Github packages by following package "remotes", the other was downloading an entire folder (with directory structure intact) from Google Drive.
In both cases I implemented recursion unsafely without guarding against stack overflow. I justified this based on the nature of the data sources I was recursively exploring.
It sure would feel better to be able to use this pattern safely! But I find myself tripping at the first hurdles i.e: Suppose I move to a loop/stack implementation. Which package provides the best stack (or queue) data structure (if any!)?
Are there any functions hidden in the tidyverse that might help?
I'm an R neophyte, so I can't speak at all to "the tidyverse way" or even "the right way" to do this.
However, my background is in Lisp. In that world, one well-known technique for expressing recursion is the trampoline. It's a mechanism for expressing iteration as recursion. In R, it might look something like this:
As long as recur is called in "tail position" -- after any other code in the function body -- the recursion can convert to iteration without stack growth.
I'm aware of a Recall function in R, that on the surface looks something like recur, but I believe it consumes stack. It does seem useful as a way to refer anonymously to the enclosing function, though.
Like I said, I'm new to R, so as much as I enjoy this approach, there may very well be other ways. I look forward to reading other thoughts.
Can, but searches indicate that R won't. You'll still grow the stack. I'm not aware of a solution, but I certainly don't know the ins and outs of R like others around here.
Apologies, I read through the code too quickly when I looked at it and should have investigated more closely before spouting my mouth (fingers?) off . Very helpful code, @alandipert !
Now that I understand what's happening, I get why it works. It's not counting on tail recursion optimization -- in fact, if you try to use recur someplace other than as a return value, the problem won't be the stack, but that it won't work at all!
Alan, awesome! Here is an alternative implementation that avoids the do.call overhead and returns the recursive items in a simple S3 class, which makes the while() condition a little cleaner IMHO.
trampoline <- function(f) {
function(...) {
ret <- f(...)
while (inherits(ret, "recursion")) {
ret <- eval(as.call(c(f, unclass(ret))))
}
ret
}
}
recur <- function(...) {
structure(list(...), class = "recursion")
}
@milesmcbain My pleasure to share, reminds me of my own excitement when I first learned about this stuff! I can definitely imagine a library. If you're so moved, feel free to use my code.
I thought to also mention that the version I shared is somewhat specialized compared to the traditional trampoline, which usually looks something like this:
trampoline <- function(f, ...) {
ret <- do.call(f, list(...))
while (is.function(ret)) ret <- ret()
ret
}
Here, "intent to recur" is indicated not by a list with a special attribute, but by a function. The function can contain any code to call on the next iteration of the loop, including code to call a different function. This allows for mutual recursion as demonstrated by even and odd1:
even <- function(n) {
if (n == 0) TRUE else function() odd(n-1)
}
odd <- function(n) {
if (n == 0) FALSE else function() even(n-1)
}
trampoline(odd, 1000001)
I tend not to do it this way myself because I don't run across mutual recursion often in workaday programming, and because the recur way is usually more concise. Parsers are a kind of program where it might be useful, but I don't write many of those.
Of course, maybe a cool library could provide both
Instead of recursing multiple times within one invocation (e.g. on all child-nodes), I should create an accumulator to hold a node list. Therefore irrespective of whether I solve this with a loop or recursive calls, a stack-type data structure is probably going to be involved as the accumulator.
@milesmcbain that sounds good to me. I've heard this approach called "reify the stack".
I don't think it matters to you in this case, but you can also choose to use a queue instead of a stack as your accumulator. Doing so would make traversal breadth-first instead of depth-first.
Hello. I've just come across this thread. I should have joined this conversation at that time, but I hope my post on GitHub would help the understanding of the relationship between tail recursion and CPS transformation in R. https://github.com/TobCap/R/blob/master/tail-recursion-elimination.r
Thanks.
Without using any of these two versions of trampolin, using straight explicit recursion fills up the stack and I cannot do more than 5 effective digits (that is not counting 0).
Then I used the original version from alandipert in my function to_vector_of_strings_1 and the second version without the do.call in my function to_vector_of_strings_2.
I compared the execution time of the two versions of the trampoline and here are the results:
> s_to_parse <- "4039829222"
> microbenchmark(to_vector_of_strings_1(s_to_parse),times = 10L)
Unit: seconds
expr min lq mean median uq max neval
to_vector_of_strings_1(s_to_parse) 10.55397 11.845 12.42362 11.89739 12.22428 15.9142 10
> microbenchmark(to_vector_of_strings_2(s_to_parse),times = 10L)
Unit: seconds
expr min lq mean median uq max neval
to_vector_of_strings_2(s_to_parse) 11.90121 12.12616 12.38451 12.37848 12.56244 13.07632 10
The change to using an S3 class and avoiding the do.call does not seem to provide a meaningful improvement in execution time as reported by microbench.
Both versions work well in R, I am using RStudio Version 1.1.463 and R version 3.4.4:
> version
_
platform x86_64-pc-linux-gnu
arch x86_64
os linux-gnu
system x86_64, linux-gnu
status
major 3
minor 4.4
year 2018
month 03
day 15
svn rev 74408
language R
version.string R version 3.4.4 (2018-03-15)
nickname Someone to Lean On