I have found myself dealing with long, nested, ragged lists (culled from external APIs) and would like to perform some cleaning and strong-typing on them. I'm just about to write a recursive wrapper around purrr::modify_if, but figured I'd ask here first if this is already baked in to purrr and I just haven't managed to figure it out yet.
Here's a trivial example:
x <- list(list(a = as.character(1), b = as.double(2)), c = as.character(3), d = as.double(4))
str(x)
List of 3
$ :List of 2
..$ a: chr "1"
..$ b: num 2
$ c: chr "3"
$ d: num 4
Now I'd like to recursively convert all characters into integers, sort of like so (with a make-believe .recursive=TRUE option):
Hey @mmuurr! It certainly seems like it'd be useful, doesn't it? I'm taking a crack at this problem using modify_depth() instead (and writing the predicate test into the function instead), but it's not quite working
change_recursively = function(y) {
message(y, ' class is ', class(y))
if(is.character(y)) {
as.integer(y)
} else {
y * 1000
}
}
x %>%
modify_depth(vec_depth(x), change_recursively, .ragged = TRUE) %>%
str()
# 1 class is character
# 2 class is numeric
# 3 class is character
# 4 class is numeric
# List of 3
# $ :List of 2
# ..$ a: chr "1"
# ..$ b: num 2000
# $ c:List of 1
# ..$ : chr "3"
# $ d:List of 1
# ..$ : num 4000
It seems like this is correctly classifying all of the elements, but it's only modifying the numeric ones, not the character ones (I threw the y * 1000 in there just to test whether it was that no modification was taking place at all or whether it was just a problem with the character ones.)
This is a little clunky, but if you can help me get the last 10% of the way then it might be a workable solution
With a recursive function, the iteration happens in the function, so there's not much point in iterating them with map or variants. They're also hard to construct without naming the function (so it can be called in itself).
(It is possible to make an anonymous recursive function with Recall, if you really want.)
Here, you need to handle three cases:
The input is character and should be coerced to integer.
The input is a list of length > 1 and should be recursed over.
The input is length 1 but not character, and should be returned without alteration.
The control flow logic can follow that pattern directly:
library(purrr)
x <- list(list(a = as.character(1), b = as.double(2)),
c = as.character(3),
d = as.double(4))
recurse_int <- function(x){
if (is.character(x)) {
as.integer(x)
} else if (length(x) > 1) {
map(x, recurse_int)
} else {
x
}
}
x %>% recurse_int() %>% str()
#> List of 3
#> $ :List of 2
#> ..$ a: int 1
#> ..$ b: num 2
#> $ c: int 3
#> $ d: num 4
I love the idea of iterating over leaf nodes with modify_depth, but I still have yet to get it to do what I want despite lots of tries on lots of problems. Maybe I'm doing it wrong? Here it seems like it should be
x %>% modify_depth(-1, ~if(is.character(.x)) as.integer(.x) else .x)
I got a lot of "Error: list not deep enough" until I added .ragged = TRUE to the call. If looks like default, FALSE, assumes a uniform depth across the tree and throws an error if it finds otherwise.
But yeah, otherwise it would seem to work! I'm not sure why I couldn't get the character elements to change, and I'm not sure if providing a negative depth to get it traverse bottom-up (instead of vec_depth) would give a different result
If we look at how modify_depth is implemented, it is really not very fancy and basically performs modification at a fixed level. The .ragged parameter handles lists with heterogeneous levels as @rensa suggested.
If we really want to use modify_depth we have to iterate through each level:
library(purrr)
x <- list(list(a = as.character(1), b = as.double(2)),
c = as.character(3),
d = as.double(4))
as_integer_recursive <- function(x){
if (is.character(x)) {
as.integer(x)
} else if (length(x) > 1) {
map(x, as_integer_recursive)
} else {
x
}
}
x %>% as_integer_recursive() %>% str()
#> List of 3
#> $ :List of 2
#> ..$ a: int 1
#> ..$ b: num 2
#> $ c: int 3
#> $ d: num 4
as_integer_recursive_map <- function(x) {
x %>%
map_if(is.character, as.integer) %>%
map_if(is.list, as_integer_recursive_map)
}
x %>% as_integer_recursive_map() %>% str()
#> List of 3
#> $ :List of 2
#> ..$ a: int 1
#> ..$ b: num 2
#> $ c: int 3
#> $ d: num 4
as_integer_modify_depth <- function(x) {
reduce(
-seq(vec_depth(x) - 1),
function(x, depth) {
modify_depth(x, depth, ~if(is.character(.x)) as.integer(.x) else .x, .ragged = TRUE)
},
.init = x
)
}
x %>% as_integer_modify_depth() %>% str()
#> List of 3
#> $ :List of 2
#> ..$ a: int 1
#> ..$ b: num 2
#> $ c: int 3
#> $ d: num 4
microbenchmark::microbenchmark(
as_integer_recursive(x),
as_integer_recursive_map(x),
as_integer_modify_depth(x),
times = 100
)
#> Unit: microseconds
#> expr min lq mean median
#> as_integer_recursive(x) 31.026 35.6635 59.97857 40.9725
#> as_integer_recursive_map(x) 1898.626 2091.3115 3539.13028 2688.1005
#> as_integer_modify_depth(x) 386.974 422.1990 694.83505 470.1635
#> uq max neval
#> 65.128 277.072 100
#> 3976.983 11519.629 100
#> 681.178 3770.032 100
Calling the functions on such a small list, you're mostly capturing function overhead with the benchmark. If speed matters, the benchmark would have to be on a much larger list. The problem is that it's sort of a pain to generate such with lots of arbitrary lengths and nesting. repurrrsive might have some possibilities: