recursive purrr::mutate_if

purrr

#1

I have found myself dealing with long, nested, ragged lists (culled from external APIs) and would like to perform some cleaning and strong-typing on them. I'm just about to write a recursive wrapper around purrr::modify_if, but figured I'd ask here first if this is already baked in to purrr and I just haven't managed to figure it out yet.

Here's a trivial example:

x <- list(list(a = as.character(1), b = as.double(2)), c = as.character(3), d = as.double(4))
str(x)
 List of 3
  $  :List of 2
   ..$ a: chr "1"
   ..$ b: num 2
  $ c: chr "3"
  $ d: num 4

Now I'd like to recursively convert all characters into integers, sort of like so (with a make-believe .recursive=TRUE option):

modify_if(x, is.character, as.integer, .recursive = TRUE)

Does anything like this already exist?


#2

Hey @mmuurr! It certainly seems like it'd be useful, doesn't it? I'm taking a crack at this problem using modify_depth() instead (and writing the predicate test into the function instead), but it's not quite working :confused:

change_recursively = function(y) {
  message(y, ' class is ', class(y))
  if(is.character(y)) {
    as.integer(y)
  } else {
    y * 1000
  }
}

x %>%
  modify_depth(vec_depth(x), change_recursively, .ragged = TRUE) %>%
  str()
# 1 class is character
# 2 class is numeric
# 3 class is character
# 4 class is numeric
# List of 3
#  $  :List of 2
#   ..$ a: chr "1"
#   ..$ b: num 2000
#  $ c:List of 1
#   ..$ : chr "3"
#  $ d:List of 1
#   ..$ : num 4000

It seems like this is correctly classifying all of the elements, but it's only modifying the numeric ones, not the character ones :confused: (I threw the y * 1000 in there just to test whether it was that no modification was taking place at all or whether it was just a problem with the character ones.)

This is a little clunky, but if you can help me get the last 10% of the way then it might be a workable solution :slight_smile:


#3

With a recursive function, the iteration happens in the function, so there's not much point in iterating them with map or variants. They're also hard to construct without naming the function (so it can be called in itself).

(It is possible to make an anonymous recursive function with Recall, if you really want.)

Here, you need to handle three cases:

  1. The input is character and should be coerced to integer.
  2. The input is a list of length > 1 and should be recursed over.
  3. The input is length 1 but not character, and should be returned without alteration.

The control flow logic can follow that pattern directly:

library(purrr)

x <- list(list(a = as.character(1), b = as.double(2)), 
          c = as.character(3), 
          d = as.double(4))

recurse_int <- function(x){
    if (is.character(x)) {
        as.integer(x)
    } else if (length(x) > 1) {
        map(x, recurse_int)
    } else {
        x
    }
}

x %>% recurse_int() %>% str()
#> List of 3
#>  $  :List of 2
#>   ..$ a: int 1
#>   ..$ b: num 2
#>  $ c: int 3
#>  $ d: num 4

#4

I love the idea of iterating over leaf nodes with modify_depth, but I still have yet to get it to do what I want despite lots of tries on lots of problems. Maybe I'm doing it wrong? Here it seems like it should be

x %>% modify_depth(-1, ~if(is.character(.x)) as.integer(.x) else .x)

but that just gets me

#> Error: List not deep enough

#5

I got a lot of "Error: list not deep enough" until I added .ragged = TRUE to the call. If looks like default, FALSE, assumes a uniform depth across the tree and throws an error if it finds otherwise.

But yeah, otherwise it would seem to work! I'm not sure why I couldn't get the character elements to change, and I'm not sure if providing a negative depth to get it traverse bottom-up (instead of vec_depth) would give a different result :confused:


#6

Somebody (not me) needs to write a blog post about the intended behavior. It's the one corner of purrr that I just can't make work.


#7

If we look at how modify_depth is implemented, it is really not very fancy and basically performs modification at a fixed level. The .ragged parameter handles lists with heterogeneous levels as @rensa suggested.

If we really want to use modify_depth we have to iterate through each level:

library(purrr)

x <- list(list(a = as.character(1), b = as.double(2)), 
          c = as.character(3), 
          d = as.double(4))

as_integer_recursive <- function(x){
  if (is.character(x)) {
    as.integer(x)
  } else if (length(x) > 1) {
    map(x, as_integer_recursive)
  } else {
    x
  }
}
x %>% as_integer_recursive() %>% str()
#> List of 3
#>  $  :List of 2
#>   ..$ a: int 1
#>   ..$ b: num 2
#>  $ c: int 3
#>  $ d: num 4

as_integer_recursive_map <- function(x) {
  x %>%
    map_if(is.character, as.integer) %>%
    map_if(is.list, as_integer_recursive_map)
}
x %>% as_integer_recursive_map() %>% str()
#> List of 3
#>  $  :List of 2
#>   ..$ a: int 1
#>   ..$ b: num 2
#>  $ c: int 3
#>  $ d: num 4

as_integer_modify_depth <- function(x) {
  reduce(
    -seq(vec_depth(x) - 1),
    function(x, depth) {
      modify_depth(x, depth, ~if(is.character(.x)) as.integer(.x) else .x, .ragged = TRUE)
    }, 
    .init = x
  )
}
x %>% as_integer_modify_depth() %>% str()
#> List of 3
#>  $  :List of 2
#>   ..$ a: int 1
#>   ..$ b: num 2
#>  $ c: int 3
#>  $ d: num 4

microbenchmark::microbenchmark(
  as_integer_recursive(x),
  as_integer_recursive_map(x),
  as_integer_modify_depth(x),
  times = 100
)
#> Unit: microseconds
#>                         expr      min        lq       mean    median
#>      as_integer_recursive(x)   31.026   35.6635   59.97857   40.9725
#>  as_integer_recursive_map(x) 1898.626 2091.3115 3539.13028 2688.1005
#>   as_integer_modify_depth(x)  386.974  422.1990  694.83505  470.1635
#>        uq       max neval
#>    65.128   277.072   100
#>  3976.983 11519.629   100
#>   681.178  3770.032   100

Created on 2018-10-18 by the reprex package (v0.2.1)


#8

Ah, so the .depth parameter really is depth and not max depth—you have to repeat it for each level? Thanks, @saurfang!


#9

Remember to always check for a base solution:

rapply(x, f = as.integer, classes = "character", how = "replace")

If the condition is more complex than just a class, include it in the function:

as_integer_safe <- function(y) {
  if (is.character(y) && all(grepl("^\\d+$", y))) {
    as.integer(y)
  } else {
    y
  }
}

x[[1]][[3]] <- "dog"
rapply(x, f = as_integer_safe, how = "replace")
# [[1]]
# [[1]]$`a`
# [1] 1
# 
# [[1]]$b
# [1] 2
# 
# [[1]][[3]]
# [1] "dog"
# 
# 
# $c
# [1] 3
# 
# $d
# [1] 4

#10

Nice! TIL rapply definitely cleaner and faster. I have updated benchmark.

library(purrr)

x <- list(list(a = as.character(1), b = as.double(2)), 
          c = as.character(3), 
          d = as.double(4))

as_integer_rapply <- function(x) {
  rapply(
    x,
    as.integer,
    "character",
    how = "replace"
  )
}
x %>% as_integer_rapply() %>% str()
#> List of 3
#>  $  :List of 2
#>   ..$ a: int 1
#>   ..$ b: num 2
#>  $ c: int 3
#>  $ d: num 4

as_integer_recursive <- function(x){
  if (is.character(x)) {
    as.integer(x)
  } else if (length(x) > 1) {
    map(x, as_integer_recursive)
  } else {
    x
  }
}
x %>% as_integer_recursive() %>% str()
#> List of 3
#>  $  :List of 2
#>   ..$ a: int 1
#>   ..$ b: num 2
#>  $ c: int 3
#>  $ d: num 4

as_integer_recursive_map <- function(x) {
  x %>%
    map_if(is.character, as.integer) %>%
    map_if(is.list, as_integer_recursive_map)
}
x %>% as_integer_recursive_map() %>% str()
#> List of 3
#>  $  :List of 2
#>   ..$ a: int 1
#>   ..$ b: num 2
#>  $ c: int 3
#>  $ d: num 4

as_integer_modify_depth <- function(x) {
  reduce(
    -seq(vec_depth(x) - 1),
    function(x, depth) {
      modify_depth(x, depth, ~if(is.character(.x)) as.integer(.x) else .x, .ragged = TRUE)
    }, 
    .init = x
  )
}
x %>% as_integer_modify_depth() %>% str()
#> List of 3
#>  $  :List of 2
#>   ..$ a: int 1
#>   ..$ b: num 2
#>  $ c: int 3
#>  $ d: num 4


microbenchmark::microbenchmark(
  as_integer_recursive(x),
  as_integer_recursive_map(x),
  as_integer_modify_depth(x),
  as_integer_rapply(x),
  times = 100
)
#> Unit: microseconds
#>                         expr      min       lq       mean    median
#>      as_integer_recursive(x)   30.422   34.113   40.11331   37.4595
#>  as_integer_recursive_map(x) 1991.935 2085.603 2383.23113 2130.0030
#>   as_integer_modify_depth(x)  395.035  425.333  549.62640  453.5695
#>         as_integer_rapply(x)    9.824   16.629   29.73318   19.3345
#>         uq      max neval
#>    42.3080   88.839   100
#>  2258.3110 4411.591   100
#>   539.1720 4035.030   100
#>    21.2675 1072.674   100

Created on 2018-10-19 by the reprex package (v0.2.1)


#11

Calling the functions on such a small list, you're mostly capturing function overhead with the benchmark. If speed matters, the benchmark would have to be on a much larger list. The problem is that it's sort of a pain to generate such with lots of arbitrary lengths and nesting. repurrrsive might have some possibilities:


#12

Ah, I didn't know rapply existed, this is definitely the preferred solution, thanks!

I wonder if it would still be beneficial to have such recursive functionality added to a few purrr functions?