How can I quickly debug purrr::map_dfr to identify a problematic column?

elikesprogramming · May 24, 2021, 4:50pm

purrr::map_dfr is super cool. It lets you build a data frame by
row-binding the output of .f, and the .id argument lets you
identify the input associated with each row:

purrr::map_dfr(mtcars, quantile, .id = "varname")
#> # A tibble: 11 x 6
#>    varname  `0%`  `25%`  `50%`  `75%` `100%`
#>    <chr>   <dbl>  <dbl>  <dbl>  <dbl>  <dbl>
#>  1 mpg     10.4   15.4   19.2   22.8   33.9 
#>  2 cyl      4      4      6      8      8   
#>  3 disp    71.1  121.   196.   326    472   
#>  4 hp      52     96.5  123    180    335   
#>  5 drat     2.76   3.08   3.70   3.92   4.93
#>  6 wt       1.51   2.58   3.32   3.61   5.42
#>  7 qsec    14.5   16.9   17.7   18.9   22.9 
#>  8 vs       0      0      0      1      1   
#>  9 am       0      0      0      1      1   
#> 10 gear     3      3      4      4      5   
#> 11 carb     1      2      2      4      8

Cool! But when it fails, it is not straightforward to identify the
culprit:

purrr::map_dfr(ggplot2::diamonds, quantile, .id = "varname")
#> Error in quantile.default(.x[[i]], ...): 'type' must be 1 or 3 for ordered factors

It propagates the error thrown by .f without any hint about the
offending column:

traceback()
#> 5: stop("'type' must be 1 or 3 for ordered factors")
#> 4: quantile.default(.x[[i]], ...)
#> 3: .f(.x[[i]], ...)
#> 2: map(.x, .f, ...)
#> 1: purrr::map_dfr(ggplot2::diamonds, quantile, .id = "varname")

purrr also has safely, quietly, and possibly, which could help,
but using them is less debugging than changing your approach, and it
also requires different output processing down the line.
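For reference, here is a minimal sketch of what the safely route looks like when the only goal is to find the failing columns. The small data frame `df` is a hypothetical stand-in for ggplot2::diamonds (one ordered factor among numeric columns), and the variable names are made up for illustration.

```r
library(purrr)

# Hypothetical stand-in for ggplot2::diamonds: one ordered factor among numerics.
df <- data.frame(
  carat = c(0.23, 0.21, 0.29, 0.31),
  cut   = factor(c("Ideal", "Premium", "Good", "Ideal"),
                 levels = c("Good", "Premium", "Ideal"), ordered = TRUE),
  price = c(326, 326, 334, 335)
)

# safely() wraps quantile so each call returns list(result = , error = )
# instead of stopping at the first failure.
results <- map(df, safely(quantile))

# Keep only the elements that captured an error; their names are the culprits.
failed <- keep(results, function(res) !is.null(res$error))
names(failed)
#> [1] "cut"

# The captured error messages, one per failing column.
map_chr(failed, function(res) conditionMessage(res$error))
```

As noted above, the result is a list of result/error pairs rather than a data frame, so this is a diagnostic pass, not a drop-in replacement for the map_dfr call.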

purrr::auto_browse or a good old print statement inside .f
does not help either, because .f does not receive the name of
the column. (You can try to infer the offending column from its contents,
but that is not really quick debugging.)

purrr::map_dfr(ggplot2::diamonds, purrr::auto_browse(quantile),
               .id = "varname")
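One way around the missing name, sketched here under assumptions, is purrr::imap, which passes each element together with its name (or index) to the function. The helper `quantile_named` and the data frame `df` are hypothetical and only illustrate the idea. Newer purrr releases (1.0.0 and later, as far as I know) also report the failing index and name in the error itself, so this wrapper may be unnecessary there.

```r
library(purrr)

# Hypothetical stand-in for ggplot2::diamonds: one ordered factor among numerics.
df <- data.frame(
  carat = c(0.23, 0.21, 0.29, 0.31),
  cut   = factor(c("Ideal", "Premium", "Good", "Ideal"),
                 levels = c("Good", "Premium", "Ideal"), ordered = TRUE),
  price = c(326, 326, 334, 335)
)

# Hypothetical helper: run quantile() on one column and, on failure,
# re-throw the error with the column name prepended.
quantile_named <- function(col, name) {
  tryCatch(
    quantile(col),
    error = function(e) {
      stop("column '", name, "': ", conditionMessage(e), call. = FALSE)
    }
  )
}

# Calling the helper directly on the ordered factor shows the rewritten message.
msg <- tryCatch(quantile_named(df$cut, "cut"),
                error = function(e) conditionMessage(e))
msg

# imap() calls the helper with each column and its name; try() keeps the
# script running after the expected failure.
out <- try(imap(df, quantile_named), silent = TRUE)
cat(out)
```

Once the offending column is known and fixed or excluded, the original map_dfr call can be used unchanged.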

So, in the end, a good old (bad old?) for loop plus a print statement is a
quick solution, but it would certainly be considered bad practice. It also
means rewriting the much more elegant and concise
purrr approach:

for (i in seq_along(ggplot2::diamonds)) {
  print(names(ggplot2::diamonds[i]))
  quantile(ggplot2::diamonds[[i]])
}
#> [1] "carat"
#> [1] "cut"
#> Error in quantile.default(ggplot2::diamonds[[i]]): 'type' must be 1 or 3 for ordered factors

So what is the canonical way to debug this case?

Created on 2021-05-24 by the reprex package (v2.0.0)

jimhester · May 24, 2021, 5:20pm

This is a good use for one of the more advanced debugging techniques in R, the recovery console.

You can enable this by setting options(error = recover) before you run code that throws an error.

As you observed, the data you need is often not available at the point where an error occurs, so you need to look further up the stack for the information.

Luckily, this is what the recovery console allows you to do. When the error occurs, a dialog is displayed allowing you to select which stack frame you want to inspect, and you can then print any variables in that stack frame.

For this case, the second frame holds the data we need.

Using it looks something like this:

> options(error = recover)
> purrr::map_dfr(ggplot2::diamonds, quantile, .id = "varname")
Error in quantile.default(.x[[i]], ...) :
  'type' must be 1 or 3 for ordered factors
Enter a frame number, or 0 to exit
1: purrr::map_dfr(ggplot2::diamonds, quantile, .id = "varname")
2: map.R#235: map(.x, .f, ...)
3: map.R#111: .f(.x[[i]], ...)
4: quantile.default(.x[[i]], ...)
Selection: 2
Called from: .f(.x[[i]], ...)
Browse[1]> ls()
[1] "i"
Browse[1]> i
[1] 2
Browse[1]>
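
The value of i is the position of the failing column, so the last step is to translate it into a name. A minimal sketch follows, using a hypothetical data frame `df` whose first two columns are named like those of ggplot2::diamonds. In the thread's case the lookup would be names(ggplot2::diamonds)[i], which matches the "cut" printed by the for loop earlier. Since ls() hides names that start with a dot, .x is probably also present in that frame, so names(.x)[i] at the browser prompt is likely to give the same answer, though that is an assumption about purrr's internals.

```r
# Hypothetical stand-in with the same first two column names as ggplot2::diamonds.
df <- data.frame(
  carat = c(0.23, 0.21, 0.29),
  cut   = factor(c("Ideal", "Premium", "Good"),
                 levels = c("Good", "Premium", "Ideal"), ordered = TRUE)
)

i <- 2  # the value printed for `i` in frame 2 of the recovery console

# Translate the index into a column name and inspect that column.
culprit <- names(df)[i]
culprit
#> [1] "cut"
class(df[[i]])
#> [1] "ordered" "factor"

# Restore R's default error handling once debugging is done.
options(error = NULL)
```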

elikesprogramming · May 25, 2021, 4:36pm

Great, many thanks!
