Recode values of a list column

Dear all,

I am trying to recode (using dplyr's recode) all the values that are NULL to "NA". The reason is that I would like to report missing values, and values that are NULL cannot be reported as such, which is why I decided on this transformation. However, I am getting the following error:

Error in UseMethod("recode") :
no applicable method for 'recode' applied to an object of class "list"

I thought of unnesting this column and then applying the recoding, but this is not possible either:
Error: Each column must either be a list of vectors or a list of data frames

So the question would be: how would you recode values in list columns?

Many thanks,
Dimitris

You need to reference the objects within the list rather than the list itself. For example, if you are using a list of data frames, you have to reference the columns within each data.frame contained in the list. Hadley provides an excellent explanation of the relationship(s) between lists and their contents in R4DS.

As @jbono said, if you want to inspect the individual list values, you'll need to inspect/recode the elements. If you decide to break them out into separate columns, TJ Mahr's set_na_where() post might be helpful.
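
To make "recode the elements, not the list" concrete, here is a minimal sketch. The tibble, its column names and the recode mapping are all made up for illustration, and it assumes each element of the list column is either NULL or a length-1 character vector:

```r
library(dplyr)
library(purrr)

# Hypothetical list column: length-1 character vectors, one element is NULL
df <- tibble(id = 1:3, status = list("a", NULL, "b"))

# recode() has no method for lists, so call it on each element instead.
# NULL elements are swapped for NA_character_ before recode() ever sees them.
out <- df %>%
  mutate(status = map_chr(status, function(el) {
    if (is.null(el)) NA_character_ else recode(el, a = "active", b = "blocked")
  }))

out$status
```

Because every element ends up as a single string (or NA), map_chr() can flatten the list column into an ordinary character column, which is what makes the missing values reportable.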

Fwiw, in base R you could test for replacement with lengths:

> (L = list(1, 2)[1:3])

[[1]]
[1] 1

[[2]]
[1] 2

[[3]]
NULL

> replace(L, lengths(L)==0, list(NA_real_))

[[1]]
[1] 1

[[2]]
[1] 2

[[3]]
[1] NA

There's probably a "right" choice for the type of NA in your case, but I just put in real as an example.
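
As a small illustration of picking the NA type, here is the same replace() call on a made-up list of strings, using NA_character_ so that every element keeps the same type:

```r
# Hypothetical list of strings; indexing past the end yields a NULL element
L_chr <- list("a", "b")[1:3]

# Fill the empty slot with a character NA rather than a numeric one
filled <- replace(L_chr, lengths(L_chr) == 0, list(NA_character_))

# Every element is now a length-1 character vector
types <- vapply(filled, typeof, character(1))
flat <- unlist(filled)
```

With matching types the list can be flattened with unlist() without any surprise coercion.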

The replace function has the correct structure for piping (in the sense that its first argument is the object you're modifying), so there's also:

library(dplyr)
DF = data_frame(id = 1:3, L = L)
DF %>% mutate(L_nonull = L %>% replace(!lengths(.), list(NA_real_)))

# A tibble: 3 x 3
     id         L  L_nonull
  <int>    <list>    <list>
1     1 <dbl [1]> <dbl [1]>
2     2 <dbl [1]> <dbl [1]>
3     3    <NULL> <dbl [1]>

(!lengths(.) is the same as lengths(.) == 0 thanks to coercion rules.)
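
A quick way to convince yourself of that equivalence, using the same toy list as above:

```r
L <- list(1, 2)[1:3]

n <- lengths(L)          # integer lengths; the NULL element has length 0
via_not <- !n            # ! coerces 0 to FALSE and non-zero to TRUE, then negates
via_cmp <- n == 0

same <- identical(via_not, via_cmp)
```

The explicit comparison is arguably easier to read; the ! form is just shorter.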

Personally, I always recode with a join, but there's no support for list-column joins:

mDF = data_frame(old = list(list(NULL)), new = list(list(NA_real_)))
right_join(DF, mDF, by = c(L = "old"))
# Error in right_join_impl(x, y, by$x, by$y, suffix$x, suffix$y, check_na_matches(na_matches)) : 
#   Can't join on 'L' x 'old' because of incompatible types (list / list)

Can you provide a reprex with sample data? It would be easier to answer.

As I understand it, you have a list column in which some elements are NULL, and you want to replace them with NA. That is the example I used to show how to use purrr's map function to manipulate a list column with dplyr.
It is just another solution, this time using purrr.

library(dplyr, warn.conflicts = F)
library(purrr)
tab <- data_frame(Group =c(1, 2), list_col =list(list(NULL, 3, 2), list(5, 4, NULL)))
tab %>% glimpse()
#> Observations: 2
#> Variables: 2
#> $ Group    <dbl> 1, 2
#> $ list_col <list> [[NULL, 3, 2], [5, 4, NULL]]
tab %>% 
  # refer to list_col directly inside mutate(); a plain if/else handles NULL safely
  mutate(list_col_without_na = map(list_col, ~ map_dbl(.x, ~ if (is.null(.x)) NA_real_ else .x))) %>%
  glimpse()
#> Observations: 2
#> Variables: 3
#> $ Group               <dbl> 1, 2
#> $ list_col            <list> [[NULL, 3, 2], [5, 4, NULL]]
#> $ list_col_without_na <list> [<NA, 3, 2>, <5, 4, NA>]

Depending on your data and use case, you will have to adapt it, but the principle is there. map allows you to iterate over each element of a list column and apply a function (here, a second map that iterates over the inner list, finds the NULL values and replaces them with NA).

If you want to do something to a list column other than replacing values, this is a good mechanism to know.
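
For instance, the same map pattern can summarise a list column instead of rewriting it. This is only a sketch on made-up data; the column names n_missing and total are invented for the example:

```r
library(dplyr)
library(purrr)

tab <- tibble(Group = c(1, 2),
              list_col = list(list(NULL, 3, 2), list(5, 4, NULL)))

res <- tab %>%
  mutate(
    # count the NULL entries in each inner list
    n_missing = map_int(list_col, ~ sum(map_lgl(.x, is.null))),
    # unlist() drops NULLs, so this sums only the values that are present
    total     = map_dbl(list_col, ~ sum(unlist(.x)))
  )
```

The typed variants (map_int, map_dbl, map_lgl) return plain vectors, so the new columns are regular columns rather than list columns.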

Here are two options.

library(tidyverse)

x <- tibble(
  a = letters[1:3],
  b = list(1L, NULL, 3.01)
)
x %>% 
  mutate(b = map_dbl(b, ~ .x %||% NA))
#> # A tibble: 3 x 2
#>       a     b
#>   <chr> <dbl>
#> 1     a  1.00
#> 2     b    NA
#> 3     c  3.01
x %>%
  mutate(b = map_dbl(b, 1, .default = NA))
#> # A tibble: 3 x 2
#>       a     b
#>   <chr> <dbl>
#> 1     a  1.00
#> 2     b    NA
#> 3     c  3.01

Where did the NULLs come from in the first place? If it was from extracting list elements by name or position, the .default argument can be used to replace them with something (such as NA) at that very moment (my example shows this once the NULLs are already in a list-column, but they can often be eliminated much earlier in a workflow).

Otherwise, the %||% operator from rlang and re-exported by purrr is really nice for NULL handling in general.
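
To illustrate both points, here is a small sketch. The records list and its field names are hypothetical, standing in for something like parsed JSON where a field is sometimes absent:

```r
library(purrr)

# %||% returns its left-hand side unless that is NULL
lhs_null  <- NULL %||% NA_real_   # falls back to the right-hand side
lhs_value <- 3 %||% NA_real_      # keeps the left-hand side

# Hypothetical records: the second one has no "score" field
records <- list(list(name = "a", score = 1), list(name = "b"))

# Extracting by name with .default fills the gap at extraction time,
# so no NULL ever lands in the result
scores <- map_dbl(records, "score", .default = NA_real_)
```

Handling the default at the point of extraction means there is nothing left to recode afterwards.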

rlang :heart_eyes::heart_eyes::heart_eyes:

Many thanks Jenny for your elegant solutions. In my case I tried them and they both give me the following error:

Error in mutate_impl(.data, dots) :
Evaluation error: Can't coerce element 1 from a character to a double.

Any suggestions?

It sounds like you're providing character data to map_dbl(), which expects floating point numbers. Perhaps map_chr() is more appropriate in your case.
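
A guess at what the character version could look like, on a made-up list of strings (the real data may well be shaped differently):

```r
library(purrr)

# Hypothetical list column contents: strings with one NULL
b <- list("x", NULL, "z")

# map_chr() expects a length-1 character result per element,
# so the fallback is a character NA
b_chr <- map_chr(b, ~ .x %||% NA_character_)
```

The only changes from the numeric version are the map_chr() variant and the typed NA.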

I had to make up an example in order to be concrete and it's obviously a bit different from your actual problem. It is much easier to help if you provide a clean reprex.

I always find myself wishing there was a non-infix version of %||%, though I guess backticks suffice:

library(tidyverse)

x <- tibble(
  a = letters[1:3],
  b = list(1L, NULL, 3.01)
)

x %>% mutate(b = map_dbl(b, `%||%`, NA))
#> # A tibble: 3 x 2
#>       a     b
#>   <chr> <dbl>
#> 1     a  1.00
#> 2     b    NA
#> 3     c  3.01

More generally, a version of rapply that converts every NULL or length-0 vector in a list to NA would be useful; I've written that code way more than three times. I'd write a generalized wrapper for rapply or modify_depth with depth = -1, but rapply seems to skip NULLs, and modify_depth is finicky about shallow lists.
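
One way such a helper could look, as a rough base R sketch rather than a polished rapply replacement. The name null_to_na and its na argument are made up here, and it treats data frames like any other list, which may not be what you want:

```r
# Recursively replace NULL and other length-0 objects with `na`
null_to_na <- function(x, na = NA) {
  if (length(x) == 0) return(na)   # NULL, character(0), list(), ...
  # lapply() hands NULL elements to the function instead of skipping them
  if (is.list(x)) return(lapply(x, null_to_na, na = na))
  x                                # non-empty atomic vectors pass through
}

nested <- list(a = 1, b = NULL,
               c = list(d = character(0), e = list(NULL, "z")))

cleaned <- null_to_na(nested)
str(cleaned)
```

Checking the length before recursing is what lets empty lists and zero-length vectors be caught at any depth, and lapply() keeps the names of the original list.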

Many thanks for all your helpful suggestions - greatly appreciated :slight_smile:
