Filter a list column

dplyr
purrr

#1

I'm trying to filter a data frame on a list column. Is such a thing even possible? I have a function that I call with purrr::map() that returns NULL if there are no data and a data frame otherwise. I'd then like to remove the rows where it returned NULL. However, calling filter() with !is.null() on that column has no effect: all rows are kept.

The simple reprex below illustrates what I am trying to do:

suppressPackageStartupMessages(library(tidyverse))

f <- function(x) {
  if (x < 0) {
    return(NULL)
  } else {
    return(
      tibble(
        x = rnorm(10),
        y = letters[1:10],
        z = runif(10)
      )
    )
  }
}

df <- tibble(
  a = LETTERS[1:10],
  b = rnorm(10)
)

df_out <- df %>% 
  mutate(
    c = map(b, f)
  )

df_out %>% filter(!is.null(c))
#> # A tibble: 10 x 3
#>    a          b c                
#>    <chr>  <dbl> <list>           
#>  1 A     -1.34  <NULL>           
#>  2 B      0.292 <tibble [10 × 3]>
#>  3 C      1.22  <tibble [10 × 3]>
#>  4 D      1.63  <tibble [10 × 3]>
#>  5 E     -1.32  <NULL>           
#>  6 F     -0.933 <NULL>           
#>  7 G     -1.53  <NULL>           
#>  8 H      0.449 <tibble [10 × 3]>
#>  9 I     -1.65  <NULL>           
#> 10 J     -0.365 <NULL>

Note that I cannot filter on column b as in the real world things aren't this simple!

I've thought that maybe I could use purrr::transpose() in combination with purrr::compact() in some way, but I can't really get my head around it, so any help would be greatly appreciated.


#2

I may have answered my own question by creating a new logical column and filtering on that:

suppressPackageStartupMessages(library(tidyverse))

f <- function(x) {
  if (x < 0) {
    return(NULL)
  } else {
    return(
      tibble(
        x = rnorm(10),
        y = letters[1:10],
        z = runif(10)
      )
    )
  }
}

df <- tibble(
  a = LETTERS[1:10],
  b = rnorm(10)
)

df_out <- df %>% 
  mutate(
    c = map(b, f)
  )

df_out %>% 
  mutate(filter_col = map_lgl(c, ~ !is.null(.x))) %>% 
  filter(filter_col) %>% 
  select(-filter_col)
#> # A tibble: 4 x 3
#>   a         b c                
#>   <chr> <dbl> <list>           
#> 1 A     0.162 <tibble [10 × 3]>
#> 2 B     0.630 <tibble [10 × 3]>
#> 3 D     1.06  <tibble [10 × 3]>
#> 4 F     0.558 <tibble [10 × 3]>

Created on 2018-05-23 by the reprex package (v0.2.0).

Though maybe there's a more elegant solution?


#3

Adding the column is optional:

df_out %>% filter(map_lgl(c, ~ !is.null(.)))
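
For anyone wondering why the original filter(!is.null(c)) kept every row: as far as I can tell, is.null() is not vectorised, so it tests the list column as a whole (which is a list, not NULL) and returns a single FALSE. Negating that gives a single TRUE, which filter() recycles to every row. Here is a minimal sketch of the difference; the toy tibble and the name kept are made up for illustration and are not from the reprex above:

```r
library(dplyr)
library(purrr)
library(tibble)

# Hypothetical toy data: a list column in which some elements are NULL
toy <- tibble(
  a = c("A", "B", "C", "D"),
  c = list(NULL, tibble(x = 1:2), NULL, tibble(x = 3:4))
)

# is.null() looks at the column object itself, which is a list, not NULL
is.null(toy$c)
#> [1] FALSE

# map_lgl() applies the test to each element, giving one logical per row
map_lgl(toy$c, is.null)
#> [1]  TRUE FALSE  TRUE FALSE

# Keep only the rows whose list element is not NULL
kept <- toy %>% filter(!map_lgl(c, is.null))
kept$a
#> [1] "B" "D"
```

Passing is.null directly and negating the result is equivalent to the ~ !is.null(.) lambda; which one to use is a matter of taste.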