Filtering a list in a data frame

[minor edits for grammar and readability]

I'm trying to filter a data set that has a list as one of its variables. Is there a way to do this elegantly with the tidyverse? If not, what's a reasonable way of doing it?

library(dplyr)

color_df <- tibble(
      color = c("red", "orange", "yellow", "green", 
                "blue", "indigo", "violet")
      )

base_colors <- list("red", c("red", "yellow"), "yellow", 
                    c("yellow", "blue"), "blue", "blue",
                    c("blue", "red")
                    )

color_df$color_base <-  base_colors

color_df %>% filter("red" %in% color_base)

Ideally, the last line would return a data frame with the 3 rows where color is red, orange, and violet. Instead it returns the whole data frame (I'm assuming this is because "red" appears somewhere in the color_base variable as a whole).

Is there any way of doing this?
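A quick way to check that assumption, as a minimal base R sketch (the names whole_column and per_row are only illustrative): testing "red" against the whole list gives a single value, which filter() then recycles across every row, while testing each list element separately gives one value per row.

```r
# Same list as in the question, kept outside the data frame for brevity.
base_colors <- list("red", c("red", "yellow"), "yellow",
                    c("yellow", "blue"), "blue", "blue",
                    c("blue", "red"))

# Tested against the whole list, %in% returns a single logical value.
whole_column <- "red" %in% base_colors
length(whole_column)

# Tested element by element, we get one logical value per row instead.
per_row <- vapply(base_colors, function(x) "red" %in% x, logical(1))
per_row
```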

library(dplyr)
#> 
#> Attaching package: 'dplyr'
#> The following objects are masked from 'package:stats':
#> 
#>     filter, lag
#> The following objects are masked from 'package:base':
#> 
#>     intersect, setdiff, setequal, union
library(purrr)
color_df <- tibble(color = c("red", "orange", "yellow", "green", "blue", "indigo", "violet"))
base_colors <- list("red", c("red", "yellow"), "yellow", c("yellow", "blue"), "blue", "blue", c("blue", "red"))

color_df$color_base <-  base_colors

FindRed <- function(x) {
  "red" %in% x
}
color_df <- color_df %>% group_by(color) %>% 
  mutate(Flag = map_lgl(color_base, FindRed)) %>% 
  filter(Flag) %>% 
  select(-Flag)
color_df
#> # A tibble: 3 x 2
#> # Groups:   color [3]
#>   color  color_base
#>   <chr>  <list>    
#> 1 red    <chr [1]> 
#> 2 orange <chr [2]> 
#> 3 violet <chr [2]>

Created on 2020-05-05 by the reprex package (v0.3.0)


In our example, wasn't each row unique to begin with?

Running the code without group_by(color) doesn't seem to change the results. And if there were two rows with the same color name but different "base colors", couldn't this lead to unpredictable results?


Well, I was wrong. I thought I checked that the group_by was necessary before I first posted the solution but I must have made some other mistake. Thanks for pointing that out.
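For reference, a minimal sketch of the same filter with the group_by() dropped. It assumes the color_df from the question; red_rows is just an illustrative name, and the map_lgl() call is placed directly inside filter() instead of going through a temporary Flag column.

```r
library(dplyr)
library(purrr)

color_df <- tibble(color = c("red", "orange", "yellow", "green",
                             "blue", "indigo", "violet"))
color_df$color_base <- list("red", c("red", "yellow"), "yellow",
                            c("yellow", "blue"), "blue", "blue",
                            c("blue", "red"))

# map_lgl() already visits each list element in turn and returns one
# TRUE/FALSE per row, so no grouping is needed.
red_rows <- color_df %>%
  filter(map_lgl(color_base, ~ "red" %in% .x))

red_rows$color
```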



Do you mean like this or do you want the entire process wrapped in a function?

library(dplyr)
#> 
#> Attaching package: 'dplyr'
#> The following objects are masked from 'package:stats':
#> 
#>     filter, lag
#> The following objects are masked from 'package:base':
#> 
#>     intersect, setdiff, setequal, union
library(purrr)
color_df <- tibble(color = c("red", "orange", "yellow", "green", "blue", "indigo", "violet"))
base_colors <- list("red", c("red", "yellow"), "yellow", c("yellow", "blue"), "blue", "blue", c("blue", "red"))

color_df$color_base <-  base_colors

FindRed <- function(x, colorToFind) {
  colorToFind %in% x
}
color_df <- color_df %>% group_by(color) %>% 
  mutate(Flag = map_lgl(color_base, FindRed, "red")) %>% 
  filter(Flag) %>% 
  select(-Flag)
color_df
#> # A tibble: 3 x 2
#> # Groups:   color [3]
#>   color  color_base
#>   <chr>  <list>    
#> 1 red    <chr [1]> 
#> 2 orange <chr [2]> 
#> 3 violet <chr [2]>

Created on 2020-05-05 by the reprex package (v0.3.0)


This is great for a one-off solution, but if I wanted to find all the elements with yellow, I'd need to make another function.

Is there a way to make this work in a more generalized manner?
Ideally it would be one where I could use the function below and then specify many 'targets', so that I could reuse the function and not have to create a new one every time.

has_list_element <- function(target, list_element) {
      target %in% list_element
}

Although maybe I should just be using sapply at this point anyway.
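A rough sketch of what that sapply route could look like, reusing has_list_element as written above. The names keep and yellow_rows are illustrative only. Because target is passed by name, sapply() hands each list element to the remaining argument, list_element.

```r
library(dplyr)

color_df <- tibble(color = c("red", "orange", "yellow", "green",
                             "blue", "indigo", "violet"))
color_df$color_base <- list("red", c("red", "yellow"), "yellow",
                            c("yellow", "blue"), "blue", "blue",
                            c("blue", "red"))

has_list_element <- function(target, list_element) {
  target %in% list_element
}

# One TRUE/FALSE per row: does that row's color_base contain the target?
keep <- sapply(color_df$color_base, has_list_element, target = "yellow")

# Plain logical row indexing, no new function needed per target.
yellow_rows <- color_df[keep, ]
yellow_rows$color
```

Changing the target only means changing the string passed to sapply().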

One question though-- what's the point of doing a group_by(color) here?

Yes, perfect! Thanks a million!

Without the group_by, mutate operates on the entire data frame. With the group_by() it operates on each row because color is unique for each row.