How to apply filter of multiple conditions to multiple variables and see resulting list of values?

I have a data set (200,000 x 200) from which I use a str_detect filter to subset based on multiple criteria and multiple variables. How can I check my work to see what values exist in the filtered data? My goal is to do this without copying and pasting for each variable.

library(tidyverse)
#> Warning: package 'tibble' was built under R version 4.1.2
#> Warning: package 'readr' was built under R version 4.1.2

# criteria to include in list
target <- c("f11", "t40") %>% # either/or
  paste(collapse = "|")

target
#> [1] "f11|t40"

# data
my_data <- tribble(
  ~p_dx, ~dx_1, ~dx_2, ~dx_3,
  "f11", "t401", NA, NA,
  "f11", "t402", "f12", "t41",
  "f01", "t01", "f111", "t401",
  "f02", "t402", NA, NA,
  "t40", "f111", NA, NA
)

my_data
#> # A tibble: 5 x 4
#>   p_dx  dx_1  dx_2  dx_3 
#>   <chr> <chr> <chr> <chr>
#> 1 f11   t401  <NA>  <NA> 
#> 2 f11   t402  f12   t41  
#> 3 f01   t01   f111  t401 
#> 4 f02   t402  <NA>  <NA> 
#> 5 t40   f111  <NA>  <NA>

# if I want to see which target value is present one variable at a time
p_dx_list <- my_data %>%
  filter(str_detect(p_dx, target)) %>%
  distinct(p_dx)

p_dx_list
#> # A tibble: 2 x 1
#>   p_dx 
#>   <chr>
#> 1 f11  
#> 2 t40

dx_1_list <- my_data %>%
  filter(str_detect(dx_1, target)) %>%
  distinct(dx_1)

dx_2_list <- my_data %>%
  filter(str_detect(dx_2, target)) %>%
  distinct(dx_2)

# and so on
# then bind the rows together 
my_list <- bind_rows(p_dx_list,
                     dx_1_list,
                     dx_2_list) %>%
  pivot_longer(everything()) %>% # make long
  drop_na(value) %>%
  distinct(value) %>%
  arrange(value)

# I want to see (as a list) what my str_detect filtered
my_list
#> # A tibble: 5 x 1
#>   value
#>   <chr>
#> 1 f11  
#> 2 f111 
#> 3 t40  
#> 4 t401 
#> 5 t402

Created on 2021-12-29 by the reprex package (v2.0.1)

Would this work for you?

library(purrr)
library(tibble)
target <- c("f11", "t40") %>% # either/or
  paste(collapse = "|")

target
#> [1] "f11|t40"

# data
my_data <- tribble(
  ~p_dx, ~dx_1, ~dx_2, ~dx_3,
  "f11", "t401", NA, NA,
  "f11", "t402", "f12", "t41",
  "f01", "t01", "f111", "t401",
  "f02", "t402", NA, NA,
  "t40", "f111", NA, NA
)
LIST <- map(my_data,.f=function(COL){
  tmp <- unique(COL)
  tmp[grepl(target,tmp)]
})
unique(unlist(LIST))
#> [1] "f11"  "t40"  "t401" "t402" "f111"

Created on 2021-12-29 by the reprex package (v2.0.1)

1 Like

Yes. This works. Thank you for your help.

It looks like map applies the custom function {filter, unique} to each variable and then combines the results together into a single list. I'm not clear on how.

Yes, map() iterates along a list, applying the given function to each element of the list and it returns a list. Since a data frame is a basically a list, map() can be used to iterate over all of the columns of a data frame.

That's helpful. Thanks.

Try this:

map(my_data, ~unique(.x[which(str_detect(.x, target))]))
1 Like

Thank you. Yes. It does work (see below). However, I prefer the output from this solution.

#$admitting_diagnosis
#[1] "F1120"   "T401X1A" "T404X2A" "T402X1A" "F1123"   "T402X2A"
#
#$principal_diagnosis
# [1] "T404X1A" "T401X1A" "F1123"   "T402X2A" "F1190"   "T40601A" "T402X1A" "F1120"   "T402X4A" "T403X4A"
#[11] "T404X2A" "T40411A" "F1193"   "T400X1A"
#
#$diagnosis_2
# [1] "F1120"   "T402X2A" "T40605A" "T402X5A" "F1121"   "T402X1A" "F1123"   "T404X5A" "T404X1A" "T40601A"
#[11] "F11921"  "F1190"   "T402X6A" "F1199"   "T402X4A" "F11229" 
#
#$diagnosis_3
#[1] "F1120"   "F1123"   "T402X5A" "T404X2A" "T402X2A" "F1121"   "F1190"   "F1199"   "T40601A"
# and so on...

This topic was automatically closed 7 days after the last reply. New replies are no longer allowed.

If you have a query related to it or one of the replies, start a new topic and refer back with a link.