Filtering a list of vectors with substrings

First of all, I'm pretty new to tidyverse programming, so sorry if I'm missing the obvious. I've tried playing around with map() and grep() but couldn't get the result I wanted.

I have a list of vectors (containing file paths) that I would like to filter by matching substrings.

My list filePaths looks like this:

$fb
 [1] "U:/xxx/yyy/EFFGGHHII AABBCCDDE/EFFGGHHII AABBCCDDE @ 2019-09-09 fb.txt"           
 [2] "U:/xxx/yyy/JJKKLLMMN NOOPPQQRR/JJKKLLMMN NOOPPQQRR @ 2019-09-09 fb.txt"           
 [3] "U:/xxx/yyy/SSTTUUVVW WXXYYZZ11/SSTTUUVVW WXXYYZZ11 @ 2019-09-08 fb.txt"           
 [4] "U:/xxx/yyy/223344556 6778899AA/223344556 6778899AA @ 2019-09-10 fb.txt"           
 [5] "U:/xxx/yyy/BBCCDDEEF FGGHHIIJJ/BBCCDDEEF FGGHHIIJJ @ 2019-09-10 fb.txt"           

$sl
 [1] "U:/xxx/yyy/EFFGGHHII AABBCCDDE/EFFGGHHII AABBCCDDE @ 2019-09-09 sb olp.txt"
 [2] "U:/xxx/yyy/JJKKLLMMN NOOPPQQRR/JJKKLLMMN NOOPPQQRR @ 2019-09-09 sb olp.txt"
 [3] "U:/xxx/yyy/SSTTUUVVW WXXYYZZ11/SSTTUUVVW WXXYYZZ11 @ 2019-09-08 sb olp.txt"
 [4] "U:/xxx/yyy/223344556 6778899AA/223344556 6778899AA @ 2019-09-10 sb olp.txt"
 [5] "U:/xxx/yyy/BBCCDDEEF FGGHHIIJJ/BBCCDDEEF FGGHHIIJJ @ 2019-09-10 sb olp.txt"

$sc
 [1] "U:/xxx/yyy/EFFGGHHII AABBCCDDE/EFFGGHHII AABBCCDDE @ 2019-09-09 sb csv.txt"
 [2] "U:/xxx/yyy/JJKKLLMMN NOOPPQQRR/JJKKLLMMN NOOPPQQRR @ 2019-09-09 sb csv.txt"
 [3] "U:/xxx/yyy/SSTTUUVVW WXXYYZZ11/SSTTUUVVW WXXYYZZ11 @ 2019-09-08 sb csv.txt"
 [4] "U:/xxx/yyy/223344556 6778899AA/223344556 6778899AA @ 2019-09-10 sb csv.txt"
 [5] "U:/xxx/yyy/BBCCDDEEF FGGHHIIJJ/BBCCDDEEF FGGHHIIJJ @ 2019-09-10 sb csv.txt"

I also have vectors of substrings that I want to use to filter this list of vectors:

g_a = c("EFFGGHHII", "SSTTUUVVW", "BBCCDDEEF")
g_b = c("JJKKLLMMN", "223344556")

What would be a tidyverse way of applying these two vectors so that the result is a list of vectors containing only the matching elements?

I got as far as this:

filePaths %>% map(~ grep(paste(g_a, collapse = "|"), .x))

but it only returns the positions of the matching elements, not the paths themselves:

$fb
[1]  1  3  5

$sl
[1]  1  3  5

$sc
[1]  1  3  5

The result I want would look like this:

$fb
 [1] "U:/xxx/yyy/EFFGGHHII AABBCCDDE/EFFGGHHII AABBCCDDE @ 2019-09-09 fb.txt"           
 [2] "U:/xxx/yyy/SSTTUUVVW WXXYYZZ11/SSTTUUVVW WXXYYZZ11 @ 2019-09-08 fb.txt"           
 [3] "U:/xxx/yyy/BBCCDDEEF FGGHHIIJJ/BBCCDDEEF FGGHHIIJJ @ 2019-09-10 fb.txt"           

$sl
 [1] "U:/xxx/yyy/EFFGGHHII AABBCCDDE/EFFGGHHII AABBCCDDE @ 2019-09-09 sb olp.txt"
 [2] "U:/xxx/yyy/SSTTUUVVW WXXYYZZ11/SSTTUUVVW WXXYYZZ11 @ 2019-09-08 sb olp.txt"
 [3] "U:/xxx/yyy/BBCCDDEEF FGGHHIIJJ/BBCCDDEEF FGGHHIIJJ @ 2019-09-10 sb olp.txt"

$sc
 [1] "U:/xxx/yyy/EFFGGHHII AABBCCDDE/EFFGGHHII AABBCCDDE @ 2019-09-09 sb csv.txt"
 [2] "U:/xxx/yyy/SSTTUUVVW WXXYYZZ11/SSTTUUVVW WXXYYZZ11 @ 2019-09-08 sb csv.txt"
 [3] "U:/xxx/yyy/BBCCDDEEF FGGHHIIJJ/BBCCDDEEF FGGHHIIJJ @ 2019-09-10 sb csv.txt"

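Here's how to do it with stringr functions. str_subset() returns the matching strings themselves, whereas grep() returns their positions by default.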

library(tidyverse)

filePaths <- list(fb = c("U:/xxx/yyy/EFFGGHHII AABBCCDDE/EFFGGHHII AABBCCDDE @ 2019-09-09 fb.txt", 
                         "U:/xxx/yyy/JJKKLLMMN NOOPPQQRR/JJKKLLMMN NOOPPQQRR @ 2019-09-09 fb.txt", 
                         "U:/xxx/yyy/SSTTUUVVW WXXYYZZ11/SSTTUUVVW WXXYYZZ11 @ 2019-09-08 fb.txt", 
                         "U:/xxx/yyy/223344556 6778899AA/223344556 6778899AA @ 2019-09-10 fb.txt", 
                         "U:/xxx/yyy/BBCCDDEEF FGGHHIIJJ/BBCCDDEEF FGGHHIIJJ @ 2019-09-10 fb.txt"),
                  sl = c("U:/xxx/yyy/EFFGGHHII AABBCCDDE/EFFGGHHII AABBCCDDE @ 2019-09-09 sb olp.txt",
                         "U:/xxx/yyy/JJKKLLMMN NOOPPQQRR/JJKKLLMMN NOOPPQQRR @ 2019-09-09 sb olp.txt",
                         "U:/xxx/yyy/SSTTUUVVW WXXYYZZ11/SSTTUUVVW WXXYYZZ11 @ 2019-09-08 sb olp.txt",
                         "U:/xxx/yyy/223344556 6778899AA/223344556 6778899AA @ 2019-09-10 sb olp.txt",
                         "U:/xxx/yyy/BBCCDDEEF FGGHHIIJJ/BBCCDDEEF FGGHHIIJJ @ 2019-09-10 sb olp.txt"),
                  sc = c("U:/xxx/yyy/EFFGGHHII AABBCCDDE/EFFGGHHII AABBCCDDE @ 2019-09-09 sb csv.txt",
                         "U:/xxx/yyy/JJKKLLMMN NOOPPQQRR/JJKKLLMMN NOOPPQQRR @ 2019-09-09 sb csv.txt",
                         "U:/xxx/yyy/SSTTUUVVW WXXYYZZ11/SSTTUUVVW WXXYYZZ11 @ 2019-09-08 sb csv.txt",
                         "U:/xxx/yyy/223344556 6778899AA/223344556 6778899AA @ 2019-09-10 sb csv.txt",
                         "U:/xxx/yyy/BBCCDDEEF FGGHHIIJJ/BBCCDDEEF FGGHHIIJJ @ 2019-09-10 sb csv.txt"))

g_a <- c("EFFGGHHII", "SSTTUUVVW", "BBCCDDEEF")

map(filePaths, ~ str_subset(.x, str_c(g_a, collapse = "|")))
#> $fb
#> [1] "U:/xxx/yyy/EFFGGHHII AABBCCDDE/EFFGGHHII AABBCCDDE @ 2019-09-09 fb.txt"
#> [2] "U:/xxx/yyy/SSTTUUVVW WXXYYZZ11/SSTTUUVVW WXXYYZZ11 @ 2019-09-08 fb.txt"
#> [3] "U:/xxx/yyy/BBCCDDEEF FGGHHIIJJ/BBCCDDEEF FGGHHIIJJ @ 2019-09-10 fb.txt"
#> 
#> $sl
#> [1] "U:/xxx/yyy/EFFGGHHII AABBCCDDE/EFFGGHHII AABBCCDDE @ 2019-09-09 sb olp.txt"
#> [2] "U:/xxx/yyy/SSTTUUVVW WXXYYZZ11/SSTTUUVVW WXXYYZZ11 @ 2019-09-08 sb olp.txt"
#> [3] "U:/xxx/yyy/BBCCDDEEF FGGHHIIJJ/BBCCDDEEF FGGHHIIJJ @ 2019-09-10 sb olp.txt"
#> 
#> $sc
#> [1] "U:/xxx/yyy/EFFGGHHII AABBCCDDE/EFFGGHHII AABBCCDDE @ 2019-09-09 sb csv.txt"
#> [2] "U:/xxx/yyy/SSTTUUVVW WXXYYZZ11/SSTTUUVVW WXXYYZZ11 @ 2019-09-08 sb csv.txt"
#> [3] "U:/xxx/yyy/BBCCDDEEF FGGHHIIJJ/BBCCDDEEF FGGHHIIJJ @ 2019-09-10 sb csv.txt"

Created on 2020-05-18 by the reprex package (v0.3.0)
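The question asks about applying both g_a and g_b, while the answer above uses only g_a. The block below is a minimal sketch that assumes the desired result is a nested list with one sublist per group of IDs. It also swaps the regex alternation for literal matching with grepl(fixed = TRUE), so characters such as "." or "(" inside an ID would not be treated as regex syntax. The names filter_by_ids and groups are hypothetical, and the sample data is a shortened copy of the question's paths.

```r
library(purrr)

# Sample data: a shortened copy of two of the vectors from the question
filePaths <- list(
  fb = c("U:/xxx/yyy/EFFGGHHII AABBCCDDE/EFFGGHHII AABBCCDDE @ 2019-09-09 fb.txt",
         "U:/xxx/yyy/JJKKLLMMN NOOPPQQRR/JJKKLLMMN NOOPPQQRR @ 2019-09-09 fb.txt",
         "U:/xxx/yyy/223344556 6778899AA/223344556 6778899AA @ 2019-09-10 fb.txt"),
  sc = c("U:/xxx/yyy/EFFGGHHII AABBCCDDE/EFFGGHHII AABBCCDDE @ 2019-09-09 sb csv.txt",
         "U:/xxx/yyy/JJKKLLMMN NOOPPQQRR/JJKKLLMMN NOOPPQQRR @ 2019-09-09 sb csv.txt",
         "U:/xxx/yyy/223344556 6778899AA/223344556 6778899AA @ 2019-09-10 sb csv.txt")
)

g_a <- c("EFFGGHHII", "SSTTUUVVW", "BBCCDDEEF")
g_b <- c("JJKKLLMMN", "223344556")

# Hypothetical helper: keep the paths that contain any of the IDs.
# fixed = TRUE matches each ID literally instead of as a regular expression.
filter_by_ids <- function(paths, ids) {
  hits <- map(ids, ~ grepl(.x, paths, fixed = TRUE))  # one logical vector per ID
  paths[reduce(hits, `|`, .init = FALSE)]            # TRUE where any ID matched
}

# Apply every group of IDs to every element of filePaths
groups <- list(g_a = g_a, g_b = g_b)
by_group <- map(groups, function(ids) map(filePaths, filter_by_ids, ids = ids))
```

by_group$g_b$fb then holds the two fb paths for JJKKLLMMN and 223344556. The .init = FALSE argument makes an empty ID vector return character(0) instead of an error.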


siddharthprabhu has an elegant solution which is probably the way to go, but here is how your approach could be 'completed'. You already used map() and grep() to find the indices, so save those and then map over the list and the indices together with map2():

indexes <- filePaths %>% map(~ grep(paste(g_a, collapse = "|"), .x))
results <- map2(filePaths, indexes, ~ .x[.y])
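Since the original attempt already used grep(), a shorter way to finish it may be grep()'s value = TRUE argument, which makes it return the matching elements instead of their positions, so the separate map2() step is not needed. The sample data below is a shortened copy of the question's fb vector.

```r
library(purrr)

# Sample data: a shortened copy of the fb vector from the question
filePaths <- list(
  fb = c("U:/xxx/yyy/EFFGGHHII AABBCCDDE/EFFGGHHII AABBCCDDE @ 2019-09-09 fb.txt",
         "U:/xxx/yyy/JJKKLLMMN NOOPPQQRR/JJKKLLMMN NOOPPQQRR @ 2019-09-09 fb.txt",
         "U:/xxx/yyy/SSTTUUVVW WXXYYZZ11/SSTTUUVVW WXXYYZZ11 @ 2019-09-08 fb.txt")
)
g_a <- c("EFFGGHHII", "SSTTUUVVW", "BBCCDDEEF")

# value = TRUE makes grep() return the matching strings instead of their positions
results <- map(filePaths, ~ grep(paste(g_a, collapse = "|"), .x, value = TRUE))
```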
