Filter rows of a data frame by a vector of strings

I want to filter the rows of a data frame that match my vector of strings. Somehow I don't get the logic behind this:

library(tidyverse)

fruits <- tibble(fruit = c("apple", "apple", "banana", "orange", "kiwi"))
fruits
# A tibble: 5 x 1
  fruit 
  <chr> 
1 apple 
2 apple 
3 banana
4 orange
5 kiwi  

Filtering by the vector c("a", "w") gives the following output. Why are kiwi and orange not returned, and why is apple returned only once?

fruits %>%
  filter(str_detect(fruit, c("a", "w")))
# A tibble: 2 x 1
  fruit 
  <chr> 
1 apple 
2 banana

Whereas filtering by the vector c("a", "e") gives the following output. Why is apple returned twice here?

fruits %>%
  filter(str_detect(fruit, c("a", "e")))
# A tibble: 4 x 1
  fruit 
  <chr> 
1 apple 
2 apple 
3 banana
4 orange

str_detect expects its second argument to be a single regular expression that is applied to every string. If you want to keep the words that contain a or w, combine the alternatives into one pattern:

fruits %>%
  filter(str_detect(fruit, "a|w"))
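
If the search terms already live in a vector, one option is to collapse them into a single alternation pattern before filtering. This is only a minimal sketch: the names `patterns` and `pattern` are made up for illustration, and it assumes the terms contain no regex metacharacters that would need escaping.

```r
library(dplyr)
library(stringr)
library(tibble)

fruits <- tibble(fruit = c("apple", "apple", "banana", "orange", "kiwi"))

# Hypothetical vector of search terms
patterns <- c("a", "w")

# Join the terms with "|" so they form one regular expression: "a|w"
pattern <- str_c(patterns, collapse = "|")

# Keep every row whose fruit contains at least one of the terms
result <- fruits %>%
  filter(str_detect(fruit, pattern))
```

With this data every row is kept, because each fruit contains an a or a w.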

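If you use c("a", "w") instead, it seems the pattern vector is recycled along the column: the first word is checked for a, the second word for w, the third word for a, and so on. This explains the results you got. With c("a", "w"), the checks are apple/a, apple/w, banana/a, orange/w and kiwi/a, so only the first apple and banana match. With c("a", "e"), the second apple and orange are checked for e, which they both contain, so four rows come back.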
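
To make that element-by-element pairing visible, here is a small sketch in base R that spells out the recycling by hand. It uses `rep_len()` and `grepl()` instead of str_detect, so it imitates the behaviour seen in the question rather than reproducing stringr's internals; the variable names are made up for illustration. Depending on the stringr version, passing a pattern vector whose length does not match the column may produce a warning or an error rather than silent recycling, which is another reason to avoid it.

```r
fruit <- c("apple", "apple", "banana", "orange", "kiwi")
patterns <- c("a", "w")

# Repeat the patterns until there is one per fruit: "a" "w" "a" "w" "a"
recycled <- rep_len(patterns, length(fruit))

# Check each fruit against its own paired pattern only
matches <- mapply(grepl, recycled, fruit,
                  MoreArgs = list(fixed = TRUE), USE.NAMES = FALSE)

# Show which pattern was tested against which fruit
data.frame(fruit = fruit, pattern = recycled, match = matches)

fruit[matches]
```

Only the first apple and banana are paired with a pattern they contain, which matches the two rows returned in the question.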
