Detect strings by specifying strings and by position

library(tidyverse)
#> Warning: package 'ggplot2' was built under R version 4.1.1
df <- tibble(x = c("apple", "deal", "panel"))
df
#> # A tibble: 3 x 1
#>   x    
#>   <chr>
#> 1 apple
#> 2 deal 
#> 3 panel

How can I do the following:

  • keep rows that contain the letters "a", "p" and "l" in any order, i.e. apple and panel

  • keep rows that have "a" in the third position i.e. deal

Hi,

You can use some RegEx to solve this:

library(tidyverse)

df <- tibble(x = c("apple", "deal", "panel", "plum", "ale"))
df

df %>% filter(str_detect(x, "^..a") |
                (str_detect(x, "a") & str_detect(x, "p") & str_detect(x, "l")))
#> # A tibble: 3 x 1
#>   x    
#>   <chr>
#> 1 apple
#> 2 deal 
#> 3 panel

There might be a way in RegEx to detect "a", "p" and "l" in any order (each at least once) with a single pattern, but I can't think of one at the moment, so for now this version tests each letter separately.

Hope this helps,
PJ

@pieterjanvc Many thanks for the solution! I find it very helpful.

It would be nice to know how we can use RegEx to detect the "apl" in any order.
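
One possible way to do this with a single pattern is to chain look-ahead assertions, one per letter. The sketch below is not from the replies above; it uses base R's `grepl()` with `perl = TRUE`, and the variable names (`all_three`, `third_a`, `combined`) are made up for illustration. The same pattern should also work in `str_detect()`, since stringr's regex engine supports look-aheads.

```r
x <- c("apple", "deal", "panel", "plum", "ale", "reel")

# Each (?=.*<letter>) is a look-ahead: it checks that the letter occurs
# somewhere after the start of the string without consuming characters,
# so the three checks are independent of the order of the letters.
all_three <- "^(?=.*a)(?=.*p)(?=.*l)"

# "a" in the third position: any two characters, then "a".
third_a <- "^..a"

has_apl <- grepl(all_three, x, perl = TRUE)
has_a3  <- grepl(third_a, x)

result <- x[has_apl | has_a3]
result
#> [1] "apple" "deal"  "panel"

# Both conditions can also be folded into one alternation.
combined <- "^(?:..a|(?=.*a)(?=.*p)(?=.*l))"
result_combined <- x[grepl(combined, x, perl = TRUE)]
```

"plum" and "ale" are dropped because each is missing one of the three letters, and neither has "a" in the third position.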

# add word that should not match  
DF <- tibble::tibble(x = c("apple", "deal", "panel","reel"))  
DF
#> # A tibble: 4 × 1
#>   x    
#>   <chr>
#> 1 apple
#> 2 deal 
#> 3 panel
#> 4 reel

find_patterns <- function(x) {  
  has_a = function(x) grep("a",x, value = TRUE) # "a" anywhere  
  has_l = function(x) grep("l",x, value = TRUE) # "l" anywhere  
  has_p = function(x) grep("p",x, value = TRUE) # "p" anywhere  
  has_a3 = function(x) grep("^..a",x, value = TRUE) # "a" third position  
  c(has_a3(x),c(has_a(x) |> has_l() |> has_p()))  
}

find_patterns(DF$x)
#> [1] "deal"  "apple" "panel"
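
Note that `find_patterns()` concatenates two result vectors, so the output is not in input order, and a word that satisfies both conditions (for example "plan") would appear twice. A minimal alternative sketch, assuming you want each match once and in the original order, is to combine logical vectors instead. The name `find_patterns_lgl` is hypothetical.

```r
find_patterns_lgl <- function(x) {
  # TRUE where the string contains all three letters, in any order
  has_all <- grepl("a", x, fixed = TRUE) &
    grepl("l", x, fixed = TRUE) &
    grepl("p", x, fixed = TRUE)
  # TRUE where "a" is the third character
  has_a3 <- grepl("^..a", x)
  # Subset once, so order is preserved and nothing is duplicated
  x[has_all | has_a3]
}

words <- c("apple", "deal", "panel", "reel", "plan")
kept <- find_patterns_lgl(words)
# keeps "apple", "deal", "panel" and "plan", each once, in input order
```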
