Filtering based on specific strings

I am trying to filter a data set to remove rows that contain a specific string. I want to drop rows whose value is "biological process", but keep ones that have something like "negative regulation of biological process". When I use the following code it gets rid of both rows because they both contain that phrase. Is there a way to do an exact match?

utlarge.go.filter<-utlarge.go %>%
  filter(Count>=10&!str_detect(GO_assignment,"biological process"))
(d <- data.frame(
  upper = LETTERS,
  lower = letters)) |> head()
#>   upper lower
#> 1     A     a
#> 2     B     b
#> 3     C     c
#> 4     D     d
#> 5     E     e
#> 6     F     f

d[which(d[1] == "A"),]
#>   upper lower
#> 1     A     a
d[which(d[1] != "A"),] |> head()
#>   upper lower
#> 2     B     b
#> 3     C     c
#> 4     D     d
#> 5     E     e
#> 6     F     f
#> 7     G     g

Created on 2023-05-17 with reprex v2.0.2

To exclude rows whose GO_assignment is exactly "biological process" while keeping rows such as "negative regulation of biological process", match against the whole string instead of searching for a substring. Word boundaries (\\b) are not enough here, because "negative regulation of biological process" also contains "biological process" as whole words. Anchor the pattern with ^ and $ instead:

utlarge.go.filter <- utlarge.go %>% filter(Count >= 10 & !grepl("^biological process$", GO_assignment))

The ^ and $ anchors match the start and end of the string, so the pattern only matches when the entire value is "biological process". str_detect accepts the same anchored pattern, and a plain comparison such as GO_assignment != "biological process" also works. Note that both of those drop rows where GO_assignment is NA, whereas !grepl(...) keeps them.
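
A quick check on toy data can show how a substring pattern, a word-boundary pattern, and an anchored pattern each behave on these values. This is only a sketch in base R: the data frame `go` and its values are made up for illustration, with column names borrowed from the question.

```r
# Toy data; the column names mirror the question, the values are made up.
go <- data.frame(
  GO_assignment = c("biological process",
                    "negative regulation of biological process",
                    "molecular function"),
  Count = c(12, 15, 20),
  stringsAsFactors = FALSE
)

# Substring match: flags both rows that contain the phrase.
substring_hit <- grepl("biological process", go$GO_assignment)

# Word boundaries: still flags both, since the phrase appears as whole words.
boundary_hit <- grepl("\\bbiological process\\b", go$GO_assignment)

# Anchored match: flags only the row whose entire value is the phrase.
anchored_hit <- grepl("^biological process$", go$GO_assignment)

# Keep rows with Count >= 10 that are not an exact match.
kept <- go[go$Count >= 10 & !anchored_hit, ]
kept$GO_assignment
```

Only the anchored pattern separates the two "biological process" rows, so `kept` should retain the "negative regulation of biological process" row along with the unrelated one.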
