Use grep to check whether multiple words appear together in an array entry

stringr

#1

I'm using RStudio to find patterns in an array of phrases. I'm trying to select all the entries that contain certain words at any position, but my code is not working. Can anybody help me, please?

This is my code:

Text <- c("instance", "percentage", "n", "instance tes percentage toto", "percentage gff instance")
t<- 'toto'
d<-'instance'
r<-'percentage'
part = paste(t,d, r, sep=".*")
grep(part, Text)

#2

You did not specify what result you are expecting. I guess you want every element of Text except "n", which does not contain any of toto, percentage or instance. For that, the regex needs an OR clause, i.e. |

You can do that with str_subset in stringr, for example.

Text <- c("instance", "percentage", "n", "instance tes percentage toto", "percentage gff instance")
stringr::str_subset(Text, "instance|percentage|toto")
#> [1] "instance"                     "percentage"                  
#> [3] "instance tes percentage toto" "percentage gff instance"

Created on 2018-07-24 by the reprex package (v0.2.0).

If you want to use grep and friends, just adapt and reword your regex. Websites like these can help: https://regexr.com/, https://regex101.com/.
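For reference, here is a minimal base R sketch of the same OR match with grep, assuming the same Text vector as above. The variable names matches and idx are only for illustration.

```r
Text <- c("instance", "percentage", "n", "instance tes percentage toto", "percentage gff instance")

# value = TRUE returns the matching strings, like stringr::str_subset()
matches <- grep("instance|percentage|toto", Text, value = TRUE)
matches

# Without value = TRUE, grep() returns the positions of the matching entries
idx <- grep("instance|percentage|toto", Text)
idx
```

Every entry except "n" should come back, since each of the others contains at least one of the three words.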



#3

I just want entry [4] as the result: instance tes percentage toto


#4

OK. Next time, do not hesitate to give an example of what you want, even one built manually.

You just have to change the order of the words in the regex from your example:

Text <- c("instance", "percentage", "n", "instance tes percentage toto", "percentage gff instance")
stringr::str_subset(Text, "instance.*percentage.*toto")
#> [1] "instance tes percentage toto"

Created on 2018-07-24 by the reprex package (v0.2.0).
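To make the ordering issue concrete, here is a small sketch of why the pattern from the original question matches nothing. It assumes the same Text vector. The names part_original and part_reordered are hypothetical and only used for this illustration.

```r
Text <- c("instance", "percentage", "n", "instance tes percentage toto", "percentage gff instance")

# paste() joins its arguments in the order given, so the question's code
# builds "toto.*instance.*percentage": toto must come first
part_original <- paste("toto", "instance", "percentage", sep = ".*")
part_original

# No entry contains the three words in that order, so nothing matches
res_original <- grep(part_original, Text)

# With the words in the order they appear in entry 4, that entry matches
part_reordered <- paste("instance", "percentage", "toto", sep = ".*")
res_reordered <- grep(part_reordered, Text)
res_reordered
```

A pattern of the form a.*b.*c only matches when a, b and c occur in exactly that order, which is why the order of the arguments to paste() matters here.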

But are the words always in the same order?


#5

Thank you for your help, but it does not solve my problem.
What I want is to find the words regardless of their position in the phrase. With the array below, for example, I want entries [4] and [6] as the result:

Text <- c("instance", "percentage", "n", "instance tes percentage toto", "percentage gff instance"," percentage tet toto tet instance ")

#6

OK. That is what I thought, but you did not give this information in your question. It is not efficient if we have to "guess" your problem.

For this case, you need to adapt the regex again. You can achieve this with positive lookaheads, (?=...): each lookahead checks that one word occurs somewhere in the string, so the order of the words no longer matters.

Text <- c("instance", "percentage", "n", "instance tes percentage toto", "percentage gff instance"," percentage tet toto tet instance ")
stringr::str_subset(Text, "(?=.*instance)(?=.*percentage)(?=.*toto)")
#> [1] "instance tes percentage toto"      
#> [2] " percentage tet toto tet instance "

Created on 2018-07-24 by the reprex package (v0.2.0).
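Since the question was about grep, here is a sketch of the same lookahead approach in base R. It assumes the words contain no regex metacharacters, because they are pasted into the pattern unescaped. The names words, pattern and idx are hypothetical. As far as I know, lookaheads are not handled by grep's default regex engine, so perl = TRUE is needed.

```r
Text <- c("instance", "percentage", "n", "instance tes percentage toto",
          "percentage gff instance", " percentage tet toto tet instance ")

# Assumed list of required words (no regex metacharacters in them)
words <- c("instance", "percentage", "toto")

# Build one lookahead per word: "(?=.*instance)(?=.*percentage)(?=.*toto)"
pattern <- paste0("(?=.*", words, ")", collapse = "")
pattern

# perl = TRUE switches grep() to PCRE, which supports lookaheads
idx <- grep(pattern, Text, perl = TRUE)
idx
Text[idx]
```

Building the pattern from a vector means the list of required words can change without rewriting the regex by hand.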

We are solving your problem step by step with the information you give. Quick reminder on how to ask better questions: FAQ: Tips for writing R-related questions to get efficient answer ! :wink:

Thanks for being willing to adjust your question, though.
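If lookaheads feel hard to read, a possible alternative is to test each word separately and keep the entries where all tests succeed. This is only a sketch under the same assumptions as the example data above. The names words, hits and all_present are hypothetical.

```r
Text <- c("instance", "percentage", "n", "instance tes percentage toto",
          "percentage gff instance", " percentage tet toto tet instance ")
words <- c("instance", "percentage", "toto")

# One logical vector per word; fixed = TRUE treats each word as a literal
# string rather than a regex
hits <- lapply(words, function(w) grepl(w, Text, fixed = TRUE))

# Combine the vectors element-wise: TRUE only where every word was found
all_present <- Reduce(`&`, hits)

which(all_present)
Text[all_present]
```

Note that this matches substrings, so "instance" would also be found inside "instances". Whether that is acceptable depends on your data.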