Filter sentences with two words

Hello everyone

I have a large data frame with information on species (location, registration date, kingdom, class, order, genus, etc.). The column that interests me is the "scientific name".

I would like to filter this column to keep only the values that have two words, and so get rid of the rows that have only one word in it (which would mean I do not have the complete scientific name).

I would appreciate it if you could help me with this. Thank you very much and greetings

Hi. So the words are separated by spaces, right?
We can use str_count from the stringr package (load it with library(stringr)) to count the words, then keep only the rows where the count is greater than 1.
For example, with dplyr:

myFilteredDF <- myDF %>% filter(str_count(`scientific name`, "\\S+") > 1)

Without dplyr (stringr is still needed for str_count):

myFilteredDF <- myDF[str_count(myDF$`scientific name`, "\\S+") > 1, ]
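To see what the filter does, here is a small sketch on made-up data. The four values and the NA are assumptions for illustration only, not the poster's data. One thing worth knowing: dplyr's filter() drops rows where the condition is NA, while plain bracket indexing returns them as all-NA rows, so the sketch wraps the condition in which().

```r
library(stringr)

# Hypothetical sample data; the real data frame has many more columns.
myDF <- data.frame(
  `scientific name` = c("Melanogrammus aeglefinus", "Gadus", "Salmo salar", NA),
  check.names = FALSE,      # keep the space in the column name
  stringsAsFactors = FALSE
)

# Number of whitespace-separated words in each value (NA stays NA).
word_counts <- str_count(myDF$`scientific name`, "\\S+")

# which() keeps only the positions where the condition is TRUE,
# so rows with a missing name are dropped rather than returned as NA rows.
myFilteredDF <- myDF[which(word_counts > 1), , drop = FALSE]

print(myFilteredDF)
```

Here the rows for "Gadus" (one word) and the missing name are removed. drop = FALSE is only needed because this toy data frame has a single column.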

How well word count (two words versus one) separates complete scientific names from incomplete ones depends on the data.

It will work well enough with the pair Melanogrammus aeglefinus and Haddock, but not well with Melissa melissa samuelis and Karner Blue, where the scientific name has three words and the common name has two.

Can you share a sample of the data and the expected output?
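To make that concrete, here is a rough base R sketch comparing the plain word count with a stricter check. The pattern (a capitalised word, one space, a lowercase word) is my own assumption about what a two-word scientific name looks like; it is not something from the original data, and it would reject hyphenated names, hybrids, and subspecies.

```r
# The four names mentioned in this thread.
names_to_check <- c(
  "Melanogrammus aeglefinus",  # two-word scientific name
  "Haddock",                   # one-word common name
  "Melissa melissa samuelis",  # three-word name
  "Karner Blue"                # two-word common name
)

# Plain word count, as in the str_count approach (base R equivalent).
word_counts <- lengths(strsplit(names_to_check, "\\s+"))
more_than_one_word <- word_counts > 1

# Assumed pattern: "Genus species" with exactly one space.
# perl = TRUE keeps the character ranges independent of the locale.
binomial_pattern <- "^[A-Z][a-z]+ [a-z]+$"
looks_binomial <- grepl(binomial_pattern, names_to_check, perl = TRUE)

print(data.frame(names_to_check, word_counts, more_than_one_word, looks_binomial))
```

With these four names, the word count keeps everything except Haddock, while the stricter pattern keeps only Melanogrammus aeglefinus. Which behaviour is right depends on whether three-word names should count as complete.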

Hey Hicham

Actually, your answer was all I needed. It worked perfectly.

Thanks a lot
