Checking whether a variable contains words other than certain excluded words

Hi! I want to check whether a variable contains any words besides "cats", "dogs", "cat", and "dog", by creating a new column called "check" that tests this. Below is the output I want. Thank you! <3

number           check
8 cats 9 dogs    FALSE
8 cats 11 dogs   FALSE
9 rats 0 dogs    TRUE
8 cats 1 toy     TRUE
1 cat  1 dog     FALSE

Below is one approach. I added a mix of uppercase letters and punctuation to the sample data to check that it works beyond the simple case.

library(dplyr)
library(stringr)

d = data.frame(
  number = c('8 cats 9 dogs', 
             '8 Cats 11 DOgs', 
             '9 RATS, 0 dogs', 
             '8 cats - 1 toy!!', 
             '1 cat...1 dog9')
)

# words to remove, in all lowercase (list longer forms such as 'cats' before 'cat')
vector_of_words_to_remove = c('cats', 
                              'dogs', 
                              'cat', 
                              'dog')

# collapse into a single regex alternation: "cats|dogs|cat|dog"
words_to_remove = paste0(vector_of_words_to_remove, collapse = '|')

d %>%
  # make everything lowercase
  mutate(number = tolower(number)) %>%
  # remove the listed words
  mutate(check = str_replace_all(number, words_to_remove, '')) %>%
  # replace everything that is not a lowercase letter (digits, punctuation, spaces) with a space
  mutate(check = str_replace_all(check, '[^a-z]', ' ')) %>%
  # collapse runs of spaces and trim the ends, so only leftover letters remain
  mutate(check = str_squish(check)) %>%
  # TRUE if any letters (i.e. other words) remain, FALSE otherwise
  mutate(check = check != '')
#>             number check
#> 1    8 cats 9 dogs FALSE
#> 2   8 cats 11 dogs FALSE
#> 3   9 rats, 0 dogs  TRUE
#> 4 8 cats - 1 toy!!  TRUE
#> 5   1 cat...1 dog9 FALSE

Created on 2022-09-23 with reprex v2.0.2.9000
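One detail that is easy to miss: because paste0() joins the words into a regex alternation, the order of the words matters. ICU regex (which stringr uses) takes the first alternative that matches at a given position, not the longest one, so if 'cat' came before 'cats', the trailing "s" would survive and flip check to TRUE. The answer above avoids this by listing 'cats' first. A minimal sketch of the pitfall, using a deliberately reordered vector; the reordering and the nchar() sort are my own illustration, not part of the original answer:

```r
library(stringr)

# Same words as in the answer, but with the singular forms listed first
words <- c('cat', 'dog', 'cats', 'dogs')

# Naive: keeps the given order, giving "cat|dog|cats|dogs"
naive_pattern <- paste0(words, collapse = '|')
# Safer: sort longest first before collapsing, giving "cats|dogs|cat|dog"
sorted_pattern <- paste0(words[order(-nchar(words))], collapse = '|')

# "cat" matches before "cats" is ever tried, so an "s" is left behind
naive_result <- str_replace_all('8 cats 9 dogs', naive_pattern, '')
# The full plural forms are removed, leaving only digits and spaces
sorted_result <- str_replace_all('8 cats 9 dogs', sorted_pattern, '')

naive_result   # "8 s 9 s"
sorted_result  # "8  9 "
```

Sorting by length is just one safeguard; writing the vector in the right order by hand, as the answer does, works equally well.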


Hi, thank you so much! I just ran this code, and I'm not entirely sure I understand what this line does:
mutate(check = str_replace_all(check, '[^a-z]', ' ')) %>%

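It replaces every character that is not a lowercase letter (digits, punctuation, and spaces) with a space. Because the text has already been lowercased and the excluded words removed, the only letters left are those belonging to other words. Here is the result just before that line: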

d %>%
  mutate(number = tolower(number)) %>%
  mutate(check = str_replace_all(number, words_to_remove, ''))
#>             number        check
#> 1    8 cats 9 dogs        8  9 
#> 2   8 cats 11 dogs       8  11 
#> 3   9 rats, 0 dogs   9 rats, 0 
#> 4 8 cats - 1 toy!! 8  - 1 toy!!
#> 5   1 cat...1 dog9     1 ...1 9

And here is the result after adding the following line:

d %>%
  mutate(number = tolower(number), check = str_replace_all(number, words_to_remove, '')) %>%
  # replace every character that is not a lowercase letter with a space
  mutate(check = str_replace_all(check, '[^a-z]', ' '))
#>             number        check
#> 1    8 cats 9 dogs             
#> 2   8 cats 11 dogs             
#> 3   9 rats, 0 dogs     rats    
#> 4 8 cats - 1 toy!!        toy  
#> 5   1 cat...1 dog9
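Put another way, the [^a-z] replacement, str_squish() and the final comparison with '' together only ask whether any letter survives the word removal. Assuming that is the only goal, those three steps could likely be collapsed into a single str_detect() call with the pattern '[a-z]'. A sketch reusing the sample data and pattern from the answer (this condensed form is my own variation, not the original answer):

```r
library(dplyr)
library(stringr)

# Same sample data and pattern as in the answer above
d <- data.frame(
  number = c('8 cats 9 dogs', '8 Cats 11 DOgs', '9 RATS, 0 dogs',
             '8 cats - 1 toy!!', '1 cat...1 dog9')
)
words_to_remove <- paste0(c('cats', 'dogs', 'cat', 'dog'), collapse = '|')

result <- d %>%
  mutate(number = tolower(number),
         # drop the excluded words, then ask whether any letter a-z is left
         check = str_detect(str_replace_all(number, words_to_remove, ''), '[a-z]'))

result
```

Within a single mutate() call, later arguments see columns modified by earlier ones, so check is computed from the lowercased number.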
