Creating a vector of words in comments but excluding words from another vector

AngryGeologist · April 10, 2019, 7:19pm

Hi there,
I'm a relative noob to R. I've just started postgrad data science, and this is my first exposure to R.

I am working with comments from a survey, and want to get a feel for word frequency.
I have a vector, comment, which contains all the words from the comments, but I want to exclude words like "the", "a", "is", etc. from the analysis.

I have a vector text
text=c('cat', 'dog', 'mouse', 'the', 'and')
I have another vector exclude
exclude=c('the', 'and')

and want to create the vector text1
('cat', 'dog', 'mouse')

I've tried subset and a couple of other solutions I've found online. I know it's going to be a simple answer. TIA

mishabalyasin · April 10, 2019, 7:24pm

There is a free book that might help you with this task, and possibly with other tasks as well:

nwerth · April 10, 2019, 7:34pm

Like mishabalyasin, I'm a fan of "teach a man to fish." Still, for your specific question, use setdiff():

text <- c('cat', 'dog', 'mouse', 'the', 'and')
exclude <- c('the', 'and')
setdiff(text, exclude)
# [1] "cat"   "dog"   "mouse"
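
One caveat if the goal is word frequency: setdiff() treats its inputs as sets, so the result contains each remaining word only once. Below is a minimal sketch of an alternative that keeps repeated words, using logical indexing with %in% and then table() to count them. The sample data (with deliberate repeats) and the name word_counts are made up for illustration; they are not from the original question.

```r
# Hypothetical sample data: "cat" and "the" each appear twice
text <- c('cat', 'dog', 'cat', 'mouse', 'the', 'and', 'the')
exclude <- c('the', 'and')

# setdiff() drops the excluded words but also collapses duplicates
setdiff(text, exclude)
# [1] "cat"   "dog"   "mouse"

# Logical indexing with %in% keeps every occurrence of the remaining words
text1 <- text[!text %in% exclude]
text1
# [1] "cat"   "dog"   "cat"   "mouse"

# table() counts how often each remaining word appears
word_counts <- table(text1)
```

If each word only needs to appear once in the result, setdiff() is the shorter option; the %in% version is probably the better fit when the counts matter.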

AngryGeologist · April 10, 2019, 10:08pm

Awesome.

Thanks for the recommendation on the book.
And setdiff() did exactly what I needed.
