R text mining: convert a vector of sentences to a vector of individual words

Hi all

This is driving me mad - can anyone help?

I have a vector of data that looks like this:
"This is some text"
"This is something different"
"Fish fingers"
"And now for something completely different"

I want to turn it into a vector with one word per element, so I end up with:
This
is
some
text
This
is
something
different
...

I'll tidy up the duplicates later. I've tried all sorts of approaches in R but I can't get it down to one row per word.

FWIW after this I will be removing duplicates and doing a frequency count for each unique word.

Thanks in advance, and sorry if this is covered elsewhere - I have searched but couldn't find anything.

Hello,

Here is one way of accomplishing that:

library(dplyr)

data = c("I have a vector of data that looks like this", 
  "This is some text", 
  "This is something different", 
  "Fish fingers", 
  "And now for something completely different")

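# Split each sentence on spaces, flatten the resulting list into a single
# character vector, lower-case every word, then keep only the distinct words
data = sapply(data, function(x) strsplit(x, split = " ")) %>%  
         unlist() %>% tolower() %>% unique()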

data
#>  [1] "i"          "have"       "a"          "vector"     "of"        
#>  [6] "data"       "that"       "looks"      "like"       "this"      
#> [11] "is"         "some"       "text"       "something"  "different" 
#> [16] "fish"       "fingers"    "and"        "now"        "for"       
#> [21] "completely"

Created on 2020-08-11 by the reprex package (v0.3.0)

I added unique() and tolower() so that duplicates are removed regardless of capitalisation, but of course you can drop either step if you need the original case or the repeated words.
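Since the question also mentions a frequency count, here is a minimal base R sketch of that step. It assumes the words are separated by single spaces and contain no punctuation; the names sentences, words and word_counts are only illustrative. The idea is to skip unique() so the repeats are still there to be counted.

```r
# Example input taken from the question
sentences <- c("This is some text",
               "This is something different",
               "Fish fingers",
               "And now for something completely different")

# strsplit() is vectorised over its input and returns a list with one
# character vector per sentence; unlist() flattens it into one vector.
# Duplicates are kept on purpose so they can be counted below.
words <- tolower(unlist(strsplit(sentences, split = " ", fixed = TRUE)))

# table() counts how often each distinct word occurs;
# sort(decreasing = TRUE) puts the most frequent words first.
word_counts <- sort(table(words), decreasing = TRUE)

word_counts
```

If you only need the distinct words afterwards, names(word_counts) gives them without a separate unique() call.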

Hope this helps,
PJ
