Hi,
I have gone through texts like this one: https://www.tidytextmining.com/tfidf.html#term-frequency-in-jane-austens-novels
but I simply need a way to list all English words that appear in one string variable, excluding specified words (like "the") and words shorter than 3 characters.
Let's use this simple sample:
data.frame(stringsAsFactors = FALSE,
           URN = c("aaa", "bbb", "ccc", "ddd", "eee", "fff", "ggg", "hhh", "iii"),
           E1 = c(1, 2, 3, NA, NA, NA, NA, NA, NA),
           string = c("book", "my book", "example", "examples", "nothing",
                      "the end", "a", "v", "bg"),
           A2 = c(10, 2, 3, 4, 5, 6, 7, 8, 9),
           B1 = c(3, 9, 10, 1, 2, NA, 9, 6, 7),
           D1 = c(-1, 10, 6, -1, 8, 9, 7, -1, 99)
)
I don't know exactly what the output should look like, but I simply need a list like:
book 2
example or examples 2
nothing 1
etc...
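
To make the goal concrete, here is a rough base R sketch of the kind of thing I have in mind. It only uses the "string" column from the sample above, and the names strings, stop_words and word_counts are just placeholders I made up for illustration. It is a minimal attempt under those assumptions, not necessarily the right approach:

```r
# Only the `string` column from the sample data matters for this sketch.
strings <- c("book", "my book", "example", "examples", "nothing",
             "the end", "a", "v", "bg")

# Hypothetical exclusion list; extend as needed.
stop_words <- c("the")

# Lower-case everything, then split on any run of non-letter characters.
words <- unlist(strsplit(tolower(strings), "[^a-z]+"))

# Keep words of at least 3 characters that are not in the exclusion list.
words <- words[nchar(words) >= 3 & !(words %in% stop_words)]

# Count occurrences and sort from most to least frequent.
word_counts <- sort(table(words), decreasing = TRUE)
print(word_counts)
```

This counts "book" twice and drops "the", "my", "a", "v" and "bg", but it still treats "example" and "examples" as two different words. Merging those would presumably need some kind of stemming (possibly something like SnowballC::wordStem), which is the part I am least sure about.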
Can you help?