Using the tm package, I built a term-document matrix from a corpus and then converted it to a data frame of word frequencies with this code:
library(tm)
dtm <- TermDocumentMatrix(tdocs)            # tdocs is my corpus
m <- as.matrix(dtm)
v <- sort(rowSums(m), decreasing = TRUE)    # total frequency per term
d <- data.frame(word = names(v), freq = v)
d25 <- d[1:25, ]                            # top 25 terms
The data frame looks like this:
head(d25)
     word freq
get   get 1699
just just 1656
good good 1437
like like 1257
know know 1186
day   day 1174
names(d25)
[1] "word" "freq"
What are those words on the far left? They are not listed by names(d25), yet they are printed next to every row. I want a data frame with only the word and freq columns. How do I get rid of the extra words? I tried d <- select(d, word, freq) from dplyr, but the extra column is still there:
head(d25)
     word freq
get   get 1699
just just 1656
good good 1437
like like 1257
know know 1186
day   day 1174
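
For reference, here is a minimal sketch of what seems to be happening. It assumes the leftmost labels are row names rather than a third column, which would explain why names(d25) lists only two columns and why select() does not remove them. The small vector v below is a stand-in for the real sorted frequencies, using three counts copied from the output above. The names d_reset and d_clean are made up for illustration.

```r
# Stand-in for sort(rowSums(m), decreasing = TRUE): a named numeric vector.
v <- c(get = 1699, just = 1656, good = 1437)

# data.frame() picks up the names of v as row names,
# and row names are printed to the left of the columns.
d <- data.frame(word = names(v), freq = v)
rownames(d)

# Option 1: reset the row names on an existing data frame.
d_reset <- d
rownames(d_reset) <- NULL

# Option 2: drop the names before building the data frame,
# so no character row names are created in the first place.
d_clean <- data.frame(word = names(v), freq = unname(v))

print(d_reset)
print(d_clean)
```

A data frame always has row names of some kind, so after either option the left margin shows plain row numbers (1, 2, 3, ...) instead of the words. The two columns themselves are unchanged.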