I am doing an LSTM analysis of tweets and am facing the following issue:
I want to replace every word in a data frame with a numeric value derived from its word frequency.
To do so, I used the following code:
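```r
library(dplyr)
library(tidytext)
# Tokenise all tweets and rank words by frequency (assumed: nr = rank, 1 = most frequent)
prof.tm <- unnest_tokens(twitter, word, text)
word.freq <- prof.tm %>%
  count(word, sort = TRUE) %>%
  mutate(nr = row_number()) %>%
  select(nr, word, n)
# Split a single tweet (the first one) into words
tweet <- twitter$text[1]
tweettxt <- data.frame(
  stringsAsFactors = FALSE,
  tweetwords = strsplit(tweet, " ")[[1]]
)
# Look up each word; words missing from word.freq get n = 0 and nr = Inf
tweetnum <- tweettxt %>%
  left_join(word.freq, by = c("tweetwords" = "word")) %>%
  mutate(n = ifelse(is.na(n), 0, n),
         nr = ifelse(is.na(nr), Inf, nr))
tweetchar <- paste("[", tweetnum$nr, "]", sep = "", collapse = " ")
```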
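One possible source of unexpected `Inf` values: `unnest_tokens()` lower-cases words by default and its default word tokenizer drops punctuation, but `strsplit(tweet, " ")` does neither, so a token such as "Great!" will not match "great" in `word.freq`. The helper `simple_tokens()` below is hypothetical and only a rough base-R approximation of that normalisation, not the tokenizer tidytext actually uses.

```r
# Hypothetical helper: a rough base-R approximation of the word tokenisation in
# unnest_tokens() (lower-case, drop punctuation, split on whitespace).
# Not identical to tidytext's tokenizer; e.g. here "don't" becomes "don" and "t".
simple_tokens <- function(text) {
  text <- tolower(text)
  text <- gsub("[[:punct:]]+", " ", text)          # replace punctuation runs with a space
  words <- strsplit(trimws(text), "[[:space:]]+")[[1]]
  words[words != ""]                                 # drop any empty tokens
}

simple_tokens("Great game, GREAT fans!")
```

In the snippet above, `tweetwords = simple_tokens(tweet)` could replace the `strsplit()` call so both sides of the join are normalised in a similar way.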
How can I apply this code to every tweet in the dataset instead of only one tweet?
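A minimal base-R sketch of one way to do this: wrap the per-tweet lookup in a function and apply it to the whole text column with `vapply()`. The toy `twitter` data frame, the rank vector `word_rank` and the function `encode_tweet()` are hypothetical stand-ins; ranks are computed with `table()` instead of tidytext, and words are split on single spaces as in the original code.

```r
# Hypothetical toy data standing in for the real `twitter` data frame
twitter <- data.frame(
  id = 1:3,
  text = c("good morning world", "good morning", "good"),
  stringsAsFactors = FALSE
)

# Rank every word over ALL tweets: nr = 1 is the most frequent word
# (ties, if any, are ranked in no meaningful order)
all_words <- unlist(strsplit(twitter$text, " "))
freq <- sort(table(all_words), decreasing = TRUE)
word_rank <- setNames(seq_along(freq), names(freq))

# Encode one tweet as "[nr] [nr] ...", using Inf for words with no rank
encode_tweet <- function(tweet) {
  words <- strsplit(tweet, " ")[[1]]
  nr <- unname(word_rank[words])   # NA for words not in word_rank
  nr[is.na(nr)] <- Inf
  paste0("[", nr, "]", collapse = " ")
}

# Apply the same encoding to every tweet and keep it as a new column
twitter$tweetchar <- vapply(twitter$text, encode_tweet, character(1),
                            USE.NAMES = FALSE)
twitter
```

Because the ranks are built once from all tweets, every tweet is encoded against the same vocabulary, which is what an LSTM input layer would expect.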
And how can I store the results in a dataset (a data frame) rather than only as individual values?
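One option for getting a dataset instead of strings, sketched in base R with a hypothetical toy `twitter` data frame in place of the real data and of dplyr/tidytext: build a long table with one row per word occurrence that keeps the tweet `id`, the word position `pos`, the count `n` and the rank `nr`.

```r
# Hypothetical toy data standing in for the real `twitter` data frame
twitter <- data.frame(
  id = 1:3,
  text = c("good morning world", "good morning", "good"),
  stringsAsFactors = FALSE
)

# One row per word occurrence, remembering its tweet and position
tokens <- strsplit(twitter$text, " ")
long <- data.frame(
  id = rep(twitter$id, lengths(tokens)),
  pos = unlist(lapply(tokens, seq_along)),
  word = unlist(tokens),
  stringsAsFactors = FALSE
)

# Word-frequency table: n = count, nr = rank (1 = most frequent)
freq <- as.data.frame(table(word = long$word), responseName = "n",
                      stringsAsFactors = FALSE)
freq <- freq[order(-freq$n), ]
freq$nr <- seq_len(nrow(freq))

# Attach n and nr to every word, then restore the original word order
result <- merge(long, freq, by = "word", all.x = TRUE)
result <- result[order(result$id, result$pos), c("id", "pos", "word", "n", "nr")]
rownames(result) <- NULL
result
```

`merge()` reorders rows by the join key, which is why the sketch sorts by `id` and `pos` afterwards; the `[nr]` strings per tweet can still be derived from `result` if they are needed.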
I hope I have made my point clear, and I look forward to any help!