I'll go ahead and post what I came up with to solve my issue. This is only half of what I need to accomplish; I've posted another thread asking for help with the second part, which is feeding the document-term matrix into an XGBoost classifier. Here is the code I used to import and clean my Twitter dataset:
```r
setwd('C:/rscripts/random_forest')
dataset <- read.csv('tweets_all.csv', stringsAsFactors = FALSE)

library(tm)

# Normalize the encoding before building the corpus
corpus <- iconv(dataset$text, to = "UTF-8")
corpus <- Corpus(VectorSource(corpus))
inspect(corpus[1:5])

# Basic cleaning; tolower must be wrapped in content_transformer()
# or recent versions of tm will corrupt the corpus
corpus <- tm_map(corpus, content_transformer(tolower))
corpus <- tm_map(corpus, removePunctuation)
corpus <- tm_map(corpus, removeNumbers)
cleanset <- tm_map(corpus, removeWords, stopwords('english'))

# Strip URLs (after punctuation removal they survive as "httpxxxx" tokens,
# which this pattern still matches)
removeURL <- function(x) gsub('http[[:alnum:]]*', '', x)
cleanset <- tm_map(cleanset, content_transformer(removeURL))
cleanset <- tm_map(cleanset, stripWhitespace)

# Drop mojibake tokens and a few extra stopwords specific to this dataset
cleanset <- tm_map(cleanset, removeWords, c('Ã\u009dhillary','ââ¬Å¾Ã','ââ¬Å¡Ã','just','are','all','they'))

# Build the term-document matrix, print its summary, then convert to a plain matrix
tdm <- TermDocumentMatrix(cleanset)
tdm
tdm <- as.matrix(tdm)
```
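To sketch where this leaves off (the second half mentioned above): a minimal, hedged example of handing the matrix to xgboost. It assumes the data frame has a 0/1 outcome column, here called `dataset$label` (that column name is hypothetical), and transposes the term-document matrix so that documents become rows, the orientation classifiers expect:

```r
library(xgboost)

# tdm is terms x documents; classifiers want documents as rows
dtm <- t(tdm)

# Hypothetical label column -- replace with the real outcome variable
labels <- dataset$label

model <- xgboost(data = dtm, label = labels,
                 nrounds = 50, objective = "binary:logistic")
```

`xgboost()` accepts a dense matrix directly, but for a large corpus it is worth keeping the matrix sparse (e.g. via `xgb.DMatrix` on a `Matrix::sparseMatrix`) rather than calling `as.matrix`, which can exhaust memory.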