library(tm)
data <- read.csv("C:/Users/shujo/Documents/AATwitter.csv", header = TRUE, stringsAsFactors = FALSE, encoding = "UTF-8")
summary(data)
head(data)
tail(data)
Text <- data$text
doc.text2 <- Corpus(VectorSource(Text))
doc.text3 <- tm_map(doc.text2, content_transformer(tolower))
doc.text4 <- tm_map(doc.text3, removeWords, stopwords("english"))
doc_matrix <- TermDocumentMatrix(doc.text4, control = list(removeNumbers = TRUE, removePunctuation = TRUE, stripWhitespace = TRUE, stemDocument = FALSE, bounds = list(local = c(2, Inf))))
Without a reprex it is hard to help much further (see the FAQ: How to do a minimal reproducible example (reprex) for beginners). Could you provide one, or a data set that reproduces the error?
A recent post here about the same error message suggests it can occur when using the tm package on text that contains "characters not recognized by the character encoding format".
Find out what encoding the file has (this is often an issue when a file was generated on, for example, a Mac and then used on Windows, or vice versa) and then specify it in R like so:
data = read.csv("data.csv", encoding="UTF-8")
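If you are not sure what the encoding is, here is a rough sketch of one way to check using base R only. The temporary file and its Latin-1 content are made up for illustration and stand in for your CSV. Note that in `read.csv()` the `encoding` argument only marks strings as being in that encoding, while `fileEncoding` actually re-encodes the file as it is read, so the latter may be what you need if the file turns out not to be UTF-8.

```r
# Hypothetical stand-in for the real CSV: write raw bytes so that the
# "e-acute" is stored as the single Latin-1 byte 0xE9 (not valid UTF-8).
path <- tempfile(fileext = ".csv")
writeBin(c(charToRaw("id,text\n1,caf"), as.raw(0xe9), charToRaw("\n")), path)

# Read the lines as-is and test whether every line is valid UTF-8.
raw_lines <- readLines(path, warn = FALSE)
is_utf8 <- all(validUTF8(raw_lines))
is_utf8  # FALSE suggests the file is in some other encoding

# Assuming the file is Latin-1, convert the text to UTF-8 explicitly ...
lines_utf8 <- iconv(raw_lines, from = "latin1", to = "UTF-8")

# ... or let read.csv() re-encode while reading (fileEncoding, not encoding).
data <- read.csv(path, fileEncoding = "latin1", stringsAsFactors = FALSE)
```

If `is_utf8` is already `TRUE` for your file, the encoding is probably not the problem and a reprex would help narrow it down.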
Another option is to remove all special characters by using something like
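For instance, base R's `iconv()` can drop anything that cannot be represented in ASCII. This is only a sketch: the sample vector below is invented and stands in for your `data$text`, and stripping characters this way also removes accented letters and emoji, which may or may not be acceptable for your analysis.

```r
# Hypothetical sample standing in for data$text
Text <- c("caf\u00e9 tweet", "plain tweet")

# Convert to ASCII; sub = "" deletes every byte that cannot be converted,
# so non-ASCII characters are removed rather than causing errors later.
Text_clean <- iconv(Text, from = "UTF-8", to = "ASCII", sub = "")

Text_clean
```

You would then build the corpus from `Text_clean` instead of `Text`.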
Would love to see a reprex, or let us know what your solution is!