I removed the Dutch stopwords with tm, but one of the words that was removed is a word I really want to keep. Can I add an exception to this function, or can I put that word back afterwards with some other function?
I also have another question. I have a text file with 1015 lines of explanations. Within each line some words are used more than once, and that is influencing the outcome of the findAssocs function. Is there any way to remove duplicate words within the same line / document?
I hope someone can help me with this. Many thanks in advance.
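For the first question, one option is to take the word out of the stop-word list before the list is used, rather than trying to restore it afterwards. A minimal sketch, assuming the tm package is installed; the word "niet", the sample sentence and the object name `corpus` are placeholders and not taken from the question:

```r
library(tm)

# Placeholder: the word(s) that should stay in the documents
keep <- c("niet")

# Dutch stop-word list minus the exceptions
my_stopwords <- setdiff(stopwords("dutch"), keep)

# removeWords() also accepts a plain character vector, which is handy for a quick check
x <- "ik wil dit woord niet verwijderen"
cleaned <- removeWords(x, my_stopwords)
print(cleaned)

# On a corpus (assumed here to be called `corpus`) the equivalent call would be:
# corpus <- tm_map(corpus, removeWords, my_stopwords)
```

Because the exception is applied to the word list itself, the rest of the preprocessing can stay as it is.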
text <- "Het verzamelen puur voor het genoegen was een nieuw fenomeen dat in de Republiek een hoge vlucht nam"
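# Split the sentence into words
d <- unlist(strsplit(text, split = " "))
# Keep the first occurrence of each word. Unlike d[-which(duplicated(d))], this
# also works when a line has no duplicates. The comparison is case-sensitive,
# so "Het" and "het" are both kept.
paste(d[!duplicated(d)], collapse = " ")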
#> [1] "Het verzamelen puur voor het genoegen was een nieuw fenomeen dat in de Republiek hoge vlucht nam"
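To do the same for every line of the file, the steps can be wrapped in a small function and applied line by line. This is only a sketch in base R: `dedupe_words`, its `ignore_case` argument and the two sample lines are made up for illustration, and in practice `lines` would come from something like `readLines()` on your own file.

```r
# Remove repeated words within one string, keeping the first occurrence.
# With ignore_case = TRUE, "Het" and "het" count as the same word.
dedupe_words <- function(line, ignore_case = FALSE) {
  words <- unlist(strsplit(line, split = " ", fixed = TRUE))
  words <- words[nzchar(words)]  # drop empty strings left by double spaces
  key <- if (ignore_case) tolower(words) else words
  paste(words[!duplicated(key)], collapse = " ")
}

# Placeholder data standing in for the 1015 lines of the real file
lines <- c("de kat en de hond", "een een twee")

# Apply the function to each line separately
deduped <- vapply(lines, dedupe_words, character(1), USE.NAMES = FALSE)
print(deduped)
#> [1] "de kat en hond" "een twee"
```

The result is an ordinary character vector with one element per line, so it can be turned into a corpus afterwards in the usual way, for example with `VCorpus(VectorSource(deduped))`.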