I'm using R for text analysis to categorize text into different classes. In certain cases it predicts the wrong class. Are there more effective ways to get higher accuracy?
So far I have done the following in my code:
cleaning the data, tokenizing, stemming, n-grams (1-3). I also used SVD and cosine similarity.
Are there any specific methods or functions that can be used to improve the results?
The community can help you better with more information, especially if you include details about your setup from
`sessionInfo()` and representative code. See FAQ: What's a reproducible example (`reprex`) and how do I do one?
That said, NLP is challenging, even among native speakers in person! BTW: take a look at
the tidytext package. It has some good tools for data prep and basic analysis.
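As a rough illustration of the kind of data prep tidytext offers, here is a minimal sketch. The `docs` data frame and its column names are made up for the example, and it assumes the dplyr and tidytext packages are installed; it is not meant as a reconstruction of the original poster's pipeline.

```r
library(dplyr)
library(tidytext)

# Hypothetical toy data: one row per document, with a pre-assigned class
docs <- tibble(
  doc_id = 1:4,
  class  = c("sports", "sports", "finance", "finance"),
  text   = c("The team won the match last night",
             "A late goal decided the match",
             "The bank raised interest rates again",
             "Markets fell after the rates decision")
)

# One row per word (lowercased by default), with common stop words removed
words <- docs %>%
  unnest_tokens(word, text) %>%
  anti_join(stop_words, by = "word")

# Word counts per document, weighted by tf-idf
word_tfidf <- words %>%
  count(doc_id, word) %>%
  bind_tf_idf(word, doc_id, n)

# Bigrams come from the same function with a different tokenizer
bigrams <- docs %>%
  unnest_tokens(bigram, text, token = "ngrams", n = 2)
```

The resulting `word_tfidf` table has one row per document and word, so it can be inspected directly or cast to a document-term matrix before fitting a classifier.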
As more of an introduction, the following chapter of TidyText on topic modeling introduces this idea:
A Kaggle tutorial:
A DataCamp tutorial:
Ideally, you'll have samples you've pre-classified. That way you can measure the effectiveness of various methods and -- if this step was done carefully -- infer the relative strength of these methods on new data.
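To make that measurement concrete, here is a small base R sketch. The labels below are invented for illustration, and `evaluate_predictions` is a hypothetical helper, not a function from any package. It compares hand-assigned classes with a model's predictions on held-out samples and reports overall accuracy plus per-class precision and recall.

```r
# Hypothetical helper: compare predicted classes against hand-assigned ones
evaluate_predictions <- function(actual, predicted) {
  actual    <- factor(actual)
  # Use the same levels for both so the confusion matrix is square
  predicted <- factor(predicted, levels = levels(actual))

  confusion <- table(predicted = predicted, actual = actual)

  list(
    confusion = confusion,
    accuracy  = sum(diag(confusion)) / sum(confusion),
    precision = diag(confusion) / rowSums(confusion),  # correct / all predicted as class
    recall    = diag(confusion) / colSums(confusion)   # correct / all truly in class
  )
}

# Made-up labels for eight held-out samples
actual    <- c("spam", "ham", "ham",  "spam", "ham", "spam", "ham", "ham")
predicted <- c("spam", "ham", "spam", "spam", "ham", "ham",  "ham", "ham")

result <- evaluate_predictions(actual, predicted)
result$confusion
result$accuracy
```

Running the same helper on the output of each candidate method (different n-gram ranges, with or without SVD, and so on) gives a like-for-like comparison, provided the held-out samples were never used for training.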