Suggestions required


#1

Hi All,
I'm using R for text analysis to categorize text into different classes. In certain cases it predicts the wrong class. Are there more effective ways to get higher accuracy?
So far my code does the following:
cleaning the data, tokenizing, stemming, and building n-grams (1-3). I have also used SVD and cosine similarity.
Are there any specific methods or functions that could improve the results?
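
For anyone following along, here is a rough base-R sketch of the kind of pipeline described above: tokenize, build a document-term matrix, reduce it with SVD, then compare documents by cosine similarity. The documents, the `tokenize()` and `cosine()` helpers, and the choice of `k = 2` are all made up for illustration, and stemming and 2-3-grams are left out to keep it short. It is not the actual code behind the question.

```r
# Toy documents, invented for illustration only.
docs <- c(
  "the cat sat on the mat",
  "a cat lay on a mat",
  "stock prices fell sharply today",
  "markets and stock prices dropped"
)

# Clean and tokenize: lower-case, replace non-letters, split on whitespace.
# (Hypothetical helper; stemming and higher-order n-grams are omitted.)
tokenize <- function(x) {
  x <- gsub("[^a-z ]", " ", tolower(x))
  strsplit(trimws(x), "\\s+")[[1]]
}
tokens <- lapply(docs, tokenize)

# Document-term matrix of raw unigram counts (rows = documents).
vocab <- sort(unique(unlist(tokens)))
dtm <- t(vapply(
  tokens,
  function(w) tabulate(match(w, vocab), nbins = length(vocab)),
  integer(length(vocab))
))
colnames(dtm) <- vocab

# Truncated SVD: keep the first k singular vectors as document coordinates.
k <- 2
s <- svd(dtm, nu = k, nv = k)
doc_vecs <- s$u %*% diag(s$d[1:k], nrow = k)

# Cosine similarity between two vectors (hypothetical helper).
cosine <- function(a, b) sum(a * b) / (sqrt(sum(a^2)) * sqrt(sum(b^2)))

sim_same <- cosine(doc_vecs[1, ], doc_vecs[2, ])  # two "cat" documents
sim_diff <- cosine(doc_vecs[1, ], doc_vecs[3, ])  # "cat" vs "stock" document
print(c(same_topic = sim_same, different_topic = sim_diff))
```

With real data the value of `k` and the term weighting (raw counts here) are the usual things to experiment with.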


#2

The community can help you better with more information, especially if you include details about your setup from sessionInfo() and some representative code -- see FAQ: What's a reproducible example (`reprex`) and how do I do one?
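
As a rough idea of what that could look like, here is a minimal sketch: a tiny made-up data set standing in for the real one, plus the session details. The column names `text` and `label` and the sample rows are assumptions, so swap in a small slice of your own data.

```r
# Small, invented sample standing in for the real data.
# `text` and `label` are assumed column names.
docs <- data.frame(
  text  = c("refund not received", "love the new design", "app crashes on login"),
  label = c("billing", "praise", "bug"),
  stringsAsFactors = FALSE
)

# Show the structure so others can see what the input looks like.
str(docs)

# R version, platform, and attached package versions.
info <- sessionInfo()
print(info)

# If the reprex package is installed, copying your code to the clipboard and
# running reprex::reprex() renders it together with its output for posting.
```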

That said, NLP is challenging, even among native speakers in person! BTW, take a look at the tidytext package. It has some good tools for data prep and basic analysis.
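
To give a flavour of the data prep side, here is a small sketch using `unnest_tokens()` and the `stop_words` data set from tidytext. The two example sentences and the column names are made up, and it assumes dplyr and tidytext are installed.

```r
library(dplyr)
library(tidytext)

# Invented example texts; `id` and `text` are assumed column names.
docs <- tibble(
  id   = 1:2,
  text = c("The service was great", "Delivery was late and slow")
)

# One row per word, lower-cased, with common stop words removed.
tokens <- docs %>%
  unnest_tokens(word, text) %>%
  anti_join(stop_words, by = "word")

# One row per bigram (two-word sequence) within each document.
bigrams <- docs %>%
  unnest_tokens(bigram, text, token = "ngrams", n = 2)

print(tokens)
print(bigrams)
```

From here, `count()` on the token or bigram column gives the frequencies that most simple analyses start from.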


#3

For more of an introduction, the TidyText chapter on topic modeling covers this idea.

A Kaggle tutorial:
https://www.kaggle.com/rtatman/nlp-in-r-topic-modelling

A DataCamp tutorial


Ideally, you'll have samples you've pre-classified. That way you can measure the effectiveness of various methods and -- if this step was done carefully -- infer the relative strength of these methods on new data.
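
A bare-bones sketch of that idea in base R: hold out part of the pre-classified sample, predict on it with each method, and compare accuracy. Everything here is a toy assumption: the data, the labels, and both "methods" (a majority-class baseline and a keyword rule standing in for a real classifier).

```r
set.seed(42)

# Hypothetical pre-classified sample; `text` and `label` are assumed names.
labeled <- data.frame(
  text = c("refund not received", "love the new design", "app crashes on login",
           "great support team", "charged twice this month", "crash after update",
           "very happy with service", "billing error again"),
  label = c("billing", "praise", "bug", "praise",
            "billing", "bug", "praise", "billing"),
  stringsAsFactors = FALSE
)

# Hold out two rows for testing; the rest is training data.
test_idx <- sample(nrow(labeled), size = 2)
train <- labeled[-test_idx, ]
test  <- labeled[test_idx, ]

# Method A: always predict the most common label in the training data.
majority_label <- names(which.max(table(train$label)))
baseline_pred  <- rep(majority_label, nrow(test))

# Method B: toy keyword rule, standing in for a real model.
keyword_pred <- ifelse(
  grepl("crash", test$text), "bug",
  ifelse(grepl("bill|charge|refund", test$text), "billing", "praise")
)

# Share of held-out rows where the prediction matches the true label.
accuracy <- function(pred, truth) mean(pred == truth)

baseline_acc <- accuracy(baseline_pred, test$label)
keyword_acc  <- accuracy(keyword_pred, test$label)
print(c(baseline = baseline_acc, keyword = keyword_acc))

# Confusion table shows which classes get mixed up.
print(table(predicted = keyword_pred, actual = test$label))
```

With a realistic amount of data you would hold out a larger share (or cross-validate), but the comparison works the same way: every candidate method is scored on rows it never saw.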