Suggestions required

animeshsarraf · December 5, 2018, 11:00am

Hi All,
I'm using R for text analysis to categorize text into different classes. In certain cases it predicts the wrong class. Are there more effective ways to get higher accuracy?
I have done the following in my code:
cleaning the data, tokenizing, stemming, and building n-grams (1-3). I also used SVD and cosine similarity.
Are there any specific methods or functions that could improve the results?

technocrat · December 5, 2018, 9:35pm

The community can help you better with more information, especially if you include detail about your set-up with sessionInfo() and representative code -- see FAQ: What's a reproducible example (`reprex`) and how do I do one?

That said, NLP is challenging, even among native speakers in person! BTW: take a look at the tidytext package. It has some good tools for data prep and basic analysis.
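To give a feel for the kind of data prep tidytext supports, here is a minimal sketch. The `docs` table and its contents are made up for illustration, and it assumes the dplyr and tidytext packages are installed; treat it as a starting point rather than a recommended pipeline.

```r
library(dplyr)
library(tidytext)

# Hypothetical toy data: one row per document.
docs <- tibble(
  doc_id = c("a", "b", "c"),
  text = c("The delivery was late and the box was damaged",
           "Great product, fast delivery",
           "The refund was never processed")
)

# One row per (document, word). unnest_tokens() lowercases the text
# and strips punctuation by default.
words <- docs %>%
  unnest_tokens(word, text) %>%
  anti_join(stop_words, by = "word") %>%  # drop common stop words
  count(doc_id, word)                     # term counts land in column `n`

# Add tf, idf and tf_idf columns so that terms frequent in one document
# but rare across the collection get a higher weight.
weighted <- bind_tf_idf(words, word, doc_id, n)

# The same function can produce n-grams, here bigrams.
bigrams <- docs %>%
  unnest_tokens(bigram, text, token = "ngrams", n = 2)
```

From here, `weighted` could be cast to a document-term matrix for whatever classifier you are using.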

EconomiCurtis · December 6, 2018, 10:29am

For more of an introduction, the TidyText chapter on topic modeling introduces this idea.

A Kaggle tutorial

A DataCamp tutorial

Ideally, you'll have samples you've pre-classified. That way you can measure the effectiveness of various methods and -- if this step was done carefully -- infer the relative strength of these methods on new data.
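A minimal sketch of that comparison in base R, assuming you have a hand-labelled set that was held out from training. The class names and the two prediction vectors (`pred_a`, `pred_b`) are invented for illustration; in practice they would come from your own pipelines.

```r
# Hypothetical hand-labelled evaluation data; `actual` holds the human labels.
actual <- factor(c("billing", "billing", "shipping",
                   "shipping", "shipping", "other"))

# Hypothetical predictions from two competing pipelines on the same documents.
pred_a <- factor(c("billing", "shipping", "shipping",
                   "shipping", "other", "other"),
                 levels = levels(actual))
pred_b <- factor(c("billing", "billing", "shipping",
                   "shipping", "shipping", "billing"),
                 levels = levels(actual))

# Share of documents whose predicted class matches the human label.
accuracy <- function(pred, truth) {
  mean(as.character(pred) == as.character(truth))
}

# Confusion matrix: rows are true classes, columns are predicted classes.
confusion_a <- table(actual = actual, predicted = pred_a)

# Accuracy of each method on the same held-out documents.
acc <- c(a = accuracy(pred_a, actual), b = accuracy(pred_b, actual))

print(confusion_a)
print(round(acc, 2))
```

The confusion matrix is often more useful than the single accuracy number, because it shows which classes get mixed up with each other.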
