Error rate in text corpus

Dear Community,

I am trying to analyze comments from different channels for my thesis. At this stage, I want to find the error rate in the comment text, that is, to account for incomplete words, grammatical errors, etc., so that I can conclude on which channel users are more careful about their writing.

Any idea how I can do this? Thanks in advance.

The first thing needed is a metric for errors, for example spelling. There are tools for this in {tidytext} and other NLP packages. Spelling is only an approximate metric, though: a correctly spelled word may be a misusage in context, as in

The school's principle was not one to stand on principal. 

It can get fairly convoluted.
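
As a minimal sketch of the spelling side of this, assuming the comments are in a data frame with `channel` and `text` columns (hypothetical names) and using the {hunspell} package alongside {tidytext} for dictionary lookups, a per-channel misspelling rate could look roughly like this:

```r
# Rough per-channel spelling-error rate.
# Assumptions: a data frame with `channel` and `text` columns (made-up names
# and data), and {hunspell}'s default English ("en_US") dictionary.
library(dplyr)
library(tidytext)
library(hunspell)

comments <- tibble::tibble(
  channel = c("twitter", "youtube", "youtube"),
  text    = c(
    "this is definately a hastily typed coment",
    "This sentence is spelled correctly.",
    "Anothr one with a typo."
  )
)

error_rates <- comments %>%
  unnest_tokens(word, text) %>%                  # one row per word, lowercased
  mutate(misspelled = !hunspell_check(word)) %>% # TRUE if the word is not in the dictionary
  group_by(channel) %>%
  summarise(
    words      = n(),
    errors     = sum(misspelled),
    error_rate = errors / words
  )

error_rates
```

Keep in mind this only counts dictionary misses: it will flag proper nouns and slang as errors, and it will miss grammatical mistakes and correctly spelled misusages like the principle/principal example above.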

Another consideration is that, by mutual agreement of their users, some channels, such as Twitter, may resemble speech more than writing. That makes spelling by ear more common.

Okay, thanks, will look into it.
