I am trying to analyze comments from different channels for my thesis. At this stage, I want to find the error rate in the comment text, that is, to account for incomplete words, grammatical errors, etc., so that I can conclude on which channel users are more careful about their writing.
The first thing needed is a metric for errors, for example spelling. There are tools to do this in {tidytext} and other NLP packages. Spelling is only an approximate metric, though: a correctly spelled word may still be misused in context, as in:
The school's principle was not one to stand on principal.
It can get fairly convoluted.
Another consideration is that some channels, such as Twitter, may by mutual agreement resemble speech more than writing. That makes spelling by ear more common.
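
As a rough illustration of a spelling-based metric, here is a minimal Python sketch that computes the share of tokens not found in a word list, pooled per channel. Everything in it is an assumption made for illustration: the tiny `KNOWN_WORDS` set stands in for a real dictionary, the channel names and comments are made up, and the function names are hypothetical. A real analysis would use a proper spell checker and a full word list.

```python
import re
from collections import defaultdict

# Hypothetical toy dictionary; a real analysis would load a full word list.
KNOWN_WORDS = {
    "the", "school's", "school", "principle", "principal", "was", "not",
    "one", "to", "stand", "on", "see", "you", "should",
}

# Lowercase words, optionally with one internal apostrophe (e.g. "school's").
TOKEN_PATTERN = re.compile(r"[a-z]+(?:'[a-z]+)?")


def tokenize(text):
    """Split a comment into lowercase word tokens, ignoring punctuation."""
    return TOKEN_PATTERN.findall(text.lower())


def error_rate_by_channel(rows, known_words):
    """Return {channel: unknown tokens / total tokens}, pooled over comments."""
    totals = defaultdict(int)
    unknown = defaultdict(int)
    for channel, text in rows:
        for token in tokenize(text):
            totals[channel] += 1
            if token not in known_words:
                unknown[channel] += 1
    # Channels with no tokens get a rate of 0.0 to avoid dividing by zero.
    return {
        channel: (unknown[channel] / totals[channel]) if totals[channel] else 0.0
        for channel in totals
    }


# Made-up example data: (channel, comment text).
comments = [
    ("forum", "The school's principle was not one to stand on principal."),
    ("twitter", "u shud see teh school"),
]

rates = error_rate_by_channel(comments, KNOWN_WORDS)
for channel, rate in sorted(rates.items()):
    print(f"{channel}: {rate:.2f}")
```

Note that the forum comment scores a rate of zero even though "principle" and "principal" are swapped, which is exactly the limitation of a spelling-only metric. The made-up Twitter comment scores high mostly because of ear spelling, so the rate reflects channel conventions as much as carelessness.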