Hi Community,

I'm working on a binary text classification problem using the packages tagged in this post, and it turns out that the scrappy linear Support Vector Machine (SVM) is doing well.

As a next step, I am hoping to extract the most "predictive" n-gram features based on the positive and negative weights that the linear SVM has assigned to them, as described by Chang & Lin (2008, link below). I would then manually annotate a set of topic labels based on the n-grams that appear.
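For context, here is the standard linear SVM formulation in generic notation. It is assumed here rather than copied from any package's documentation, and sign conventions for the offset b differ between implementations. With a linear kernel the fitted model is a single hyperplane, so each n-gram j gets exactly one weight w_j:

$$
f(\mathbf{x}) = \mathbf{w}^\top \mathbf{x} + b,
\qquad
\mathbf{w} = \sum_{i \in \mathrm{SV}} \alpha_i \, y_i \, \mathbf{x}_i,
\qquad
\hat{y} = \operatorname{sign}\big(f(\mathbf{x})\big)
$$

Here α_i ≥ 0 are the dual coefficients, y_i ∈ {−1, +1} are the labels, and x_i are the support vectors. Holding the other features fixed, a large positive w_j pushes a document towards the +1 class and a large negative w_j pushes it towards the −1 class. Sorting the w_j in descending or ascending order therefore yields the candidate n-gram lists.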

Ideally this would be a query that extracts the top X (e.g. 20) n-grams ordered by their SVM weights, in both the positive and the negative direction.
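Once the weights are available as a mapping from n-gram to weight, the "top X" query is just a sort. The sketch below uses Python with the standard library only, purely to illustrate the logic. It is not an R or tidymodels API, and the function name `top_ngrams` and the toy weights are made up for this example.

```python
from typing import Dict, List, Tuple

Pair = Tuple[str, float]

def top_ngrams(weights: Dict[str, float], k: int = 20) -> Tuple[List[Pair], List[Pair]]:
    """Return (top-k positive, top-k negative) n-grams by SVM weight (hypothetical helper)."""
    ordered = sorted(weights.items(), key=lambda kv: kv[1])  # ascending by weight
    # Largest weights first; keep only strictly positive ones.
    positive = [(ng, w) for ng, w in ordered[::-1][:k] if w > 0]
    # Most negative weights first; keep only strictly negative ones.
    negative = [(ng, w) for ng, w in ordered[:k] if w < 0]
    return positive, negative

# Invented toy weights, for illustration only.
example = {"refund": 1.8, "great service": 1.2, "late": -0.9, "never again": -2.1, "the": 0.01}
pos, neg = top_ngrams(example, k=2)
print(pos)  # [('refund', 1.8), ('great service', 1.2)]
print(neg)  # [('never again', -2.1), ('late', -0.9)]
```

If fewer than k weights have the required sign, the corresponding list is simply shorter. Weights of exactly zero are excluded from both lists.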

I'm currently working with parsnip's general interface for polynomial SVMs (svm_poly()), and I understand that the weights can be calculated directly from kernlab, although that might not be a straightforward task: https://stackoverflow.com/questions/1899008/weights-from-linear-svm-model-in-r
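The computation in the linked StackOverflow thread boils down to w_j = Σ_i (α_i y_i) · x_ij, summed over the support vectors. In kernlab the inputs would presumably come from `coef(model)[[1]]` (the α_i y_i values) and `xmatrix(model)[[1]]` (the support vectors), roughly `colSums(coef(m)[[1]] * xmatrix(m)[[1]])`. Please check this against the kernlab docs. The plain-Python sketch below only shows the arithmetic. It assumes a linear kernel (or a degree-1 polynomial kernel), because with degree > 1 there is no single weight per n-gram. Also note that if the model was fit with kernlab's default `scaled = TRUE`, the weights refer to the standardized features. The name `primal_weights` and the toy numbers are hypothetical.

```python
from typing import List, Sequence

def primal_weights(coefs: Sequence[float], support_vectors: Sequence[Sequence[float]]) -> List[float]:
    """Compute w_j = sum_i coefs[i] * support_vectors[i][j], where coefs[i] = alpha_i * y_i.

    Hypothetical helper: assumes a linear kernel and a rectangular support-vector matrix.
    """
    if len(coefs) != len(support_vectors):
        raise ValueError("need exactly one coefficient per support vector")
    n_features = len(support_vectors[0]) if support_vectors else 0
    w = [0.0] * n_features
    for c, sv in zip(coefs, support_vectors):
        for j, x in enumerate(sv):
            w[j] += c * x  # accumulate this support vector's contribution to feature j
    return w

# Toy example: two support vectors over three hypothetical n-gram columns.
names = ["ngram_a", "ngram_b", "ngram_c"]
w = primal_weights([0.5, -0.25], [[1, 0, 2], [0, 4, 2]])
print(dict(zip(names, w)))  # {'ngram_a': 0.5, 'ngram_b': -1.0, 'ngram_c': 0.5}
```

Zipping the weights with the column (n-gram) names produces the name-to-weight mapping that the ranking step needs.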

Before digging deeper, I was wondering whether anyone in the community knows of methods or classes I could use to perform this task.

Thank you and have a nice day,

Kind regards,

Ben

Reference: Chang & Lin (2008), http://proceedings.mlr.press/v3/chang08a/chang08a.pdf