How to get confidence values for each record in a classification model?

swansswansswans · December 27, 2020, 7:00pm

Hi all,

I've had some serious success using tidy principles in my text classification project. Following some guides, I've been able to produce a classification model with pretty strong performance (> 0.8 on sensitivity, specificity, recall, and precision). I've gotten as far as creating my predictors / features, putting them into a recipe, and juicing it. I've been using these resources:

Here is a truncated version of the R code that shows the model definition:

library(tidymodels)

# cross-validation object (vfold_cv() defaults to 10 folds)
folds <- vfold_cv(train)

#declare a RF classification model
rf_spec <- rand_forest( trees = 500 ) %>%
  set_mode("classification") %>%
  set_engine("ranger")
rf_spec

# build a 'workflow' by passing the model and the recipe
# (uses rf_spec, the random forest spec declared above)
rf_wf <- workflow() %>%
  add_recipe(preprocessing_recipe) %>%
  add_model(rf_spec)
rf_wf

# fit the model on each resample and keep the out-of-fold predictions
rf_rs <- fit_resamples(
  rf_wf,
  folds,
  metrics = metric_set(recall, precision, sensitivity, specificity, accuracy),
  control = control_resamples(save_pred = TRUE)
)
rf_rs

So after defining this model and fitting it, I am able to use it to classify my text! I feel great about the performance metrics so far and am working on tuning my model. But here's what I really want to know:

How can I report the 'goodness of fit' for each record? In other words, is there a way to know how well a record matches its assigned class?

For example, if the model labels a text record as "positive" based on the features / predictors... how can I describe this particular record's fit to the "positive" class? In conventional statistics there are confidence values, intervals, p values, and so on. Any advice or resources would be helpful, thank you.
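One possible direction, offered as a sketch rather than a confirmed answer: a fitted tidymodels workflow can return predicted class probabilities for each record via `predict(..., type = "prob")`, and those are the closest per-record analogue to a "confidence" for the assigned label. The example below is self-contained, so it uses stand-ins that are assumptions and not part of the original project: the `two_class_dat` data from the modeldata package replaces `train`, and `toy_recipe` replaces `preprocessing_recipe`.

```r
library(tidymodels)

set.seed(123)

# Stand-in data (assumption): two_class_dat from modeldata, used only
# because the original `train` data is not available here.
data(two_class_dat, package = "modeldata")
toy_split <- initial_split(two_class_dat, strata = Class)
toy_train <- training(toy_split)
toy_test  <- testing(toy_split)

# Hypothetical stand-in for the post's `preprocessing_recipe`
toy_recipe <- recipe(Class ~ ., data = toy_train)

# Same model declaration as in the post
rf_spec <- rand_forest(trees = 500) %>%
  set_mode("classification") %>%
  set_engine("ranger")

# Fit the workflow once on the training data
rf_fit <- workflow() %>%
  add_recipe(toy_recipe) %>%
  add_model(rf_spec) %>%
  fit(data = toy_train)

# type = "prob" returns one column per class level (.pred_<level>);
# type = "class" returns the hard label (.pred_class)
record_probs <- predict(rf_fit, new_data = toy_test, type = "prob") %>%
  bind_cols(predict(rf_fit, new_data = toy_test, type = "class")) %>%
  bind_cols(toy_test %>% select(Class))

head(record_probs)
```

Each row then carries the predicted label alongside the estimated probability of every class, so a record labelled "positive" with a probability near 1 fits that class more strongly than one near 0.5. These are model-based probability estimates, not confidence intervals or p-values, and they are not guaranteed to be well calibrated. As far as I know, the resampled predictions from `fit_resamples()` only include probability columns when the metric set contains a probability-based metric such as `roc_auc`, so that may be worth checking against the tune documentation.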
