How to get confidence values for each record in a classification model?

Hi all,

I've had some serious success using tidy principles in my text classification project. Following some guides, I've been able to produce a classification model with pretty strong performance (> 0.8 on sensitivity, specificity, recall, and precision). I've gotten as far as creating my predictors / features, putting them into a recipe, and juicing it. I've been using these resources:


Here is a truncated version of the R code that shows the model definition:

library(tidymodels)

# `train` and `preprocessing_recipe` are created earlier (not shown here)

# cross-validation object
folds <- vfold_cv(train)

# declare a random forest classification model
rf_spec <- rand_forest(trees = 500) %>%
  set_mode("classification") %>%
  set_engine("ranger")
rf_spec

# build a workflow by passing the recipe and the model
rf_wf <- workflow() %>%
  add_recipe(preprocessing_recipe) %>%
  add_model(rf_spec)
rf_wf

# fit the model on each resample and keep the held-out predictions
rf_rs <- fit_resamples(
  rf_wf,
  folds,
  metrics = metric_set(recall, precision, sensitivity, specificity, accuracy),
  control = control_resamples(save_pred = TRUE)
)
rf_rs
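Since save_pred = TRUE keeps the held-out predictions, one way to look at them record by record is collect_predictions(). Below is a minimal, self-contained sketch of that idea. It is not my actual project code: the two_class_dat example data from modeldata, the formula preprocessor, and the object names demo_wf, demo_rs, and oof_preds are all stand-ins chosen for illustration. As far as I understand, the per-class probability columns are only saved when the metric set includes a probability-based metric, so roc_auc is added here on that assumption.

```r
library(tidymodels)

set.seed(123)

# Stand-in data (assumption): two predictors A, B and a factor outcome Class
data(two_class_dat, package = "modeldata")
demo_folds <- vfold_cv(two_class_dat, v = 5)

# Same kind of model as above, with a simple formula instead of a recipe
demo_wf <- workflow() %>%
  add_formula(Class ~ .) %>%
  add_model(
    rand_forest(trees = 500) %>%
      set_mode("classification") %>%
      set_engine("ranger")
  )

# roc_auc is a probability metric; including it is assumed to be what
# makes the .pred_<level> columns appear in the saved predictions
demo_rs <- fit_resamples(
  demo_wf,
  demo_folds,
  metrics = metric_set(roc_auc, accuracy),
  control = control_resamples(save_pred = TRUE)
)

# One row per held-out record: predicted class, class probabilities,
# the true class, and .row (the row index in the original data)
oof_preds <- collect_predictions(demo_rs)
head(oof_preds)
```

Each row of oof_preds comes from a model that did not see that record during fitting, so the probabilities are out-of-fold estimates rather than training-set fits.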

So after defining this model and fitting it, I am able to use it to classify my text! I feel great about the performance metrics so far and am working on tuning my model. But here's what I really want to know:

How can I report the 'goodness of fit' for each record? Or, in other words, is there a way to know how well a record matches its assigned class?

For example, if the model labels a text record as "positive" based on the features / predictors, how can I describe this particular record's fit to the "positive" class? In conventional statistics there are confidence values, confidence intervals, p-values, and so on. Any advice or resources would be helpful, thank you.
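One candidate for this kind of per-record score is the predicted class probability, which predict() returns when called with type = "prob" on a fitted workflow. The sketch below is only an illustration under assumptions: it uses the two_class_dat example data from modeldata rather than my text features, and the names demo_fit, scored, and .confidence are made up for the example. Note that these are model-estimated probabilities, not confidence intervals or p-values, and they may not be well calibrated.

```r
library(tidymodels)

set.seed(123)

# Stand-in data (assumption): factor outcome Class with levels Class1 / Class2
data(two_class_dat, package = "modeldata")
demo_split <- initial_split(two_class_dat, strata = Class)
demo_train <- training(demo_split)
demo_test  <- testing(demo_split)

demo_spec <- rand_forest(trees = 500) %>%
  set_mode("classification") %>%
  set_engine("ranger")

# Fit on the training portion only
demo_fit <- workflow() %>%
  add_formula(Class ~ .) %>%
  add_model(demo_spec) %>%
  fit(data = demo_train)

# Hard class label plus one probability column per class level
scored <- bind_cols(
  demo_test,
  predict(demo_fit, demo_test),                # .pred_class
  predict(demo_fit, demo_test, type = "prob")  # .pred_Class1, .pred_Class2
) %>%
  # Probability assigned to whichever class the model considers most likely
  mutate(.confidence = pmax(.pred_Class1, .pred_Class2))

# Records the model is least sure about come first
scored %>% arrange(.confidence) %>% head()
```

A record with .confidence near 1 sits firmly in its predicted class according to the model, while values near 0.5 flag borderline records that may deserve a manual look.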
