SHAP visualization package for the tidymodels ecosystem?

Is there an R package for SHAP visualization that is compatible with tidymodels? I have tried SHAPforxgboost, fastshap, and shapviz. Because my ML model development is based on the tidymodels grammar, I don't know how to use these packages with tidymodels objects. Is there any plan to develop a model explanation tool for tidymodels?

It depends on the model type. For general models, you can use (my) model-agnostic kernelshap package to calculate SHAP values and then plot them with (my) shapviz package:

library(tidymodels)
library(kernelshap)
library(shapviz)

iris_recipe <- iris %>%
  recipe(Sepal.Length ~ .)

reg <- linear_reg() %>%
  set_engine("lm")

iris_wf <- workflow() %>%
  add_recipe(iris_recipe) %>%
  add_model(reg)

fit <- iris_wf %>%
  fit(iris)

shap <- kernelshap(fit, iris[, -1], bg_X = iris) %>% 
  shapviz()

sv_importance(shap, kind = "bee")
sv_dependence(shap, "Petal.Length")

[Figure: SHAP beeswarm importance plot]
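A quick way to see what the SHAP values above represent is to check additivity: for each row, the baseline plus the sum of the SHAP values should reproduce the model prediction. The following is a minimal sketch under the assumption that a current version of shapviz is installed (it uses the accessors get_baseline() and get_shap_values()); the names lm_wf, ks, sv, reconstructed, and predicted are made up for illustration and the workflow is refitted so that the snippet runs on its own.

```r
library(tidymodels)
library(kernelshap)
library(shapviz)

# Same linear workflow as above, refitted so the snippet is self-contained
lm_wf <- workflow() %>%
  add_recipe(recipe(Sepal.Length ~ ., data = iris)) %>%
  add_model(linear_reg() %>% set_engine("lm")) %>%
  fit(iris)

# Explain all rows; the full data set serves as background data
ks <- kernelshap(lm_wf, iris[, -1], bg_X = iris, verbose = FALSE)
sv <- shapviz(ks)

# Additivity check: baseline + row sums of SHAP values vs. model predictions
reconstructed <- unname(get_baseline(sv) + rowSums(get_shap_values(sv)))
predicted <- predict(lm_wf, iris)$.pred

# Expected to agree up to numerical tolerance
all.equal(reconstructed, predicted)
```

If the comparison fails, the usual suspects are a mismatch between the explained rows and the rows passed to predict(), or background data that differs from what was intended.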

If your model is fitted with an XGBoost or LightGBM backend, the kernelshap package is not required; shapviz alone will suffice. I would need a reproducible model example from you to provide a solution.

Thank you so much for your reply. Here is the code for XGBoost and LightGBM modeling using tidymodels.

library(tidyverse)
library(bonsai)
library(tidymodels)

set.seed(123)
split <- initial_split(iris, prop = 0.7, strata = Species)

train <- training(split)
test  <- testing(split)
cv <- vfold_cv(train, strata = Species, v = 10)

model_recipe <- 
  recipe(Species ~ ., data = train)

# xgboost
xgboost_model <- 
  boost_tree( mode = "classification",
              mtry = 5,
              trees = 1000,
              min_n = 4,
              tree_depth = 5,
              learn_rate = 0.05,
              sample_size = 0.7,
              engine = "xgboost"
  )

xgboost_wf <-
  workflow() %>%
  add_model(xgboost_model) %>% 
  add_recipe(model_recipe) %>%
  fit(train)


# lightGBM
lgbm_model <- 
  boost_tree( mode = "classification",
              mtry = 3,
              trees = 500,
              min_n = 15,
              tree_depth = 5,
              learn_rate = 0.03,
              loss_reduction = 0,
              engine = "lightgbm"
  )

lgbm_wf <-
  workflow() %>%
  add_model(lgbm_model) %>% 
  add_recipe(model_recipe) %>%
  fit(train)

I don't know how to combine the shapviz package with the tidymodels object. Can you show the detailed code for visualizing the force plot, variable importance plot, and dependence plot based on the code above?

I would also like to do the same for the multilayer perceptron model in tidymodels.

mlp_model <- 
  mlp(mode = "classification",
      hidden_units = 8,
      penalty = 0.3,
      epochs = 500,
      engine = "nnet"
  )

mlp_wf <-
  workflow() %>%
  add_model(mlp_model) %>% 
  add_recipe(model_recipe) %>%
  fit(train)

Unfortunately, I don't see a direct way to extract TreeSHAP values from an XGBoost model fitted through tidymodels.

Edit: Here is an example of how to do it

https://lorentzen.ch/index.php/2023/01/27/shap-xgboost-tidymodels-love/
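As a rough sketch of the general idea (not a copy of the linked post): pull the underlying xgb.Booster out of the fitted workflow, bake the predictors into a numeric matrix with the prepped recipe, and hand both to shapviz(). The object names xgb_wf, booster, X_pred, and sv_xgb are made up for illustration, the model uses fewer trees than in the question to keep it quick, and which_class = 3 assumes that "virginica" is the third class. Behaviour may differ across xgboost and shapviz versions, so treat this as a starting point.

```r
library(tidymodels)
library(shapviz)

set.seed(123)
split <- initial_split(iris, prop = 0.7, strata = Species)
train <- training(split)

# Small XGBoost workflow in the spirit of the one in the question
xgb_wf <- workflow() %>%
  add_recipe(recipe(Species ~ ., data = train)) %>%
  add_model(
    boost_tree(mode = "classification", trees = 50, tree_depth = 5,
               learn_rate = 0.05, engine = "xgboost")
  ) %>%
  fit(train)

# Underlying xgb.Booster object
booster <- extract_fit_engine(xgb_wf)

# Predictors after preprocessing, as a numeric matrix for XGBoost
X_pred <- bake(
  extract_recipe(xgb_wf),
  new_data = train,
  has_role("predictor"),
  composition = "matrix"
)

# TreeSHAP values for one class (assumed here: 3 = "virginica")
sv_xgb <- shapviz(booster, X_pred = X_pred, X = train[colnames(X_pred)],
                  which_class = 3)

sv_importance(sv_xgb, kind = "bee")
sv_dependence(sv_xgb, "Petal.Length")
sv_force(sv_xgb, row_id = 1)
```

X_pred is what the booster sees, while X only supplies the feature values shown in the plots, so the two must have the same rows and column names.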

For the MLP (and actually any other model that predicts numbers), you can work with my new package "kernelshap". kernelshap() returns SHAP values for all three categories, but to plot them with "shapviz", we need to focus on one category.

library(tidyverse)
library(bonsai)
library(tidymodels)

set.seed(123)
split <- initial_split(iris, prop = 0.7, strata = Species)

train <- training(split)
test  <- testing(split)
cv <- vfold_cv(train, strata = Species, v = 10)

model_recipe <- 
  recipe(Species ~ ., data = train)

mlp_model <-
  mlp(mode = "classification",
      hidden_units = 5,
      penalty = 0.3,
      epochs = 50,
      engine = "nnet"
  )

mlp_wf <-
  workflow() %>%
  add_model(mlp_model) %>% 
  add_recipe(model_recipe) %>%
  fit(train)

library(kernelshap)
library(shapviz)
library(withr)

# Sample 50 background rows reproducibly
background_data <- with_seed(1, train[sample(nrow(train), 50), ])

predict(mlp_wf, head(train, 1), type = "prob")
# 
# .pred_setosa .pred_versicolor .pred_virginica
# <dbl>            <dbl>           <dbl>
#   1        0.564            0.221           0.215

# List with SHAP value matrices (one matrix per class)
shap_values <- kernelshap(mlp_wf, train, bg_X = background_data, type = "prob")

# Turn into shapviz -> select "virginica"
sv <- shapviz(shap_values, which_class = 3)

sv_importance(sv, kind = "bee")
sv_dependence(sv, "Sepal.Width", color_var = "auto")
sv_force(sv, row_id = 1)

[Figures: SHAP force plot, dependence plot, and beeswarm importance plot]
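Since kernelshap() returns one SHAP matrix per class, it can be useful to compare feature importance across all three classes instead of looking at "virginica" only. The sketch below builds one shapviz object per class via which_class and tabulates the mean absolute SHAP value per feature. The names sv_by_class and imp are made up for illustration, and the model is refitted on the full iris data purely to keep the snippet self-contained.

```r
library(tidymodels)
library(kernelshap)
library(shapviz)

set.seed(123)

# Small MLP workflow in the spirit of the one above
mlp_wf <- workflow() %>%
  add_recipe(recipe(Species ~ ., data = iris)) %>%
  add_model(
    mlp(mode = "classification", hidden_units = 5, penalty = 0.3,
        epochs = 50, engine = "nnet")
  ) %>%
  fit(iris)

# Explain the four predictors, using 50 sampled rows as background data
X <- iris[, -5]
bg <- iris[sample(nrow(iris), 50), ]
shap_values <- kernelshap(mlp_wf, X, bg_X = bg, type = "prob", verbose = FALSE)

# One shapviz object per class
class_names <- levels(iris$Species)
sv_by_class <- lapply(
  seq_along(class_names),
  function(k) shapviz(shap_values, which_class = k)
)
names(sv_by_class) <- class_names

# Mean absolute SHAP value per feature (rows) and class (columns)
imp <- sapply(sv_by_class, function(s) colMeans(abs(get_shap_values(s))))
imp
```

Each element of sv_by_class can be passed to sv_importance(), sv_dependence(), or sv_force() in the same way as sv above.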
