Can we obtain variable importance (via vip) from lightGBM or catBoost the tidy way?

I'm thinking of Julia Silge's example here:

There seems to be a generic approach and a custom one built on the model-specific predict() function, both using vip, the standard tidymodels variable-importance package.

Reproducible example (generic case, with a simple linear model):

```r
library(lightgbm)

if (require("lightgbm")) {
library(AmesHousing)
library(janitor)
library(dplyr)
library(ggplot2)
library(rsample)
library(recipes)
library(parsnip)
library(tune)
library(dials)
library(workflows)
library(yardstick)
library(catboost)
library(tidymodels)
library(treesnip)
library(tidyverse)
library(vip)
info <- (.packages())
print(info)
}

set.seed(1234)

ames_data <- make_ames() %>%
  janitor::clean_names()

ames_split <- rsample::initial_split(
  ames_data,
  prop = 0.8,
  strata = sale_price
)

rec <- recipes::recipe(sale_price ~ ., data = training(ames_split)) %>%
  recipes::step_other(all_nominal(), threshold = 0.01) %>%
  recipes::step_nzv(all_nominal())

myPrep <- prep(rec)

spec <- linear_reg() %>%
  set_mode("regression") %>%
  set_engine("lm")

vipFit <- spec %>%
  fit(sale_price ~ ., data = juice(myPrep))

impObj <- vipFit %>% vip::vi(scale = FALSE)

print(impObj)
```
Output:

```
# A tibble: 193 x 3
   Variable                       Importance Sign
 1 second_flr_sf                       11.5  POS
 2 first_flr_sf                        10.4  POS
 3 misc_val                            10.1  NEG
 4 overall_qualVery_Excellent           9.04 POS
 5 overall_qualExcellent                8.56 POS
 6 neighborhoodStone_Brook              8.32 POS
 7 neighborhoodNorthridge               7.51 POS
 8 overall_condGood                     6.84 POS
 9 neighborhoodNorthridge_Heights       6.51 POS
10 overall_condVery_Good                6.44 POS
# ... with 183 more rows
```
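Since vi() returns an ordinary tibble, the result above can be filtered and plotted with the usual dplyr/ggplot2 verbs. A minimal sketch (the `impObj` here is a hand-built stand-in typed in from the first rows above, not recomputed):

```r
library(tibble)
library(dplyr)
library(ggplot2)

# Hand-built stand-in for the vi() result above (first rows only)
impObj <- tribble(
  ~Variable,       ~Importance, ~Sign,
  "second_flr_sf",        11.5, "POS",
  "first_flr_sf",         10.4, "POS",
  "misc_val",             10.1, "NEG"
)

# Keep the top rows by importance...
top10 <- impObj %>%
  slice_max(Importance, n = 10)

# ...and plot them as a plain ggplot bar chart
p <- ggplot(top10, aes(x = Importance, y = reorder(Variable, Importance))) +
  geom_col() +
  labs(y = NULL)
```

(vip::vip() can also draw a similar plot directly from the fitted model.)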

Reproducible example (custom case, with a simple linear model via `pred_wrapper`):

```r
rec <- recipes::recipe(sale_price ~ ., data = training(ames_split)) %>%
  recipes::step_other(all_nominal(), threshold = 0.01) %>%
  recipes::step_nzv(all_nominal())

spec <- linear_reg() %>%
  set_mode("regression") %>%
  set_engine("lm")

final_model <- spec

final_wf <- workflow() %>%
  add_model(final_model) %>%
  add_recipe(rec)

final_fitted_wf <- final_wf %>% fit(data = training(ames_split))

myVip <- final_fitted_wf %>%
  extract_fit_parsnip() %>%
  vip(
    method = "permute",   # without this, pred_wrapper and train are ignored
    train = bake(prep(rec), new_data = training(ames_split)),
    target = "sale_price",
    metric = "rmse",
    # pred_wrapper must return a numeric vector, so pull .pred out of the tibble
    pred_wrapper = function(object, newdata) predict(object, newdata)$.pred
  )

print(myVip)
```

Output

If anybody is interested in doing it without vip, here is the same with catBoost/treesnip/tidymodels...

```r
myModel <- extract_fit_engine(final_fitted_wf)
impObj <- catboost.get_feature_importance(
  myModel,
  pool = NULL,
  type = "FeatureImportance",
  thread_count = -1
)
```

Output (pretty ugly):
```
                   [,1]
ms_sub_class 1.24522954
ms_zoning    1.12307172
lot_frontage 2.94193678
lot_area     2.01862041
lot_shape    0.90817355
lot_config   0.35386407
[...]
```
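That matrix can be tidied into the same two-column shape that vi() returns. A minimal sketch (the `impObj` here is a hand-built stand-in for the catboost.get_feature_importance() matrix, using the first rows above):

```r
library(tibble)
library(dplyr)

# Hand-built stand-in for the named matrix returned by
# catboost.get_feature_importance() (first rows above)
impObj <- matrix(
  c(1.24522954, 1.12307172, 2.94193678),
  dimnames = list(c("ms_sub_class", "ms_zoning", "lot_frontage"), NULL)
)

# Move the rownames into a proper column and sort by importance
impTidy <- impObj %>%
  as.data.frame() %>%
  rownames_to_column("Variable") %>%
  as_tibble() %>%
  rename(Importance = V1) %>%
  arrange(desc(Importance))
```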

PS: Surprise, surprise... this also works with lightGBM/treesnip/tidymodels...

```r
myModel <- extract_fit_engine(final_fitted_wf)
impObj <- lgb.importance(myModel, percentage = FALSE)
```

Output (pretty fine):
```
     Feature       Gain Cover Frequency
 1: myXreg32 28304.0115 39998        72
 2: myXreg52 14347.0080 23272        41
 3: myXreg31 10914.2301 34374        56
 4: myXreg33 10746.1890 53054        96
 5: myXreg7  10681.6466 30822        54
 6: myXreg4  10121.8224 34187        60
 7: myXreg46  9604.2502 30287        54
 8: myXreg23  9591.9631 40724        72
 9: myXreg6   9545.2380 47203        84
10: myXreg17  9544.5662 34146        58
[...]
```
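The lgb.importance() table can be mapped onto the vi()-style layout in the same way. A minimal sketch (the `impObj` here is a hand-built stand-in for the data.table above; Gain is taken as the importance measure):

```r
library(tibble)
library(dplyr)

# Hand-built stand-in for the data.table returned by lgb.importance()
# (first rows above)
impObj <- tibble(
  Feature   = c("myXreg32", "myXreg52", "myXreg31"),
  Gain      = c(28304.0115, 14347.0080, 10914.2301),
  Cover     = c(39998, 23272, 34374),
  Frequency = c(72, 41, 56)
)

# Rename/drop columns so the result looks like a vip::vi() tibble
impTidy <- impObj %>%
  as_tibble() %>%
  transmute(Variable = Feature, Importance = Gain) %>%
  arrange(desc(Importance))
```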
