Trouble with partial dependence profiles in Tidy Modeling with R

I'm working through Tidy Modeling with R and reached a really cool section on partial dependence profiles, but I can't reproduce its code. Specifically, the model_profile function from DALEX throws an error about loss of precision for Year_Built, the column the text wants to explain.

Because this chapter reuses code from previous chapters, it is a little challenging to piece together. Here is what I believe the whole code is, including the random forest model from chapter 10. The last line is the one giving me trouble.

library(tidymodels)
library(DALEXtra)
data(ames)
ames <- mutate(ames, Sale_Price = log10(Sale_Price))

# split the data
set.seed(502)
ames_split <- initial_split(ames, prop = 0.80, strata = Sale_Price)
ames_train <- training(ames_split)
ames_test  <-  testing(ames_split)

# recipe preprocessing and what is being predicted
ames_rec <- 
  recipe(Sale_Price ~ Neighborhood + Gr_Liv_Area + Year_Built + Bldg_Type + 
           Latitude + Longitude, data = ames_train) %>%
  step_log(Gr_Liv_Area, base = 10) %>% 
  step_other(Neighborhood, threshold = 0.01) %>% 
  step_dummy(all_nominal_predictors()) %>% 
  step_interact( ~ Gr_Liv_Area:starts_with("Bldg_Type_") ) %>% 
  step_ns(Latitude, Longitude, deg_free = 20)

# random forest
rf_model <- 
  rand_forest(trees = 1000) %>% 
  set_engine("ranger") %>% 
  set_mode("regression")

# workflow, add formula rather than recipe
# minimal to no preprocessing needed
rf_wflow <- 
  workflow() %>% 
  add_formula(
    Sale_Price ~ Neighborhood + Gr_Liv_Area + Year_Built + Bldg_Type + 
      Latitude + Longitude) %>% 
  add_model(rf_model) 

# normal fitting example
rf_fit <- rf_wflow %>% 
  fit(ames_train)

# isolate features
vip_features <- c("Neighborhood", "Gr_Liv_Area", "Year_Built", 
                  "Bldg_Type", "Latitude", "Longitude")
vip_train <- 
  ames_train %>% 
  select(all_of(vip_features))

# create a DALEX explainer
explainer_rf <- 
  explain_tidymodels(
    rf_fit, 
    data = vip_train, 
    y = ames_train$Sale_Price,
    label = "random forest",
    verbose = FALSE
  )

# this doesn't work on Year_Built
set.seed(1805)
pdp_age <- model_profile(explainer_rf, N = 500, variables = "Year_Built")

I end up getting this error:

> pdp_age <- model_profile(explainer_rf, N = 500, variables = "Year_Built")
Error in `stop_vctrs()`:
! Can't convert from `Year_Built` <double> to `Year_Built` <integer> due to loss of precision.
• Locations: 2, 3, 5, 13, 14, 49, 53, 72, 73, 75, 83, 84, 119, 123, 142, 143, 145, 153, 154, 189, 193, 212, ...
Run `rlang::last_error()` to see where the error occurred.

I'm finding it difficult to piece together the code from previous chapters. Any insights into why this code won't run?

Always learning,
Zach

I suspect that it is a bug in DALEXtra. Can you post an issue on their issues page?

In the meantime, you might try adding a mutate() to

and see if that works.
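
One possible reading of that suggestion, sketched here as an assumption rather than a confirmed fix: store Year_Built as a double before splitting and fitting, so that the fitted workflow records a double column and can accept fractional grid values from model_profile(). The as.numeric() conversion and its placement are guesses; the rest is condensed from the code in the question.

```r
library(tidymodels)
library(DALEXtra)

data(ames)

# Assumption: convert Year_Built to double up front, before the split and fit,
# so the workflow no longer expects an integer column at prediction time.
ames <- ames %>%
  mutate(
    Sale_Price = log10(Sale_Price),
    Year_Built = as.numeric(Year_Built)
  )

set.seed(502)
ames_split <- initial_split(ames, prop = 0.80, strata = Sale_Price)
ames_train <- training(ames_split)

vip_features <- c("Neighborhood", "Gr_Liv_Area", "Year_Built",
                  "Bldg_Type", "Latitude", "Longitude")

# Same random forest workflow as in the question, refit on the converted data
rf_fit <-
  workflow() %>%
  add_formula(Sale_Price ~ Neighborhood + Gr_Liv_Area + Year_Built +
                Bldg_Type + Latitude + Longitude) %>%
  add_model(rand_forest(trees = 1000) %>%
              set_engine("ranger") %>%
              set_mode("regression")) %>%
  fit(ames_train)

explainer_rf <- explain_tidymodels(
  rf_fit,
  data = select(ames_train, all_of(vip_features)),
  y = ames_train$Sale_Price,
  label = "random forest",
  verbose = FALSE
)

set.seed(1805)
pdp_age <- model_profile(explainer_rf, N = 500, variables = "Year_Built")
```

Converting only the data passed to the explainer would probably not be enough, because the workflow fitted on the original data would still expect an integer Year_Built.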

Seems to work with other columns like Latitude and Longitude, but the conversion of Year_Built doesn't seem to work.
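
A quick check that may explain the difference between the columns, offered as a sketch rather than a diagnosis: in the ames data, Year_Built is stored as an integer while Latitude and Longitude are doubles. The vec_cast() calls only illustrate the kind of vctrs conversion named in the error message; the assumption that the profile grid contains fractional years is mine.

```r
library(modeldata)
library(vctrs)

data(ames)

# Storage types of the columns mentioned in the thread
sapply(ames[c("Year_Built", "Latitude", "Longitude")], typeof)

# Whole-number doubles convert to integer without complaint
vec_cast(c(1990, 2005), integer())

# Fractional doubles cannot be converted; vctrs reports a loss of precision
try(vec_cast(c(1990.5, 2005), integer()))
```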

Posted an issue.