Getting the levels that were collapsed by step_other

I'm using the recipes package to prepare my data and would like to know whether there is a way to retrieve the levels that step_other() collapsed into "other".


library(tidyverse)
library(tidymodels)

data("ames")

ames %>% 
  summarise(lvl_neighb = n_distinct(Neighborhood),
            lvl_exterior = n_distinct(Exterior_1st)) 

rec_ames <-
  recipe(Sale_Price ~ ., data = ames) %>%
  step_other(Neighborhood, Exterior_1st, threshold = 0.01) %>%
  prep()

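# bake(new_data = NULL) returns the processed training data (juice() is superseded)
rec_ames %>%
  bake(new_data = NULL) %>%
  summarise(lvl_neighb = n_distinct(Neighborhood),
            lvl_exterior = n_distinct(Exterior_1st))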

# Levels that were kept (not collapsed) for Neighborhood
rec_ames$steps[[1]]$objects$Neighborhood$keep

I would like to print a report showing the names of the levels that were collapsed, along with the number of records in each. Has anyone done this before?

Thank you so much!

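# Levels that step_other() kept for Neighborhood
kept_levels <- rec_ames$steps[[1]]$objects$Neighborhood$keep
# All factor levels in the original data
original_levels <- levels(ames$Neighborhood)
# Levels that were collapsed into "other"
setdiff(original_levels, kept_levels)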
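
If you also want the record counts, and for both variables at once, something along these lines might work. This is only a sketch: it assumes the same recipe as in the question, and `retained_levels` and `collapsed_report` are names I made up for illustration. It uses `tidy()` on the prepped step to list the retained levels instead of reaching into `rec_ames$steps`, then counts the observed levels in the training data and keeps the ones that are not in that list. Note that `levels()` also returns factor levels that never occur in the data, so the `setdiff()` approach can list names with zero records, whereas the counts here only cover levels that actually appear.

```r
library(recipes)    # recipe(), step_other(), prep(), tidy()
library(dplyr)
library(modeldata)  # provides the ames data

data("ames", package = "modeldata")

# Same recipe as in the question
rec_ames <-
  recipe(Sale_Price ~ ., data = ames) %>%
  step_other(Neighborhood, Exterior_1st, threshold = 0.01) %>%
  prep()

# For a prepped step_other(), tidy() returns one row per retained level,
# with the variable name in `terms` and the level in `retained`
retained_levels <-
  tidy(rec_ames, number = 1) %>%
  select(terms, level = retained)

# Count every observed level per variable, then drop the retained ones;
# whatever is left was collapsed into "other"
collapsed_report <-
  ames %>%
  select(Neighborhood, Exterior_1st) %>%
  mutate(across(everything(), as.character)) %>%
  tidyr::pivot_longer(everything(), names_to = "terms", values_to = "level") %>%
  count(terms, level, name = "n_rows") %>%
  anti_join(retained_levels, by = c("terms", "level")) %>%
  arrange(terms, desc(n_rows))

collapsed_report
```

Adding `count(collapsed_report, terms)` afterwards should give the number of collapsed levels per variable, if that is what you need for the report.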