Tidymodels: Seeing the results of downsampling with Bake

Hi,

I was wondering whether it is possible to see the results of down-sampling in tidymodels.
In my example below I have an imbalanced data set. I create a recipe to down-sample the majority class, but when I bake it, the ratio between the classes has not changed. Can anyone help?

library(tidymodels)
library(tidyverse)

# Create a dataframe where we are trying to predict Setosa
mydf <- iris %>% 
  mutate(set_tgt = factor(ifelse(Species == 'setosa', 'yes', 'no'), levels = c('yes', 'no'))) %>% 
  select(-Species)

# Initial Table
table(mydf$set_tgt)

#> yes  no 
#>  50 100

# Recipe that down-samples the majority class
flower_rec <- recipe(set_tgt ~ ., data = mydf) %>%
  themis::step_downsample(set_tgt)

# Try to see the downsampling
p <- prep(flower_rec, training = mydf) 
test <- bake(p, new_data = mydf) 

# Check the class counts after down-sampling
table(test$set_tgt)
 
#> yes  no 
#>  50 100

Thanks

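Down-sampling is intended to be performed on the training set alone. For this reason, step_downsample() defaults to skip = TRUE: the step is applied when the recipe is prepped, but it is skipped when bake() is given new data, which is why the class counts above are unchanged. It is advisable to use prep(recipe, retain = TRUE) when preparing the recipe; bake() on the prepped recipe with new_data = NULL then returns the retained, down-sampled version of the training data.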
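
A minimal sketch of that approach, reusing the data and recipe from the question. The object names prepped, down and same are only illustrative, and the counts in the comments are the ones reported in this thread rather than freshly generated output:

```r
library(tidymodels)
library(themis)

# Same data as in the question: predict whether a flower is setosa
mydf <- iris %>%
  mutate(set_tgt = factor(ifelse(Species == "setosa", "yes", "no"),
                          levels = c("yes", "no"))) %>%
  select(-Species)

flower_rec <- recipe(set_tgt ~ ., data = mydf) %>%
  step_downsample(set_tgt)

# retain = TRUE keeps the processed training set inside the prepped recipe
prepped <- prep(flower_rec, training = mydf, retain = TRUE)

# new_data = NULL returns the retained training set, with skipped steps applied
down <- bake(prepped, new_data = NULL)
table(down$set_tgt)   # the 50-50 split

# Supplying new_data skips the down-sampling step (skip = TRUE)
same <- bake(prepped, new_data = mydf)
table(same$set_tgt)   # still 50 yes / 100 no
```

The second bake() call is there only for contrast: it reproduces the unchanged counts from the question, because steps with skip = TRUE are not applied to data passed through new_data.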

Hi @Max,

Thank you very much for your reply.

I think my use of the name test is probably throwing things off a bit. Normally I just use it as a temporary variable when troubleshooting; maybe temp would be better. In any case, your solution is perfect and now I can see the 50-50 split.

Thanks again
