Hi folks!

I've got an ML project where a large number of classes (~200) are nested within each observation. Is there a way to tune how many of the classes in this nested structure get passed to the model? I can manually filter to just the top n, but a more systematic approach would be ideal!


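library(tidyverse)

# Simulate 5 observations, each with 50 classes drawn from ~200 possible values
train_nested <-
  tibble(observation = rep(seq(1, 5), 50),
         classes = round(runif(250, 1, 200))) %>%
  mutate(classes = paste("class", classes)) %>%
  nest(data = classes) %>%
  mutate(other_pred = rnorm(5))

train_nested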

#> # A tibble: 5 x 3
#>   observation data              other_pred
#>         <int> <list>                 <dbl>
#> 1           1 <tibble [50 x 1]>    -0.0702
#> 2           2 <tibble [50 x 1]>     0.273 
#> 3           3 <tibble [50 x 1]>    -1.47  
#> 4           4 <tibble [50 x 1]>    -1.99  
#> 5           5 <tibble [50 x 1]>     1.12

train_nested %>%
  unnest(data) %>%
  count(classes) %>%
  arrange(desc(n)) %>%
  slice_head(n = 15) %>%
  mutate(classes = fct_reorder(classes, n)) %>%
  ggplot(aes(x = classes,
             y = n)) +
  geom_col()

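For context, the manual top-n filter I mentioned looks roughly like the sketch below. `keep_top_classes()` and `n_keep` are names made up for illustration (they are not from any package), and the toy tibble only mimics the shape of `train_nested`. The cutoff is hard-coded here, which is exactly the part I'd like to tune.

```r
library(tidyverse)

# Hypothetical helper (not from any package): keep only the `n_keep` most
# frequent classes inside each nested tibble.
keep_top_classes <- function(nested, n_keep) {
  # Rank classes by how often they occur across all observations
  top <- nested %>%
    unnest(data) %>%
    count(classes, sort = TRUE) %>%
    slice_head(n = n_keep) %>%
    pull(classes)

  # Drop every class outside the top set from each list-column element
  nested %>%
    mutate(data = map(data, ~ filter(.x, classes %in% top)))
}

# Small toy input with the same shape as train_nested (assumed data)
toy <- tibble(
  observation = c(1, 1, 1, 2, 2, 3),
  classes = c("class 1", "class 2", "class 3",
              "class 1", "class 2", "class 1")
) %>%
  nest(data = classes)

toy_top2 <- keep_top_classes(toy, n_keep = 2)
```

Calling `keep_top_classes(train_nested, n_keep = 15)` would apply the same filter to the real data. It keeps the nested shape intact, so the rest of the pipeline doesn't need to change.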
