Hello,

I have a dataset with a categorical variable that has 699 levels, and I am predicting a binary response. I would like to encode the categorical variable as a numeric one, using the mean of the binary outcome within each category level. What is the best way to accomplish this? If possible, I would also like to cross-validate the resulting predictor. Please see the following attempt with play data. A vignette or blog post about this topic would be helpful!

```
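library(tidyverse)
library(tidymodels)

# Play data: two random letter columns combined into one
# high-cardinality categorical variable
set.seed(1)
dat <- data.frame(
  col_a = sample(letters, size = 10000, replace = TRUE),
  col_b = sample(letters, size = 10000, replace = TRUE)
)
dat <- dat %>%
  mutate(concat = paste(col_a, col_b, sep = "-"))

# Binary outcome
set.seed(2)
y <- rbinom(n = 10000, size = 1, prob = .2)
dat$y <- y

# Mean of the outcome per level of concat, joined back onto the data
concat_mean <- dat %>%
  group_by(concat) %>%
  summarise(concat_mean = mean(y))
dat <- left_join(dat, concat_mean, by = "concat")

dat$y <- as.factor(dat$y)

# Attempt: treat the encoding as an imputation problem
dat$imputed_mean <- NA
imputed_dat <-
  recipe(y ~ ., data = dat) %>%
  step_impute_linear(
    imputed_mean,
    impute_with = imp_vars(concat)
  )
prep(imputed_dat)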
```
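
To make the goal concrete, here is a minimal sketch of what a cross-validated (out-of-fold) mean encoding could look like. It is written in base R so it runs without extra packages. The helper name `oof_mean_encode`, the choice of 5 folds, and the fallback to the overall mean for levels not seen in the other folds are assumptions made for illustration, not an existing tidymodels function.

```r
# Out-of-fold mean (target) encoding: each row is encoded using only the
# outcome values from the other folds, so its own y never leaks in.
# Assumes y is numeric 0/1 (not a factor). Hypothetical helper.
oof_mean_encode <- function(x, y, v = 5, seed = 1) {
  stopifnot(length(x) == length(y), is.numeric(y))
  set.seed(seed)
  # Assign every row to one of v folds at random (roughly equal sizes).
  fold <- sample(rep_len(seq_len(v), length(x)))
  encoded <- numeric(length(x))
  for (k in seq_len(v)) {
    held_out <- fold == k
    # Per-level means computed on the other v - 1 folds only.
    level_means <- tapply(y[!held_out], x[!held_out], mean)
    # Look up each held-out row's level; unseen levels give NA.
    enc <- as.vector(level_means)[match(x[held_out], names(level_means))]
    # Assumed fallback: overall mean of the other folds.
    enc[is.na(enc)] <- mean(y[!held_out])
    encoded[held_out] <- enc
  }
  encoded
}

# Play data shaped like the example above.
set.seed(1)
concat <- paste(sample(letters, 10000, replace = TRUE),
                sample(letters, 10000, replace = TRUE), sep = "-")
set.seed(2)
y <- rbinom(n = 10000, size = 1, prob = 0.2)

concat_mean_oof <- oof_mean_encode(concat, y, v = 5)
summary(concat_mean_oof)
```

Unlike the plain `group_by()` mean, a row's encoded value here never depends on that row's own outcome. For a recipe-based route, the embed package provides `step_lencode_glm()` and `step_lencode_mixed()` for this kind of outcome-based encoding; whether they suit a given dataset would need to be checked.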