I am trying to use a mixed-model (lmer-based) approach to target-encode my categorical variable. The target is the binary variable "Disposition", and the categorical variable "RuleCombination" has more than 700 levels. The problem is that some levels end up split into two encoded levels. For example, the level "HRG" has the following counts across the two target classes:
|RuleCombination level|NFA|RFA|Grand Total|
|---|---|---|---|
|HRG|8243|85|8328|
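For reference, the raw (unpooled) proportion and log-odds for HRG can be computed directly from these counts. A random-intercept model normally shrinks a level's estimate toward the overall mean, and with over 8,000 rows in this level that shrinkage should be small, so the raw values give a rough sense of the scale to expect. This is only a sketch: which outcome class `step_lencode_mixed` treats as the "event" is an assumption not confirmed here, so both directions are shown.

```r
# Counts for level "HRG" taken from the table above
nfa   <- 8243
rfa   <- 85
total <- nfa + rfa  # 8328, matches the Grand Total column

# Raw (unpooled) proportion of RFA within HRG
p_rfa <- rfa / total

# Raw log-odds of each outcome class within HRG (they differ only in sign)
log_odds_rfa <- log(rfa / nfa)  # roughly -4.57
log_odds_nfa <- log(nfa / rfa)  # roughly  4.57

round(c(p_rfa = p_rfa, log_odds_rfa = log_odds_rfa, log_odds_nfa = log_odds_nfa), 3)
```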
"NFA" and "RFA" are the two levels of the binary outcome. After encoding, I get the following encoded values:
|Level|Encoded value|
|---|---|
|HRG|16.76811985|
|HRGDisposition|-9.285740011|
So instead of the single original level "HRG", the encoding now contains both "HRG" and "HRGDisposition".
I don't know why this happens for only some of the levels. This is the code I am using:
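```r
library(recipes)
library(embed)  # provides step_lencode_mixed()

data_mixed <-
  recipe(Disposition ~ ., data = Data202101_train) %>%
  step_lencode_mixed(
    RuleCombination,
    outcome = vars(Disposition)
  ) %>%
  prep(training = Data202101_train)

# Encoded value for each level of RuleCombination
tidy(data_mixed, number = 1)
```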
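One way to narrow this down is to compare the encoded level names against the levels that actually exist in the training data. The sketch below uses small hypothetical vectors in place of `levels(Data202101_train$RuleCombination)` and the `level` column of `tidy(data_mixed, number = 1)`; it also assumes `"..new"` is the placeholder level that embed adds for unseen values, which is excluded from the comparison.

```r
# Hypothetical example inputs: replace with levels(Data202101_train$RuleCombination)
# and the `level` column of tidy(data_mixed, number = 1)
train_levels   <- c("HRG", "ABC", "XYZ")
encoded_levels <- c("HRG", "HRGDisposition", "ABC", "XYZ", "..new")

# Encoded level names that do not match any level seen in the training data
# ("..new" is assumed to be the placeholder for unseen levels, so it is excluded)
unexpected <- setdiff(encoded_levels, c(train_levels, "..new"))
unexpected

# For each unexpected name, list the training levels it starts with
prefix_of <- lapply(unexpected, function(u) {
  train_levels[startsWith(u, train_levels)]
})
names(prefix_of) <- unexpected
prefix_of
```

If `unexpected` comes back empty on the real data, the odd names are genuine levels of RuleCombination in the training set (for example, from a data import problem) rather than something introduced by the encoding step.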