recipe step_medianimpute within group

I am wondering whether there is any functionality for median imputation within a group. For example, if I have an income column and a zip code column, I want to impute missing income values using the median income within the same zip code.

We don't have that. You would probably get something similar to what you want using step_impute_bag() or step_impute_linear().
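Since there is no built-in step for this, one workaround is to do the grouped imputation with dplyr before the data reaches the recipe. The following is only a minimal sketch under stated assumptions: the `toy` data, its column names, and the fallback to the overall median for groups with no observed values are all made up for illustration and are not part of recipes.

```r
library(dplyr)

# Hypothetical toy data: income has missing values, zip_code defines the groups.
toy <- tibble(
  zip_code = c("10001", "10001", "10001", "20002", "20002", "30003"),
  income   = c(50, NA, 70, 40, NA, NA)
)

# Fallback for groups where every income value is missing (an assumption of this sketch).
overall_median <- median(toy$income, na.rm = TRUE)

imputed <- toy %>%
  group_by(zip_code) %>%
  mutate(
    # Median of the observed incomes in this zip code (NA if none are observed).
    group_median = median(income, na.rm = TRUE),
    # Keep observed values; otherwise use the group median, then the overall median.
    income = coalesce(income, group_median, overall_median)
  ) %>%
  ungroup() %>%
  select(-group_median)
```

Note that this computes the medians from whatever data it is given. When resampling or predicting on new data, the medians should probably be computed on the training set only and then joined onto the other data, otherwise information leaks between sets.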

I'm not sure. Can you tell us what you want to do by showing the recipe (without the imputation parts)?

Here is the recipe without imputation. division and super_department are categorical variables.

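library(tidymodels)

preproc <- recipe(
    cpc ~ .,
    data = lag_df) %>%
  step_integer(division, super_department) %>%
  step_normalize(recipes::all_predictors(), -division, -super_department) %>%
  step_zv(recipes::all_predictors())
# Not prepped here: the workflow trains the recipe when it is fit,
# and add_recipe() does not accept an already-trained recipe.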

rf = rand_forest() %>%
  set_mode("regression") %>%
  set_engine("ranger", num.threads = 30, importance = "impurity")

wflow <- 
  workflow() %>% 
  add_recipe(preproc) %>%
  add_model(rf)

I would put the imputation parts first. However, based on your first message, I'm not sure what you are imputing.

Also, if you don't mind me asking, what is the purpose of step_integer(division, super_department)? These look like unordered categorical variables, and splitting on them as if they were numeric scales would be bad.

I have a few more features of type double to impute, which are not listed in the recipe.

step_integer(division, super_department) is something I don't like, but I'm not sure how else to handle it. Those variables have many distinct values, and I would easily run out of memory if I used step_dummy. I could probably use step_other.

I suggest using step_lencode_mixed() to convert each of them to a single numeric feature.

Overall, I'd put the imputation steps first.
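To make the suggested ordering concrete, here is a rough sketch of a recipe with the imputation step first and the categorical columns encoded with step_lencode_mixed() from the embed package. The `toy_df` data frame and its `spend` column are hypothetical stand-ins for lag_df and the unlisted double features, and step_impute_bag() is just one of the imputation steps mentioned earlier in the thread, not necessarily the best choice here.

```r
library(recipes)
library(embed)  # provides step_lencode_mixed(); fitting it also needs lme4 installed

# Hypothetical stand-in for lag_df from the thread.
set.seed(1)
toy_df <- data.frame(
  cpc = rnorm(40),
  division = factor(sample(c("a", "b", "c"), 40, replace = TRUE)),
  super_department = factor(sample(c("x", "y"), 40, replace = TRUE)),
  spend = rnorm(40)
)
toy_df$spend[c(3, 17)] <- NA  # introduce missing values to impute

preproc <- recipe(cpc ~ ., data = toy_df) %>%
  # 1. Imputation first, while the categorical columns are still factors.
  step_impute_bag(all_numeric_predictors()) %>%
  # 2. Replace each high-cardinality factor with one numeric column.
  step_lencode_mixed(division, super_department, outcome = vars(cpc)) %>%
  # 3. Normalize the remaining numeric predictors, as in the original recipe.
  step_normalize(all_numeric_predictors(), -division, -super_department) %>%
  step_zv(all_predictors())
```

The recipe is left untrained so it can be passed to add_recipe() and estimated when the workflow is fit.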

Thanks, Max! When using step_impute_linear(), if I have a categorical variable and I want to do step_integer for the model and step_dummy for step_impute_linear, is there any way to do that?