I am just wondering whether there is any functionality to do the median imputation within a group. For example, if I have an income column and a zip code column, I want to impute the income within one zip code.
We don't have that. You would probably get something similar to what you want using step_impute_bag() or step_impute_linear().
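For reference, a grouped median can also be filled in with dplyr before the data reaches the recipe. This is only a minimal sketch: the toy data and the column names `income` and `zip_code` are made up to match the example in the question, and this is not a recipes step.

```r
library(dplyr)

# Hypothetical toy data; `income` and `zip_code` are illustrative names.
toy <- tibble(
  zip_code = c("10001", "10001", "10001", "94105", "94105", "94105"),
  income   = c(40, NA, 60, 100, 120, NA)
)

# Replace each missing income with the median of the observed incomes
# in the same zip code.
imputed <- toy %>%
  group_by(zip_code) %>%
  mutate(income = if_else(is.na(income), median(income, na.rm = TRUE), income)) %>%
  ungroup()
```

Because this runs outside the recipe, the medians are computed from whatever data is passed in. When resampling, the medians should be computed on the training portion only and then joined onto new data. A zip code with no observed income at all still ends up with NA.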
I'm not sure. Can you tell us what you want to do by showing the recipe (without the imputation parts)?
Here is the recipe without imputation. division and super_department are categorical variables.
preproc = recipe(
    cpc ~ .,
    data = lag_df) %>%
  step_integer(division, super_department) %>%
  step_normalize(recipes::all_predictors(), -division, -super_department) %>%
  step_zv(recipes::all_predictors()) %>%
  prep()

rf = rand_forest() %>%
  set_mode("regression") %>%
  set_engine("ranger", num.threads = 30, importance = "impurity")

wflow <-
  workflow() %>%
  add_recipe(preproc) %>%
  add_model(rf)
I would put the imputation parts first. However, based on your first message, I'm not sure what you are imputing.
Also, if you don't mind me asking, what is the purpose of step_integer(division, super_department)? These seem like unordered categorical data, and splitting on them as if they were numeric scales would be bad.
I have a few more features of type double to impute, which are not listed in the recipe.
step_integer(division, super_department) is something I don't like, but I'm not sure how else to handle it. The number of distinct values for those variables is high, and I will easily run out of memory if I use step_dummy. I can probably use step_other.
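A rough sketch of the step_other idea mentioned above: pool infrequent levels into a single "other" level first, so that step_dummy creates far fewer columns. The toy data and the 5% threshold are assumptions made for illustration, not values from this thread.

```r
library(recipes)

set.seed(2)
# Hypothetical data: two common levels and ten rare ones (1% each).
toy <- data.frame(
  division = factor(sample(c(rep("a", 50), rep("b", 40), paste0("rare_", 1:10)))),
  cpc = rnorm(100)
)

rec <- recipe(cpc ~ ., data = toy) %>%
  # Levels seen in fewer than 5% of the training rows are pooled into "other".
  step_other(division, threshold = 0.05) %>%
  # Dummy variables are then created only for the levels that remain.
  step_dummy(division)

baked <- rec %>% prep() %>% bake(new_data = NULL)
```

The threshold trades memory for detail: a higher value pools more levels and gives fewer dummy columns.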
I suggest using step_lencode_mixed() to convert each of them to a numeric feature.
Overall, I'd put the imputation steps first.
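Putting the suggestions together, the recipe could look roughly like the sketch below. Several things here are assumptions: the simulated `lag_df`, the double column `feature_a` with missing values, and the choice of step_impute_median are stand-ins, since the real data and the columns to impute were not shown. step_lencode_mixed() comes from the embed package and relies on lme4 for the fit.

```r
library(recipes)
library(embed)  # provides step_lencode_mixed()

set.seed(1)
n <- 200
# Hypothetical stand-in for lag_df; `feature_a` is an illustrative column.
lag_df <- data.frame(
  division         = factor(sample(paste0("div_", 1:10), n, replace = TRUE)),
  super_department = factor(sample(paste0("dept_", 1:20), n, replace = TRUE)),
  feature_a        = rnorm(n)
)
lag_df$cpc <- 1 + 0.5 * lag_df$feature_a +
  as.integer(lag_df$division) / 10 +
  as.integer(lag_df$super_department) / 20 +
  rnorm(n, sd = 0.2)
lag_df$feature_a[sample(n, 15)] <- NA  # introduce missing values to impute

preproc <- recipe(cpc ~ ., data = lag_df) %>%
  # Imputation first, so later steps see complete numeric columns.
  step_impute_median(all_numeric_predictors()) %>%
  # Replace each categorical column with one numeric encoding per column.
  step_lencode_mixed(division, super_department, outcome = vars(cpc)) %>%
  step_zv(all_predictors()) %>%
  step_normalize(all_numeric_predictors())

# Only for inspecting the result; the recipe object itself stays untrained.
baked <- preproc %>% prep() %>% bake(new_data = NULL)
```

Here the recipe is left unprepped and prep() is called only to look at the output. To my knowledge, add_recipe() expects an untrained recipe and the workflow does the prep itself when it is fit.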
Thanks, Max! When using step_impute_linear(), if I have a categorical variable and I want to do step_integer for the model and step_dummy for step_impute_linear, is there any way to do that?