How to use Surv() inside recipe for survival analysis data

I am trying to use tidymodels for survival analysis, and I want to use a recipe for feature engineering before fitting the model. I encountered the following error:

> base_rec <- recipe(Surv(SURD, DV) ~ ., data = fos_train) %>% 
+             step_rm(D:E)               %>% 
+             step_zv(all_predictors())  %>% 
+             step_nzv(all_predictors()) %>% 
+             step_other(A, B, C) 
Error: No in-line functions should be used here; use steps to define baking actions.
Run `rlang::last_error()` to see where the error occurred.

I also wonder why the factor labels are lost after collapsing sparse levels with step_other():

> fos_train %>% count(RACE)
# A tibble: 4 x 2
  RACE      n
  <fct> <int>
1 White   362
2 Black    14
3 Asian   154
4 Other    26
> base_rec %>% prep() %>% bake(NULL) %>% count(RACE)
# A tibble: 3 x 2
  RACE      n
  <fct> <int>
1 1       362
2 3       154
3 other    40

Why do I get 1 and 3 instead of White and Asian?

You can use:

recipe(SURD + DV ~ ., data = fos_train)

which designates both SURD and DV as outcomes. The formula passed to recipe() does not allow in-line functions such as Surv(); transformations have to be defined as steps instead.
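To check which role each column gets, here is a minimal sketch that uses survival::lung as a hypothetical stand-in for fos_train (time plays the part of SURD, status the part of DV). Calling summary() on an unprepped recipe lists every variable together with its role.

```r
library(recipes)
library(survival)

# Hypothetical stand-in for fos_train: `time` ~ SURD, `status` ~ DV
lung_df <- na.omit(lung[, c("time", "status", "age", "sex", "ph.ecog")])

# Both survival columns go on the left-hand side, joined with `+`, no Surv()
rec <- recipe(time + status ~ ., data = lung_df)

# The summary lists each variable with its role ("outcome" or "predictor")
roles <- summary(rec)
roles[, c("variable", "role")]
```

Steps such as step_zv(all_predictors()) then leave time and status alone, because all_predictors() selects only columns with the predictor role.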

The censored package uses the formula method for survival models, so you can put the call to Surv() in the model formula when fitting.
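A minimal sketch of this split between preprocessing and fitting follows. It again assumes survival::lung as a hypothetical stand-in for fos_train, and it calls survival::coxph() directly instead of a censored model so that the example depends only on recipes and survival.

```r
library(recipes)
library(survival)

# Hypothetical stand-in for fos_train: `time` ~ SURD, `status` ~ DV
lung_df <- na.omit(lung[, c("time", "status", "age", "sex", "ph.ecog")])

# Feature engineering happens in the recipe; the outcomes stay plain columns
rec <- recipe(time + status ~ ., data = lung_df) %>%
  step_zv(all_predictors()) %>%
  step_nzv(all_predictors())

# bake(new_data = NULL) returns the processed training data, outcomes included
baked <- rec %>% prep() %>% bake(new_data = NULL)

# Surv() appears only in the model formula; `.` expands to the other columns
cox_fit <- coxph(Surv(time, status) ~ ., data = baked)
summary(cox_fit)
```

With the censored package loaded, a parsnip specification such as proportional_hazards() %>% set_engine("survival") should accept the same Surv(time, status) ~ . formula in fit(); that variant is not shown here.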

I also wonder why factor labels are lost after collapsing sparse levels

Hard to know without a reproducible example.
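One way to start such a reproducible example is to rebuild a toy RACE factor from the counts in the question and run step_other() on it alone. The sketch below assumes RACE is an ordinary factor with the labels shown; race_df and race_rec are hypothetical names. With the default threshold of 0.05, Black (14 of 556 rows) and Other (26 of 556) should be pooled into "other" (40 rows), while White and Asian keep their labels. If the real data still shows 1 and 3, comparing str() output for the toy factor and for fos_train$RACE may point to how RACE is stored there.

```r
library(recipes)

# Hypothetical toy data rebuilt from the counts shown in the question
race_df <- data.frame(
  RACE = factor(
    rep(c("White", "Black", "Asian", "Other"), times = c(362, 14, 154, 26)),
    levels = c("White", "Black", "Asian", "Other")
  )
)

# step_other() pools levels below the default threshold (5% of rows) into "other"
race_rec <- recipe(~ RACE, data = race_df) %>%
  step_other(RACE)

baked_race <- race_rec %>% prep() %>% bake(new_data = NULL)

# Inspect the retained labels and how the original factor is stored
table(baked_race$RACE)
str(race_df$RACE)
```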
