I recently came across a textbook on the tidymodels package and wanted to try it out on the colon dataset from the survival package, but I ran into a problem I can't explain.
My code is as follows:
    library(tidymodels)
    library(survival)

    data(colon)
    str(colon)

    colon$sex <- ifelse(colon$sex == 1, "male", "female")
    colon$obstruct <- ifelse(colon$obstruct == 1, "yes", "no")
    colon$perfor <- ifelse(colon$perfor == 1, "yes", "no")
    colon$adhere <- ifelse(colon$adhere == 1, "yes", "no")
    colon$status <- ifelse(colon$obstruct == 1, "death", "alive")
    colon$node4 <- ifelse(colon$node4 == 1, "yes", "no")

    colon <- select(colon, id, age, rx, sex, age, obstruct, perfor, adhere, nodes, status)
    colon <- na.omit(colon)

    data_split <- initial_split(colon, prop = 3/4, strata = status)
    train_data <- training(data_split)
    test_data <- testing(data_split)
    str(train_data)

    train_rec <- recipe(status ~ ., data = train_data) %>%
      update_role(id, new_role = "ID") %>%
      step_zv(all_numeric(), -all_outcomes()) %>%
      step_normalize(all_numeric(), -all_outcomes()) %>%
      step_novel(all_nominal(), -all_outcomes()) %>%
      step_dummy(all_nominal(), -all_outcomes())
    summary(train_rec)

    prepped_data <- train_rec %>%  # use the recipe object
      prep() %>%                   # perform the recipe on training data
      juice()                      # extract only the preprocessed dataframe
    glimpse(prepped_data)

    set.seed(100)
    cv_folds <- vfold_cv(train_data, v = 5, strata = status)

    log_spec <-                        # your model specification
      logistic_reg() %>%               # model type
      set_engine(engine = "glm") %>%   # model engine
      set_mode("classification")       # model mode

    log_wflow <-                 # new workflow object
      workflow() %>%             # use workflow function
      add_recipe(train_rec) %>%  # use the new recipe
      add_model(log_spec)        # add your model spec

    log_res <- log_wflow %>%
      fit_resamples(
        resamples = cv_folds,
        metrics = metric_set(
          precision, f_meas, accuracy, kap, roc_auc, sens, spec),
        control = control_resamples(save_pred = TRUE)
      )

    log_res$.notes
Am I misunderstanding a few things, or is this a bug in the package?
- Why does my model not fit? I think it may be a problem with the recipe step.
- I have already assigned the id variable the "ID" role, so why does step_normalize still normalize it?
- For the binary variable sex, how should I interpret the sex_new column that appears after step_dummy?