Hi Posit Community.
When a recipe is included in a tidymodels workflow, are its preprocessing steps applied to the request body that a plumber API receives? Based on my testing so far and the reprex below, I believe the answer is no, but I want to confirm whether what I'm seeing is expected behavior. Fields in my request body can be null or missing for a variety of reasons, and I was hoping the recipe part of the workflow would impute them on the fly inside the API.
If this is expected behavior, what are the common recommendations for imputing null or missing values in API requests, so that the model returns a prediction instead of a 500 error when a value is missing? Is the fix as simple as converting the request body to a data frame and prepping it with the recipe from the modeling workflow before passing it to the predict() function?
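A quick way to narrow this down is to call `predict()` on the fitted workflow directly, outside plumber, and compare two inputs: one where `cyl` is present but `NA`, and one where the `cyl` column is absent altogether. The sketch below assumes the same `mtcars` setup as the reprex; object names such as `with_na` and `without_cyl` are only illustrative.

```r
library(tidymodels)

# Same training setup as the reprex: NAs in the first five rows of cyl and disp
train_df <- mtcars[1:25, ]
train_df$disp[1:5] <- NA
train_df$cyl[1:5] <- NA

mod_rec <- recipe(mpg ~ cyl + disp + hp, data = train_df) %>%
  step_impute_median(all_numeric_predictors())

tree_mod <- decision_tree() %>%
  set_mode("regression") %>%
  set_engine("rpart")

mod1 <- workflow() %>%
  add_recipe(mod_rec) %>%
  add_model(tree_mod) %>%
  fit(data = train_df)

# Case 1: cyl is present but NA. predict() on a fitted workflow applies the
# trained recipe first, so step_impute_median() replaces the NA with the
# training median before rpart sees the row.
with_na <- data.frame(cyl = NA_real_, disp = 160, hp = 110)
pred_na <- predict(mod1, new_data = with_na)

# Case 2: the cyl column is missing entirely. The workflow's preprocessing
# requires every predictor the recipe was trained on, so predict() raises an
# error (inside plumber, an unhandled error becomes a 500 response).
without_cyl <- data.frame(disp = 160, hp = 110)
pred_missing <- tryCatch(
  predict(mod1, new_data = without_cyl),
  error = function(e) e
)

pred_na
conditionMessage(pred_missing)
```

If the first case returns a prediction and the second errors, the recipe is being applied at predict time, and the 500 more likely comes from how the request body becomes a data frame. That part is an assumption: the guess is that a field sent as JSON `null` (or left out) does not end up as an `NA` column after `as.data.frame(req$body)`, which this sketch does not test.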
# Load Libraries ----------------------------------------------------------
library(plumber)
library(tidymodels)
library(tidyverse)
# Construct Basic Model ---------------------------------------------------
# Load and split data
df = mtcars
train_df = df[1:25, ]
test_df = df[26:32, ]
train_df$disp[1:5] = NA
train_df$cyl[1:5] = NA
# Define Recipe
mod_rec = recipe(mpg ~ cyl + disp + hp, data = train_df) %>%
step_impute_median(all_numeric_predictors())
prep(mod_rec, verbose = TRUE)
#> oper 1 step impute median [training]
#> The retained training set is ~ 0 Mb in memory.
#>
#> ── Recipe ──────────────────────────────────────────────────────────────────────
#>
#> ── Inputs
#> Number of variables by role
#> outcome: 1
#> predictor: 3
#>
#> ── Training information
#> Training data contained 25 data points and 5 incomplete rows.
#>
#> ── Operations
#> • Median imputation for: cyl, disp, hp | Trained
# Define Model
tree_mod = decision_tree() %>%
set_mode("regression") %>%
set_engine("rpart")
# Define Workflow
tree_wkflow = workflow() %>%
add_recipe(mod_rec) %>%
add_model(tree_mod)
# Fit Model
mod1 = fit(tree_wkflow, train_df)
saveRDS(mod1, file = "cars.rds")
# API ---------------------------------------------------------------------
trained_mod = readRDS("cars.rds")
#* How many mpg should we expect?
#* @post /predict_mpg
function(req, res) {
predict(trained_mod, new_data = as.data.frame(req$body))
}
# Update UI
#* @plumber
function(pr) {
pr %>% pr_set_api_spec(yaml::read_yaml("cars_yml.yml"))
}
Created on 2023-05-18 with reprex v2.0.2
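For the `/predict_mpg` endpoint above, one possible workaround, assuming the 500 comes from null or omitted fields disappearing when `req$body` is passed to `as.data.frame()`, is to build the one-row data frame explicitly so that every predictor column always exists and absent or null fields become `NA`. This is a minimal sketch, assuming each request is a single record sent as a flat JSON object. `body_to_df()` and `predictor_cols` are hypothetical names, not plumber or tidymodels functions, and the predictor names are hardcoded from the recipe formula.

```r
# Hypothetical helper (not part of plumber or tidymodels): turn a parsed
# JSON body into a one-row data frame that always has every predictor.
# Assumes a flat object such as {"cyl": null, "disp": 160}.
body_to_df <- function(body, predictors) {
  values <- lapply(predictors, function(col) {
    x <- body[[col]]  # NULL when the field is null or absent
    if (length(x) == 0) NA_real_ else as.numeric(x[[1]])
  })
  names(values) <- predictors
  as.data.frame(values)
}

# Predictor names taken from the recipe formula mpg ~ cyl + disp + hp
predictor_cols <- c("cyl", "disp", "hp")

# Simulated parsed body: cyl sent as null, hp left out entirely
req_body <- list(cyl = NULL, disp = 160)
new_row <- body_to_df(req_body, predictor_cols)
new_row
```

In the plumber file, the endpoint body would then be `predict(trained_mod, new_data = body_to_df(req$body, predictor_cols))`. No separate `prep()` or `bake()` call should be needed, since `predict()` on a fitted workflow already applies the trained recipe, including `step_impute_median()`, to `new_data`. Request bodies with a different shape (for example, an array of records) are not handled by this sketch.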
Session info
sessionInfo()
#> R version 4.2.2 (2022-10-31)
#> Platform: aarch64-apple-darwin20 (64-bit)
#> Running under: macOS Ventura 13.3.1
#>
#> Matrix products: default
#> BLAS: /Library/Frameworks/R.framework/Versions/4.2-arm64/Resources/lib/libRblas.0.dylib
#> LAPACK: /Library/Frameworks/R.framework/Versions/4.2-arm64/Resources/lib/libRlapack.dylib
#>
#> locale:
#> [1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8
#>
#> attached base packages:
#> [1] stats graphics grDevices utils datasets methods base
#>
#> other attached packages:
#> [1] forcats_0.5.1 stringr_1.5.0 readr_2.1.2
#> [4] tidyverse_1.3.2 yardstick_1.1.0.9001 workflowsets_1.0.0
#> [7] workflows_1.1.3 tune_1.1.1 tidyr_1.3.0
#> [10] tibble_3.2.1 rsample_1.1.1 recipes_1.0.5
#> [13] purrr_1.0.1 parsnip_1.1.0 modeldata_1.0.0
#> [16] infer_1.0.2 ggplot2_3.4.2 dplyr_1.1.1
#> [19] dials_1.2.0 scales_1.2.1 broom_1.0.0
#> [22] tidymodels_1.0.0 plumber_1.2.1
#>
#> loaded via a namespace (and not attached):
#> [1] fs_1.5.2 lubridate_1.9.2 httr_1.4.3
#> [4] DiceDesign_1.9 tools_4.2.2 backports_1.4.1
#> [7] utf8_1.2.3 R6_2.5.1 rpart_4.1.19
#> [10] DBI_1.1.3 colorspace_2.1-0 nnet_7.3-18
#> [13] withr_2.5.0 tidyselect_1.2.0 compiler_4.2.2
#> [16] rvest_1.0.2 cli_3.6.1 swagger_3.33.1
#> [19] xml2_1.3.3 digest_0.6.31 rmarkdown_2.14
#> [22] webutils_1.1 pkgconfig_2.0.3 htmltools_0.5.3
#> [25] parallelly_1.35.0 lhs_1.1.6 dbplyr_2.2.1
#> [28] fastmap_1.1.0 highr_0.9 readxl_1.4.0
#> [31] rlang_1.1.0 rstudioapi_0.13 generics_0.1.3
#> [34] jsonlite_1.8.4 googlesheets4_1.0.0 magrittr_2.0.3
#> [37] Matrix_1.5-1 Rcpp_1.0.10 munsell_0.5.0
#> [40] fansi_1.0.4 GPfit_1.0-8 lifecycle_1.0.3
#> [43] furrr_0.3.1 stringi_1.7.12 yaml_2.3.5
#> [46] MASS_7.3-58.1 grid_4.2.2 parallel_4.2.2
#> [49] listenv_0.9.0 promises_1.2.0.1 crayon_1.5.2
#> [52] lattice_0.20-45 haven_2.5.0 splines_4.2.2
#> [55] hms_1.1.1 knitr_1.39 pillar_1.9.0
#> [58] future.apply_1.10.0 codetools_0.2-18 reprex_2.0.2
#> [61] glue_1.6.2 evaluate_0.15 modelr_0.1.8
#> [64] data.table_1.14.8 tzdb_0.3.0 vctrs_0.6.1
#> [67] foreach_1.5.2 cellranger_1.1.0 gtable_0.3.3
#> [70] future_1.32.0 assertthat_0.2.1 xfun_0.31
#> [73] gower_1.0.1 prodlim_2023.03.31 later_1.3.0
#> [76] googledrive_2.0.0 class_7.3-20 survival_3.4-0
#> [79] gargle_1.2.0 timeDate_4022.108 iterators_1.0.14
#> [82] hardhat_1.3.0.9000 lava_1.7.2.1 timechange_0.2.0
#> [85] globals_0.16.2 ellipsis_0.3.2 ipred_0.9-14