Log parameter values and prediction in plumber API

friesewoudloper · July 9, 2021, 7:34pm

I'm new to plumber and would really appreciate your advice on the following. We want to use plumber and RStudio Connect to publish a predictive model through a REST API. For every request, the REST service needs to log the input values and the prediction. What would be the best way to do that? Should we create a database connection and write a new record to a table for each request, or could we use a filter-logger?

meztez · July 9, 2021, 11:11pm

I would suggest monitoring the R process itself; then you can log plumber input by writing to the R console.

This is what we use internally (we deploy using Docker with Kubernetes). Logs are redirected to Stackdriver on GCP.

Dockerfile

FROM rstudio/r-base:4.1.0-focal

WORKDIR /src

RUN apt-get update && apt-get install -y --no-install-recommends \
  gdal-bin \
  git \
  libcurl4-openssl-dev \
  libgdal-dev \
  libgeos-dev \
  libicu-dev \
  libproj-dev \
  libsodium-dev \
  libssl-dev \
  libudunits2-dev \
  make \
  zlib1g-dev
  
RUN echo 'options(repos = c(REPO_NAME = "https://packagemanager.rstudio.com/cran/__linux__/focal/latest"))' >> ~/.Rprofile

RUN Rscript -e "install.packages(c('curl','data.table','generics','globals','googleCloudStorageR','parsnip','remotes','sf','tidyr','yaml','future','hms','plumber', 'xml2'))"

COPY ./api/startup.R /etc

COPY ./api/plumber.R .

# Compile xgboost without OpenMP support to avoid nthreads problems
RUN apt-get -qq -y install cmake
RUN git clone --recursive https://github.com/dmlc/xgboost
WORKDIR /src/xgboost
RUN git submodule init
RUN git submodule update
RUN mkdir build
WORKDIR /src/xgboost/build
RUN cmake .. -DR_LIB=ON -DUSE_OPENMP=OFF
RUN make -j$(nproc)
RUN make install

WORKDIR /src

EXPOSE 8004
ENTRYPOINT ["R", "-f", "/etc/startup.R", "--slave"]

api/plumber.R

#* Health check
#* @get /
#* @serializer unboxedJSON
function() {
    list(status = "OK")
}

api/startup.R (logging happens here)

library(plumber)

pr <- plumb()
postroute <- function(req) {
  if (req$REQUEST_METHOD == "POST") {
    cat("[", req$REQUEST_METHOD, req$PATH_INFO, "] - REQUEST - ", req$postBody, "\n", sep = "")
  }
}

postserializewithoutpayload <- function(req, res) {
  if (req$REQUEST_METHOD == "POST") {
    cat("[", req$REQUEST_METHOD, req$PATH_INFO, "] - RESPONSE - ", res$status, "\n", sep = "")
  }
}

postserializewithpayload <- function(req, res) {
  if (req$REQUEST_METHOD == "POST") {
    cat("[", req$REQUEST_METHOD, req$PATH_INFO, "] - RESPONSE - ", res$status, " - BODY - ", res$body, "\n", sep = "")
  }
}

hooklist <- list(postserialize = postserializewithoutpayload)
debughooklist <- list(postserialize = postserializewithpayload, postroute = postroute)

# Accept TRUE/true/T; anything else (including unset) disables debug logging
if (isTRUE(as.logical(Sys.getenv("DBG_ENABLE", "FALSE")))) {
  pr$setDebug(TRUE)
  pr$registerHooks(debughooklist)
} else {
  pr$registerHooks(hooklist)
}

pr$run(host = "0.0.0.0", port = 8004)

There are of course other ways to achieve a similar result. Let me know if that gives you enough to get started.
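
To see the log format without deploying anything, note that the hooks only read a few fields from req and res, so they can be exercised with plain lists. This is a minimal sketch, assuming fake_req and fake_res (hypothetical stand-ins, not real plumber objects) carry the same field names:

```r
# Same hook as in startup.R above, repeated so this snippet runs on its own.
postserializewithoutpayload <- function(req, res) {
  if (req$REQUEST_METHOD == "POST") {
    cat("[", req$REQUEST_METHOD, req$PATH_INFO, "] - RESPONSE - ", res$status, "\n", sep = "")
  }
}

# Hypothetical stand-ins for the objects plumber passes to a hook.
fake_req <- list(REQUEST_METHOD = "POST", PATH_INFO = "/predict")
fake_res <- list(status = 200L)

# capture.output() collects what the hook writes to the console.
logged <- capture.output(postserializewithoutpayload(fake_req, fake_res))
print(logged)
# [1] "[POST/predict] - RESPONSE - 200"
```

In the container, the same line goes to the R process's standard output, which is what the log collector picks up.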

friesewoudloper · July 10, 2021, 3:32pm

Thank you @meztez! Your reply was really helpful!
I've created two files, namely plumber.R

library(plumber)

data <- iris[, c("Sepal.Length", "Sepal.Width")]
names(data) <- c("length", "width")
model <- lm(length ~ width, data)

#* Predict sepal length
#* @param width
#* @get /predict
#* @serializer unboxedJSON
function(width){
  new_obs <- tibble::tibble(width = as.numeric(width))
  predict(model, new_obs)
}

and entrypoint.R

library(plumber)

pr(file = "plumber.R") %>%
  pr_hook("postroute", function(req) { 
    if (paste0(req$REQUEST_METHOD, req$PATH_INFO) == "GET/predict") {
      cat("[", req$REQUEST_METHOD, req$PATH_INFO, "] - REQUEST - ", req$postBody, "\n", sep = "")
    }  
  }) %>%
  pr_hook("postserialize", function(req, res){
    if (paste0(req$REQUEST_METHOD, req$PATH_INFO) == "GET/predict") {
      cat("[", req$REQUEST_METHOD, req$PATH_INFO, "] - RESPONSE - ", res$status, " - BODY - ", res$body, "\n", sep = "")
    }
  }) %>%
  pr_run()

The postserialize hook is working fine, but the postroute isn't. The value of the input variable width isn't included in the output:

[GET/predict] - REQUEST -

Could you please point out to me what I'm doing wrong?

meztez · July 11, 2021, 1:15pm

There is no postBody with a GET request, as all parameters are passed after the ? in the URL. You might want to log something else, like req$args or req$QUERY_STRING. See Routing & Input • plumber
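
As a rough sketch of that idea, the hook could pick the field to log based on the HTTP method. The helper names request_payload and log_request, and the fake request lists, are hypothetical and only there for illustration. The hook would still be registered with pr_hook("postroute", ...) as in the question.

```r
# Hypothetical helper: GET inputs travel in the query string,
# POST inputs travel in the request body.
request_payload <- function(req) {
  if (identical(req$REQUEST_METHOD, "GET")) req$QUERY_STRING else req$postBody
}

# Hook body: same log layout as the hooks earlier in this thread.
log_request <- function(req) {
  cat("[", req$REQUEST_METHOD, req$PATH_INFO, "] - REQUEST - ",
      request_payload(req), "\n", sep = "")
}

# Plain lists standing in for real plumber requests (assumed values).
fake_get  <- list(REQUEST_METHOD = "GET",  PATH_INFO = "/predict",
                  QUERY_STRING = "?width=3.5")
fake_post <- list(REQUEST_METHOD = "POST", PATH_INFO = "/predict",
                  postBody = '{"width":3.5}')

get_line  <- capture.output(log_request(fake_get))
post_line <- capture.output(log_request(fake_post))
print(get_line)
# [1] "[GET/predict] - REQUEST - ?width=3.5"
```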

friesewoudloper · July 12, 2021, 9:40am

Thank you so much! Now it works, thanks to your suggestions and this example: Plumber Logging · R Views

plumber.R

library(plumber)

data <- iris[, c("Sepal.Length", "Sepal.Width")]
names(data) <- c("length", "width")
model <- lm(length ~ width, data)

#* Predict sepal length
#* @param width
#* @get /predict
#* @serializer unboxedJSON
function(width){
  new_obs <- tibble::tibble(width = as.numeric(width))
  predict(model, new_obs)
}

entrypoint.R

library(plumber)
library(logger)

log_dir <- "logs"
if (!fs::dir_exists(log_dir)) fs::dir_create(log_dir)
log_appender(appender_tee(tempfile("plumber_", log_dir, ".log")))

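# Return "-" for missing or empty values so the log line stays readable
convert_empty <- function(string) {
  if (length(string) == 0 || identical(as.character(string), "")) {
    "-"
  } else {
    string
  }
}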

pr(file = "plumber.R") %>%
  pr_hook("postserialize", function(req, res){
    if (paste0(req$REQUEST_METHOD, req$PATH_INFO) == "GET/predict") {
      log_info('width = {convert_empty(req$argsQuery$width)} prediction = {convert_empty(res$body)}')
    }
  }) %>%
  pr_run()
