Deployment of Keras Models to Google Cloud - pre-processing question

Hello!

I have built a model in Keras and deployed it through Google Cloud ML, using the cloudml package.

The model is trained on numerical inputs which have been converted from strings.

When it comes to making predictions, I convert the strings to numeric using a lookup table and then pass the numerical input to the model.

This is simple to do on the local machine:

library(tidyverse)
library(cloudml)

# lookup table 
lookup <- tibble(int = c(1, 2, 3),
                 str = c('A1', 'B1', 'C1'))

# input strings
a <- 'A1'
b <- 'B1'

# convert to numeric 
a_ <- lookup %>% filter(str == a) %>% pull(int)
b_ <- lookup %>% filter(str == b) %>% pull(int)

# send to deployed model and receive predictions
cloudml_predict(
  instances = list(c(a_, b_)),
  name = "size_predictor",
  version = "a_1",
  verbose = TRUE
)

However, when it comes to full deployment, I can't work out where I need to put the lookup table. Ideally my website will send an input file containing strings to Cloud ML. These strings will be converted to integers through the lookup table and fed to the model. The model will then return the outputs.

My question is: what is the best way of creating this lookup step? Do I need to add another layer at the beginning of the Keras model to do the conversion? Should I store the lookup table in BigQuery and divert inputs through it beforehand?

The answers I have found so far apply only to Python, for example this one on Stack Overflow: Add Tensorflow pre-processing to existing Keras model (for use in Tensorflow Serving)

I am trying to build a full end-to-end (database -> deployment) project in R. I feel I am so close, but there is just this gap left!

Google Cloud ML prediction currently serves only TensorFlow models, so you have two options:

  1. Add strings as inputs to the Keras/TensorFlow model and use the graph to convert those strings into proper numeric columns.
  2. Use a proxy to convert the strings using the lookup and then pass the request to Google Cloud. You could use plumber to create this lookup API which would map and forward the request to Google Cloud.
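As a rough illustration of the second option, here is a minimal sketch of a plumber file that does the lookup and forwards the request. The file name (proxy.R), the /predict route, the parameter names a and b, and the encode_inputs helper are all hypothetical; the lookup values, model name and version are taken from the question. It assumes the plumber and cloudml packages are installed and that Google Cloud credentials are already configured on the machine running the proxy.

```r
# proxy.R (hypothetical file name)

# Lookup table kept in the proxy process as a named vector:
# names are the input strings, values are the numeric codes.
lookup <- c(A1 = 1, B1 = 2, C1 = 3)

# Map input strings to their codes; stop with an error on unknown strings
# so that a bad request never reaches the model.
encode_inputs <- function(strings, table = lookup) {
  codes <- unname(table[strings])
  if (anyNA(codes)) {
    stop("Unknown input(s): ", paste(strings[is.na(codes)], collapse = ", "))
  }
  codes
}

#* Convert the strings to codes, then forward them to the deployed model
#* @param a first input string
#* @param b second input string
#* @post /predict
function(a, b) {
  cloudml::cloudml_predict(
    instances = list(encode_inputs(c(a, b))),
    name = "size_predictor",
    version = "a_1"
  )
}
```

The API could then be started with plumber::plumb("proxy.R")$run(port = 8000), and the website would post its strings to /predict instead of calling Google Cloud directly. A named vector is only one way to hold the table; for a larger table the same function could read from a file or a database instead.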

Thanks for the reply, Javier. Could you go into a bit more detail on the first option, where you say to use the graph to convert the strings?

What graph would this be? Does that mean the inputs would have to be stored within the model? (There are a few million.)
