Dealing with JSON from plumber in python

I work in a company that does data science in R, and production dev in python. We have a crack team that sticks the two together, but recently the python people have been expressing some frustration over the JSON return from a plumber API.

The issue seems to be total dismay that while JSON differentiates scalar from vector objects, R does not. I've presented them with boxed versus unboxed serializations from a plumber API, but they remain unimpressed. I've got no problem dealing with the single element list vs not-a-list issue in R, so, to be honest, I don't really understand the problem from a python point of view.

To be fair, and not in my words:

I mean, it's not hard to deal with it possibly being a list vs. a scalar. But it contradicts the spirit of JSON and REST. And if we think about writing a guide for outside users you basically have to say that this element needs to be type-checked before it's interpreted. That means there's no "schema" to our JSON response; means clients have to implement some logic to get our output to a point where it's interpretable.
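For reference, the type check described in that comment might look roughly like this on the client side. This is only a sketch: `ensure_list` is a hypothetical helper name, and the two payloads are made-up illustrations of an unboxed and a boxed `error_code` field, not real output from the API.

```python
import json


def ensure_list(value):
    """Return value unchanged if it is already a list, otherwise wrap it in one."""
    return value if isinstance(value, list) else [value]


# Hypothetical payloads: the same field arriving unboxed (one error)
# and boxed (two errors).
one_error = json.loads('{"error_code": "mpc:18"}')
two_errors = json.loads('{"error_code": ["mpc:7", "mpc:18"]}')

codes_one = ensure_list(one_error["error_code"])
codes_two = ensure_list(two_errors["error_code"])

print(codes_one)  # ['mpc:18']
print(codes_two)  # ['mpc:7', 'mpc:18']
```

Every client would need a wrapper like this for each field that may be either shape, which is the cost the comment is objecting to.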

My problem is that there are many more of them than of me, and I'm slowly losing the battle in arguing the rational case for doing DS in R, and production dev in python.

My question is, what resources can I read to educate myself about the issue from a python point of view, and what can I pass on to the python people that will help them?

Like you, I'm a bit confused about what the exact issue is. Can they share a couple of examples of what they get and what they expect? My immediate reaction is boxed vs unboxed, but you've tried this already, so I'm not sure what exactly is missing. That is exactly what we did at my previous workplace, and we were dealing with Java of all things, which is even stricter than Python.

P.S. "JSON schema" is an oxymoron, but that's offtopic :slight_smile:.

Thanks @mishabalyasin, they say they want consistency. I think they want unboxed everywhere except for those lists which may have single or multiple elements; those must always be boxed, even when they hold a single element.

We've looked at custom serializers, but the plumber docs just say //todo under that topic.

Scrolling back through conversations.
(Experimenting with boxed, previously everything was unboxed.)

... Right now I'm hating Plumber and R for (stuff) like this:

response_json={'status': ['precheck'], 'status_url': ['http://localhost:8010/maintenance/get_status?job_id=0b57236f']}

So now the "status" is a list for no reason.

I wonder if there's an unbox-sort of function in most JSON parsers.
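As far as I know, Python's standard `json` module has no built-in option for this, but a small recursive helper can do it after parsing. The sketch below is an assumption about what such a function could look like (the name `unbox` is borrowed from jsonlite and is not a Python library function); it is applied to the response shown above.

```python
def unbox(value):
    """Recursively replace single-element lists with their only element."""
    if isinstance(value, dict):
        return {key: unbox(item) for key, item in value.items()}
    if isinstance(value, list):
        items = [unbox(item) for item in value]
        return items[0] if len(items) == 1 else items
    return value


response_json = {
    "status": ["precheck"],
    "status_url": ["http://localhost:8010/maintenance/get_status?job_id=0b57236f"],
}

flat = unbox(response_json)
print(flat["status"])  # precheck
```

Note that this collapses every single-element list, including fields that are meant to be lists and just happen to hold one item, so it moves the inconsistency to the client rather than removing it.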

So now we're trying boxed for some routes, and unboxed for others, which I think will lead to even more frustration. There is a lot of refactoring going on which isn't making me and DS exactly popular.

I sense a lot of misplaced frustration :slight_smile:. If there is something to hate, I would go with jsonlite and even then - it's just doing what you are asking from it.

But let's talk about the actual issue at hand. Why do you get the JSON that you get? I would divine that you are doing something like this:


response <- tibble::tribble(
  ~status, ~status_url,
  "precheck", "http://localhost:8010/maintenance/get_status?job_id=0b57236f"
)

jsonlite::toJSON(response, dataframe = "columns")
#> {"status":["precheck"],"status_url":["http://localhost:8010/maintenance/get_status?job_id=0b57236f"]}

So this is not what we want since our scary Python devs are saying: why on Earth would you put status and status_url into lists if they are clearly scalars? Well, they are not actually scalars. They are vectors of length 1 and that's how R+jsonlite is treating them.

So your next step is to try something like this since auto_unbox is supposed to flatten length-1 things into scalars:

jsonlite::toJSON(response, dataframe = "columns", auto_unbox = TRUE)
#> {"status":["precheck"],"status_url":["http://localhost:8010/maintenance/get_status?job_id=0b57236f"]}

It doesn't work again. Why would it not work? Well, the jsonlite documentation actually mentions how you should do it: use the unbox function explicitly to mark whatever you actually need as scalars (take a look at ?unbox).

But you could also trick jsonlite a bit and remember that JSON at its core maps almost exactly to lists (lists of lists, lists of lists of lists, etc.) in R. This means that you could do something like this:

jsonlite::toJSON(as.list(response), auto_unbox = TRUE)
#> {"status":"precheck","status_url":"http://localhost:8010/maintenance/get_status?job_id=0b57236f"}

Created on 2020-02-22 by the reprex package (v0.3.0)

And here you go, you are getting the response that (I assume) Python devs are expecting.
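To make the difference concrete from the Python side, here is a small sketch that parses the two serializations shown above with the standard `json` module. The variable names are only for illustration.

```python
import json

# The dataframe = "columns" output (boxed) and the as.list() + auto_unbox output (unboxed).
boxed = json.loads(
    '{"status":["precheck"],'
    '"status_url":["http://localhost:8010/maintenance/get_status?job_id=0b57236f"]}'
)
unboxed = json.loads(
    '{"status":"precheck",'
    '"status_url":"http://localhost:8010/maintenance/get_status?job_id=0b57236f"}'
)

print(type(boxed["status"]).__name__)    # list
print(type(unboxed["status"]).__name__)  # str
print(boxed["status"][0] == unboxed["status"])  # True
```

With the boxed form the client has to index with `[0]` to reach the value; with the unboxed form the value is a plain string.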

That all being said, you need to be careful with JSON and unboxing since it can produce surprising (and inconsistent) results. To avoid that, you should start a conversation about what exactly they want as a response. jsonlite is flexible enough to give you (almost) any structure you want or need, but it's on you to make sure that what you put into it is already in a form that naturally maps to the proper response.


Thanks @mishabalyasin for taking the time to write out such a great reply.

So, to be clear, I need to bypass Plumber's JSON serialization with #* @html, and then, in general return boxed responses like this:

(where response is a list)

jsonlite::toJSON(response, auto_unbox = FALSE)

or where I specifically want unboxed, as you suggest, return like this:

jsonlite::toJSON(response, auto_unbox = TRUE)

You've probably already seen this - - but I think it says exactly the same thing as I've written above, plus how to do it with Plumber specifically (especially 4.2.2). Does it not work for you with Plumber natively, so that you need to go down to jsonlite output directly?

Please say if I'm wrong, but I don't think one can achieve what I described with Plumber natively. I thought your example was showing me exactly how to take direct control of the json serialization. (For which I'm thinking yeah, I could have thought of that myself, but for which I'm also very grateful.)

As I understand Plumber, for a single defined route, if you want a json return, you get two options.

  1. Boxed json by default or with the @json annotation; equivalent to jsonlite::toJSON()

  2. Unboxed json with the @serializer unboxedJSON annotation; equivalent to jsonlite::toJSON(auto_unbox=TRUE)

But I want to choose which of these to use in multiple return() statements in a single defined route. Here is my scenario for why I want to do this:

(I've not tried to generalize this, it's easier just to use actual code.)

#* @html # text/html; no additional serialization
#* @param modelspec
#* @post /create_model
function(res, modelspec) {
    time_requested <- as.numeric(Sys.time())

    # We verify modelspec as early as possible ====
    precheck <- check_modelspec(modelspec)
    # returns a list with $test, $error_code, $error_msg, and $advice fields
    # precheck$test is a logical vector of tests passed
    if (!all(precheck$test)) {
        simple_err("Received a request for model_creation with a badly formed modelspec")
        failure <- list()
        failure$time_precheck_fail <- as.numeric(Sys.time())
        failure$error_code <- precheck$error_code[!precheck$test]
        failure$error_msg <- precheck$error_msg[!precheck$test]
        failure$advice <- precheck$advice[!precheck$test]
        res$status <- 400
        return(jsonlite::toJSON(failure, auto_unbox = FALSE))
    }

So far I've done some validation of the modelspec object. If any tests have failed I populate a failure list. I want to return this as consistently boxed json, regardless of whether a single test failed, or multiple tests failed.

e.g. single test failed

> toJSON(failure, auto_unbox = FALSE)
{"time_precheck_fail":[1582421348.5493],"error_code":[["mpc:18"]],"error_msg":[["base_data must have no invariant valued columns"]],"advice":[["Check your modelspec$base_data argument. Columns [1, 2, 7] are invariant."]]}

multiple tests failed

> toJSON(failure, auto_unbox = FALSE)
{"time_precheck_fail":[1582421578.9645],"error_code":[["mpc:7"],["mpc:18"]],"error_msg":[["update must be a single boolean or a single int [0|1]"],["base_data must have no invariant valued columns"]],"advice":[["Check your modelspec$req_metadata argument."],["Check your modelspec$base_data argument. Columns [1, 2, 7] are invariant."]]}
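On the Python side, a client could consume this boxed failure payload along the following lines. This is a sketch under assumptions: `flatten_once` is a hypothetical helper, and the JSON string is the multiple-failure output shown above, split across lines only for readability.

```python
import json

payload = json.loads(
    '{"time_precheck_fail":[1582421578.9645],'
    '"error_code":[["mpc:7"],["mpc:18"]],'
    '"error_msg":[["update must be a single boolean or a single int [0|1]"],'
    '["base_data must have no invariant valued columns"]],'
    '"advice":[["Check your modelspec$req_metadata argument."],'
    '["Check your modelspec$base_data argument. Columns [1, 2, 7] are invariant."]]}'
)


def flatten_once(nested):
    """Flatten a list of lists by one level, e.g. [["a"], ["b"]] -> ["a", "b"]."""
    return [item for inner in nested for item in inner]


error_codes = flatten_once(payload["error_code"])
failures = list(
    zip(error_codes, flatten_once(payload["error_msg"]), flatten_once(payload["advice"]))
)

for code, msg, advice in failures:
    print(f"{code}: {msg} ({advice})")
```

Because every field is always a list of lists, the same code handles one failure or several without any type checks.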

If no tests failed then we drop through the above if condition. I now want to return an unboxed response because the elements here are always scalars, and the python devs don't have to deal with me saying "but they're actually vectors of length 1".

    # some code populating a response object
    # but it's always scalar so we want to return as unboxed json
    res$status <- 202
    return(jsonlite::toJSON(response, auto_unbox = TRUE))
}

> toJSON(response, auto_unbox = TRUE)

So to achieve this consistency for the dev team, I need to bypass Plumber's native serialization and take control of it myself. I thought that is what your example was illustrating. Have I misunderstood?

Now I see what you mean. Yes, if you use @html as a serializer, you can even create your output with jsonlite directly and Plumber won't do anything to it. It's one way to do it, I suppose.

Last time I used Plumber, I don't remember it having @html, so I was more thinking that you could still use @json as a serializer with unbox tricks I've linked above. Concretely:

response <- tibble::tribble(
  ~status, ~status_url,
  "precheck", "http://localhost:8010/maintenance/get_status?job_id=0b57236f"
)

jsonlite::toJSON(jsonlite::unbox(response))
#> {"status":"precheck","status_url":"http://localhost:8010/maintenance/get_status?job_id=0b57236f"}

Created on 2020-02-23 by the reprex package (v0.3.0)

Except in your case you'd have return(jsonlite::unbox(response)).

Hope that makes sense.
