Need to use jsonlite to handle ndjson message list using stream_in() and stream_out()


#1

I have an ndjson data source. For a simple example, consider a text file with three lines, each containing a valid json message. I want to extract 7 variables from the messages and put them in a dataframe.

Please use the following sample data in a text file. You can paste this data into a text editor and save it as "ndjson_sample.txt"

{"ts":"1","ct":"{\"Var1\":6,\"Var2\":6,\"Var3\":-70,\"Var4\":12353,\"Var5\":1,\"Var6\":\"abc\",\"Var7\":\"x\"}"}
{"ts":"2","ct":"{\"Var1\":6,\"Var2\":6,\"Var3\":-68,\"Var4\":4528,\"Var5\":1,\"Var6\":\"def\",\"Var7\":\"y\"}"}
{"ts":"3","ct":"{\"Var1\":6,\"Var2\":6,\"Var3\":-70,\"Var4\":-5409,\"Var5\":1,\"Var6\":\"ghi\",\"Var7\":\"z\"}"}

The following three lines of code accomplish what I want to do:

file1 <- "ndjson_sample.txt"
json_data1 <- ndjson::stream_in(file1)
raw_df_temp1 <- as.data.frame(ndjson::flatten(json_data1$ct))

For reasons I won't get into, I cannot use the ndjson package. I must find a way to use the jsonlite package to do the same thing using the stream_in() and stream_out() functions. Here's what I tried:

con_in1 <- file(file1, open = "rt")
con_out1 <- file(tmp <- tempfile(), open = "wt")
callback_func <- function(df){
  jsonlite::stream_out(df, con_out1, pagesize = 1)
}
jsonlite::stream_in(con_in1, handler = callback_func, pagesize = 1)
close(con_out1)
con_in2 <- file(tmp, open = "rt")
raw_df_temp2 <- jsonlite::stream_in(con_in2)

This is not giving me the same data frame as a final output. Can you tell me what I'm doing wrong and what I have to change to make raw_df_temp1 equal raw_df_temp2?

I could potentially solve this with a the fromJSON() functions operating on each line of the file, but I'd like to find a way to do it with the stream functions. The files I will be dealing with a are quite large and so efficiency will be key. I need this to be as fast as possible.

Thank you in advance.