How to combine multiple NDJSON files into one?

I am trying to combine multiple NDJSON files into one. The files are named "FileName_1", "FileName_2", "FileName_3", etc. I built a file list with list.files and then tried to use stream_in as the function in lapply, as in the code below, but I got the error "invalid 'description' argument". If I instead use lapply(list_files, stream_in(file())), I get the error "'stream_in(file())' is not a function, character or symbol". What is the best way to combine multiple NDJSON files into one? Thanks.

ABC=list.files(pattern="File_Name_*", full.name=TRUE)
Logs=lapply(list_files, stream_in(file(ABC)))
files = do.call(rbind,Logs)

You could use purrr::map_dfr() instead. I can't test the code since you haven't provided sample data, but here is an example.

library(jsonlite)
library(tidyverse)

# pattern is a regular expression, not a shell glob
list.files(pattern = "^File_Name_", full.names = TRUE) %>%
    map_dfr(~stream_in(file(.x)))
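
For reference, both errors in the original attempt are consistent with lapply() being handed a call, stream_in(file(ABC)), rather than a function: R evaluates that call first, and file() rejects a vector of several paths. Below is a minimal base R sketch of the same idea, wrapping the read in an anonymous function so that each path is opened separately. The temporary directory and the tiny sample records are made up for illustration, and do.call(rbind, ...) is only expected to work when every file yields flat records with the same columns.

```r
library(jsonlite)

# Hypothetical sample data: two small NDJSON files in a temporary directory.
tmp_dir <- file.path(tempdir(), "ndjson_lapply_demo")
dir.create(tmp_dir, showWarnings = FALSE)
writeLines(c('{"id": 1, "name": "a"}', '{"id": 2, "name": "b"}'),
           file.path(tmp_dir, "File_Name_1"))
writeLines('{"id": 3, "name": "c"}',
           file.path(tmp_dir, "File_Name_2"))

# The pattern is a regular expression, so "^File_Name_" anchors the prefix.
paths <- list.files(tmp_dir, pattern = "^File_Name_", full.names = TRUE)

# lapply() needs a function; this one is called once per path,
# so file() only ever receives a single file name.
logs <- lapply(paths, function(path) stream_in(file(path), verbose = FALSE))

# Stack the per-file data frames (flat records, identical columns).
combined <- do.call(rbind, logs)
print(combined)
```

If the files contain nested JSON objects, rbind() is likely to struggle in the same way map_dfr() does, so this sketch only covers the flat case.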

@andresrcs: Thank you for your reply. I tried your code, but unfortunately I still got an error: "Argument 5 can't be a list containing data frames". I am not sure what argument 5 refers to here.

I think the problem is in this part: stream_in(file(.x)). I have only tested this approach with URL sources, so I think it is going to depend on the actual format of your files.

Thanks again. If you want a sample file, feel free to download one from Yelp at https://www.yelp.com/dataset. I tried the code on Yelp's NDJSON data and got the same error. I copied the file "Business.JSON" twice, named the copies Business1.JSON and Business2.JSON, and then used the following code to read and combine them. It seemed that R could read the files but couldn't combine them, because I got the error "Argument 12 can't be a list containing data frames"...

B=list.files(pattern="business*", full.name=TRUE) %>%   
  map_dfr(~stream_in(file(.x)))
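
One possible explanation for the "can't be a list containing data frames" message is that stream_in() turns nested JSON objects into data-frame columns, which the row-binding step then refuses to combine. The sketch below assumes that is the cause and flattens each file with jsonlite::flatten() before binding. The sample records, the temporary directory, and the helper name read_flat are invented for illustration; they only imitate the general shape of nested business records and are not taken from the Yelp dataset.

```r
library(jsonlite)
library(dplyr)

# Hypothetical sample data: the same nested records written to two files.
tmp_dir <- file.path(tempdir(), "ndjson_flatten_demo")
dir.create(tmp_dir, showWarnings = FALSE)
records <- c(
  '{"business_id": "b1", "stars": 4.5, "hours": {"Monday": "9-17"}}',
  '{"business_id": "b2", "stars": 3.0, "hours": {"Monday": "10-18"}}'
)
writeLines(records, file.path(tmp_dir, "business1.json"))
writeLines(records, file.path(tmp_dir, "business2.json"))

paths <- list.files(tmp_dir, pattern = "^business.*\\.json$",
                    full.names = TRUE, ignore.case = TRUE)

# read_flat is a hypothetical helper: read one NDJSON file, then expand
# nested data-frame columns into ordinary columns such as "hours.Monday".
read_flat <- function(path) {
  nested <- stream_in(file(path), verbose = FALSE)
  jsonlite::flatten(nested)
}

# bind_rows() matches columns by name and fills missing ones with NA.
combined <- bind_rows(lapply(paths, read_flat))
print(combined)
```

jsonlite::flatten() is called with its namespace because purrr, loaded by tidyverse, exports a different flatten(). Columns that hold arrays remain list columns after flattening, so whether this is enough for the real files is untested here.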
