Loading a 20 GB JSON file in R :)

Hady · March 7, 2018, 10:28am

Dear R Geeks,
Frankly, I'm a beginner with R and appreciate your continuous support. Can I load a large JSON file (about 20 GB) into R when my laptop has only 8 GB of RAM? The plan is to load the file into a database and then do a customer-history segmentation project on it. I appreciate your usual support, my friends. Sorry for asking the question without showing any attempts of my own.
Regards,

martin.R · March 7, 2018, 10:35am

You will need to use a database.

The data needs to fit into RAM when loaded into R and have some overhead for copying. With 8GB RAM you will only be able to deal with maybe 3GB of data effectively.

Hady · March 7, 2018, 12:39pm

Thanks for your support, @martin.R. Is there a way to split the JSON into chunks and load it into the database chunk by chunk?

martin.R · March 7, 2018, 12:47pm

I've no idea about json to be honest. My comment related to the size of the data to which you referred. If you are able to split the json file into chunks that could be another solution.
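
If the file happens to be line-delimited (one complete JSON record per line), splitting it needs nothing beyond base R. The following is a minimal sketch under that assumption; `split_ndjson`, its arguments, and the chunk file naming are hypothetical and not part of any package. A single large JSON array or pretty-printed JSON cannot be split this way.

```r
# Hypothetical helper: split a line-delimited JSON file into smaller files.
# Assumes every line holds exactly one complete JSON record.
split_ndjson <- function(path, out_dir, lines_per_chunk = 100000L) {
  dir.create(out_dir, showWarnings = FALSE, recursive = TRUE)
  con <- file(path, open = "r")
  on.exit(close(con))

  chunk_files <- character(0)
  repeat {
    # readLines() on an open connection continues where the previous call
    # stopped, so only one chunk is held in memory at a time
    lines <- readLines(con, n = lines_per_chunk, warn = FALSE)
    if (length(lines) == 0L) break

    out <- file.path(
      out_dir,
      sprintf("chunk_%05d.ndjson", length(chunk_files) + 1L)
    )
    writeLines(lines, out)
    chunk_files <- c(chunk_files, out)
  }
  chunk_files
}
```

The function returns the paths of the chunk files it wrote, so each one can then be loaded into the database in turn.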

Hady · March 7, 2018, 1:46pm

Appreciated. Thanks.

andrie · March 7, 2018, 2:47pm

The jsonlite package on CRAN supports streaming in of JSON data (if your file is in the appropriate .ndjson format).

See the help for ?stream_in in the jsonlite package.

Because parsing huge JSON strings is difficult and inefficient, JSON streaming is done using lines of minified JSON records, a.k.a. ndjson. This is pretty standard: JSON databases such as dat or MongoDB use the same format to import/export datasets. Note that this means that the total stream combined is not valid JSON itself; only the individual lines are. Also note that because line-breaks are used as separators, prettified JSON is not permitted: the JSON lines must be minified. In this respect, the format is a bit different from fromJSON and toJSON where all lines are part of a single JSON structure with optional line breaks.
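
As a rough illustration of how this could be combined with a database, the sketch below streams an ndjson file page by page into a SQLite table. It assumes the DBI and RSQLite packages are installed, that every record has the same fields, and that nested fields can be flattened into ordinary columns; `load_ndjson_to_sqlite`, the table name, and the page size are hypothetical choices, not something from the original question.

```r
library(jsonlite)
library(DBI)

# Hypothetical helper: stream an ndjson file into a SQLite table.
# Assumes all records share the same fields, so every page can be
# appended to the same table.
load_ndjson_to_sqlite <- function(path, db_path, table = "records",
                                  pagesize = 10000) {
  db <- dbConnect(RSQLite::SQLite(), db_path)
  on.exit(dbDisconnect(db))

  stream_in(
    file(path),
    handler = function(df) {
      # flatten() turns nested data frames into regular columns
      df <- jsonlite::flatten(df)
      # only one page of `pagesize` records is in memory at a time
      dbWriteTable(db, table, df, append = TRUE)
    },
    pagesize = pagesize,
    verbose = FALSE
  )
  invisible(NULL)
}
```

Because the handler writes each page out and discards it, memory use depends on `pagesize` rather than on the total size of the file. Records whose fields are arrays would produce list columns, which need extra handling before they can be written to a table.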

You can also evaluate the ndjson package on CRAN:

Streaming 'JSON' ('ndjson') has one 'JSON' record per-line and many modern 'ndjson' files contain large numbers of records. These constructs may not be columnar in nature, but it is often useful to read in these files and "flatten" the structure out to enable working with the data in an R 'data.frame'-like context. Functions are provided that make it possible to read in plain 'ndjson' files or compressed ('gz') 'ndjson' files and either validate the format of the records or create "flat" 'data.table' structures from them.
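
A minimal sketch of that workflow, assuming the ndjson package is installed. Its `stream_in()` returns the whole file as one table, so on an 8 GB laptop this is only realistic for a file, or a piece of one, that fits in memory. `read_ndjson_chunk` is a hypothetical wrapper name.

```r
# Hypothetical wrapper: validate an ndjson file, then read it into a
# flat data.table. The ndjson:: prefix avoids clashing with the
# functions of the same name in jsonlite.
read_ndjson_chunk <- function(path) {
  if (!ndjson::validate(path, verbose = FALSE)) {
    stop("File is not valid ndjson: ", path)
  }
  # cls = "dt" asks for a data.table; "tbl" would return a tibble instead
  ndjson::stream_in(path, cls = "dt")
}
```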

tbradley · March 7, 2018, 4:05pm

Please see this FAQ about @name mentioning users who are not involved with a thread. In general, it is discouraged to @name someone who has not engaged in the thread themselves.

Hady · March 7, 2018, 4:25pm

OK, noted, @tbradley. It will not be repeated. I was just trying to reach members who had helped me before.