I came across an article that explains how to import very large NDJSON files into R by splitting the records into segments: Importing Large NDJSON Files into R – RLang.io | R Language Programming
In this process, they first split the data into segments of 50,000 records each, after making an empty folder to hold the segments:
split -l 50000 data.json ./import/tweets_
I have tried following this process, but I keep getting the error: Error: unexpected numeric constant in "split -l 50000". I have never come across 'split' before, nor do I understand what -l is. Could you explain?
Additionally, the next line of code given prints the headers:
head -1 import/tweets_da | grep -oP '"([a-zA-Z0-9\-_]+)"\:'
Again, I do not understand what the -1 part is, and I am fairly sure this returns the same error as above when I try it. I also do not quite understand where 'import/tweets_da' comes from. If anyone could explain what is going on here, it would be very helpful.
I have been trying for a long time to find a way to work with a 7 GB NDJSON file in R and have so far been unsuccessful. If the process I am pursuing is not a good one, I am open to any other suggestions. For context, I am aiming to do some type of textual analysis on Twitter posts.
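For anyone reproducing this: the error message above has the form of an R parser error, which suggests the commands were entered at the R prompt, whereas split, head and grep are command-line tools meant for a system terminal. Below is a minimal sketch of what the two commands appear to do, assuming a bash terminal. The five-record file, the scratch directory and the chunk size of 2 are made up for illustration, and it uses grep -E rather than the article's -P so that it also runs where Perl-style patterns are unavailable.

```shell
# Run in a system terminal (bash), not in the R console.
set -eu

# Scratch directory (illustrative) so no real files are touched.
workdir="$(mktemp -d)"
cd "$workdir"
mkdir import

# Hypothetical five-record NDJSON file standing in for data.json.
printf '%s\n' \
  '{"id":1,"text":"a"}' \
  '{"id":2,"text":"b"}' \
  '{"id":3,"text":"c"}' \
  '{"id":4,"text":"d"}' \
  '{"id":5,"text":"e"}' > data.json

# -l 2 means "at most 2 lines per output file";
# ./import/tweets_ is the prefix for the generated file names.
split -l 2 data.json ./import/tweets_

# split appends suffixes aa, ab, ac, ... to the prefix.
ls import

# head -1 prints only the first line of the file.
# grep -o prints only the matching parts, here each quoted key followed by a colon.
keys="$(head -1 import/tweets_aa | grep -oE '"[a-zA-Z0-9_-]+":')"
printf '%s\n' "$keys"
```

Because split names its output files by appending aa, ab, ac and so on to the prefix, import/tweets_da in the article would simply be one of the chunk files generated from the full data set.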