Reading Multiple Files in R

Hello there,

I have 5.3GB of data: 35,360 gzip-compressed files (.csv.gz), each containing a single CSV file, organized in 41 folders. They are log files. The folders and file names look like this:

Folder 2018-10-25:

2018-10-25-00-00-32fa.csv.gz and so on;

Folder 2018-10-26:

2018-10-26-00-00-32fa.csv.gz and so on.

Last folder 2018-12-04.

How can I read all of these files into R as if they were a single file? Any tips for working with this much data?

Kind Regards,


5.3GB is a lot of data; you might have problems fitting it into memory. Do you have access to a database? Databases play nicely with dplyr (via DBI / dbplyr) and can do a lot of heavy lifting for you.
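
As a rough illustration of the database route, here is a minimal sketch that assumes the DBI and RSQLite packages are installed. The SQLite file, the table name `logs`, and the columns `status` and `bytes` are invented for the example; the real logs will have their own structure.

```r
library(DBI)

# Assumption: a throwaway SQLite database file in the temp directory
db_path <- tempfile(fileext = ".sqlite")
con <- dbConnect(RSQLite::SQLite(), db_path)

# Stand-in for the rows parsed from one log file (hypothetical columns)
chunk <- data.frame(status = c(200L, 404L, 200L), bytes = c(512, 0, 2048))

# Append the rows to a single table; it is created on the first write
dbWriteTable(con, "logs", chunk, append = TRUE)

# Let the database do the aggregation so only a small result returns to R
res <- dbGetQuery(
  con,
  "SELECT status, COUNT(*) AS n FROM logs GROUP BY status ORDER BY status"
)
print(res)

dbDisconnect(con)
```

Because each file is appended and then discarded, R only ever holds one file's worth of rows at a time.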

My approach would be along these lines:

# character vector of all .csv.gz files, searched recursively through the date folders
gzfiles <- list.files(pattern = "\\.csv\\.gz$", recursive = TRUE, full.names = TRUE)

for (f in gzfiles) {

  # gzfile() decompresses on the fly, so no separate unzip step is needed
  # (unzip() only handles .zip archives, not .gz files)
  logs <- read.csv(gzfile(f), stringsAsFactors = FALSE)  # assumes a header row

  # somehow insert the content of `logs` into your database
  # this will depend on its structure and your database of choice

}
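
If the combined data does fit in memory after all, the files can also be read straight into one data frame. Here is a minimal base-R sketch, assuming every file has a header row and the same columns. The demo folder, files and columns it creates are made up so that the example runs on its own.

```r
# Build a tiny stand-in for the real layout: one folder per day,
# each holding a gzip-compressed CSV file (names and columns are invented)
root <- file.path(tempdir(), "logs_demo")
for (day in c("2018-10-25", "2018-10-26")) {
  dir.create(file.path(root, day), recursive = TRUE, showWarnings = FALSE)
  con <- gzfile(file.path(root, day, paste0(day, "-00-00-demo.csv.gz")), "w")
  write.csv(data.frame(day = day, hits = 1:3), con, row.names = FALSE)
  close(con)
}

# Find every .csv.gz file below the root folder
files <- list.files(root, pattern = "\\.csv\\.gz$",
                    recursive = TRUE, full.names = TRUE)

# Read each file (gzfile() decompresses on the fly) and stack the rows
all_logs <- do.call(rbind, lapply(files, function(f) {
  read.csv(gzfile(f), stringsAsFactors = FALSE)
}))

nrow(all_logs)
```

With tens of thousands of files, `do.call(rbind, ...)` can get slow; `data.table::rbindlist()` or `dplyr::bind_rows()` are commonly used in its place.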


Also, if you are willing to risk the Purity of Essence of your code, consider this script.

It is written in the language of the snake people (Python) and is easily integrated with R code. I have used it with great success when parsing S3 logs; I am certain it can be adapted to other log structures with only minor hacking.

