Compress data to export it

Hi community! I have a fairly large data.frame in R that I would like to export so I can work with it in Excel.

The data.frame has about 884,600 rows and 45 columns. The problem is that exporting it in .csv or .xlsx format takes so long that after almost 10 hours I still haven't seen any result (R keeps working on it), so write.xlsx and write.csv are not working for me. Does anyone know a way to export big data.frames from R in a reasonable time? (I don't think the computer's processing capability is the problem.)

Thanks in advance!

You can try fwrite() from the data.table package, which works like write.csv() but is much faster.

library(data.table)
fwrite(myData, "myData.csv", row.names = TRUE)

In general, I think it is better to go with the CSV or TXT format when handling big data tables; XLSX can take up much more space as a file.
Also, when you handle big data frames in R and every operation takes a long time, you might want to consider converting them to data.table, which makes most of what you do in R with them much faster, including the read-in step with fread().
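For instance, here is a minimal sketch of that round trip (assuming the data.table package is installed; the data and file names are just examples):

```r
library(data.table)

# Convert an existing data.frame to a data.table
df <- data.frame(x = sample(LETTERS, 100, replace = TRUE),
                 y = runif(100))
dt <- as.data.table(df)

# fwrite() is a fast drop-in replacement for write.csv()
tmp <- tempfile(fileext = ".csv")
fwrite(dt, tmp)

# fread() reads the file back quickly and returns a data.table
dt2 <- fread(tmp)
```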


Hello,

I'd like to add that if R can hardly handle writing the file, Excel will hang trying to load it, and manipulating such a large set will be nearly impossible.

Why do you want to do anything in Excel with it? R is much better at handling data manipulation at that scale...

If you want to save the data without needing Excel, I suggest the RDS format, which stores the data in a compressed state. Excel can't read it, but R can, and it saves a lot of disk space. See the example here:

#Generate a large example dataset
n <- 5000000
myData <- data.frame(x = sample(LETTERS, n, replace = TRUE),
                     y = sample(1:1000, n, replace = TRUE),
                     z = runif(n))

#Save as csv
#-----------
write.csv(myData, "testFile.csv") # About 175 MB


#Save as rds
#-----------
saveRDS(myData, "testFile.rds") # About 40 MB


#Read rds
#-----------
loadData <- readRDS("testFile.rds")

Hope this helps,
PJ

R shouldn't take 10 hours to write a CSV file of that size; at most, it should take a few minutes.
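As a rough illustration of the expected order of magnitude (timings vary by machine, and data.table is only needed for the fwrite() comparison):

```r
library(data.table)

# Build a 1-million-row frame and time two CSV writers against each other
n <- 1e6
d <- data.frame(a = runif(n), b = sample(letters, n, replace = TRUE))

f1 <- tempfile(fileext = ".csv")
f2 <- tempfile(fileext = ".csv")

t_base <- system.time(write.csv(d, f1, row.names = FALSE))
t_fast <- system.time(fwrite(d, f2))

# Both should finish in seconds, not hours, on a typical machine
print(t_base["elapsed"])
print(t_fast["elapsed"])
```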

Are you sharing the data over a network? If so, it might be a good idea to write a compressed file (e.g. .zip), upload that, and have the other person download and decompress it on their computer:

csv_path <- file.path(tempdir(), "mydata.csv")
zip_path <- file.path(tempdir(), "mydata.zip")
write.csv(mydata, csv_path)
# "-j" stores just the file name in the archive, not the full temp path
zip(zipfile = zip_path, files = csv_path, flags = "-j")
file.copy(zip_path, "/final/sharing/directory")
# Now others can get the file and do as they please
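If the external zip utility isn't available on your system, a base-R alternative (a sketch; the data and file names here are placeholders) is to write the CSV through a gzip connection, since many tools can read .csv.gz directly:

```r
# Write the CSV straight into a gzip-compressed file via a connection
gz_path <- file.path(tempdir(), "mydata.csv.gz")
con <- gzfile(gz_path, "w")
write.csv(data.frame(a = 1:10, b = letters[1:10]), con, row.names = FALSE)
close(con)

# read.csv() decompresses .gz files transparently when reading back
back <- read.csv(gz_path)
```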

Sadly, not all co-workers have accepted R, or any other statistical programming language, into their hearts.


This topic was automatically closed 21 days after the last reply. New replies are no longer allowed.