RStudio Crashing When Reading Large .csv Files

Hello RStudio Community,

I am working with some very large data coming from ArcGIS as .csv files. The largest dataset is approx. 60 columns and 70 million rows. But even with somewhat smaller files (60 cols and 30 million rows), RStudio crashes without warning while trying to read them in. Is there a way to fix this?

Perhaps there is a less computationally demanding method to import these files (originally shapefiles, exported to .csv) into R? The smallest .csv dataset I have is 60 cols and 11 million rows; it imports successfully in about 10 minutes. I then save it as RDS, and importing it from RDS is much faster (<1 min). Can I convert my large files to an easier file type beforehand?
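In case it helps, here is a minimal sketch of my current workflow (the path is a placeholder; I have been using base read.csv(), though I gather data.table::fread() is usually much faster for files this size):

df <- read.csv("J:/data/large_file.csv")   # slow: ~10 minutes for the 11-million-row file
saveRDS(df, "J:/data/large_file.rds")      # cache the parsed data as RDS once
df <- readRDS("J:/data/large_file.rds")    # later sessions reload from RDS in <1 minute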

Thank you in advance for any advice.

Are you sure this is RStudio related? Do the same files import successfully in the R GUI? I think this is related to your system running out of RAM rather than to a problem with the IDE.

andresrcs,

Thank you so much for your response. No, I am not sure that this is RStudio related; I will have to try importing in the R GUI. Here are the outputs I get:
tempdir()                           # my very large J hard drive, good
memory.limit()                      # 32665
memory.limit(size = 1000000000000)  # 1e+12
memory.size()                       # 112.35
Although I increased the limit as instructed by some other posts, the final number listed above is concerning. Do you know of any solutions for increasing my RAM? Thanks!

The RAM available to the R process is ultimately (and by default) limited by the total amount of RAM physically installed in your system. Using swap memory (from your hard drive) is not practical in this situation because it would make things excruciatingly slow, so the only practical way to increase your RAM is to buy more and install it in your system.

There might be other options to explore, though. If this is a one-time task, a better option would be to rent a cloud computing server with more resources (like an Amazon Web Services EC2 instance), and depending on what exactly you are trying to do with the data, you could use an "on-disk" approach for geocomputation, like PostgreSQL + PostGIS (a relational database).
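To give an idea of what the on-disk approach looks like from R, here is a minimal sketch (it assumes a running PostgreSQL server with the .csv already loaded into a table; the connection details and the table/column names are placeholders):

library(DBI)
library(RPostgres)

# The heavy lifting happens on disk inside PostgreSQL,
# so only the query result needs to fit in R's RAM
con <- dbConnect(RPostgres::Postgres(),
                 dbname = "gisdb", host = "localhost",
                 user = "myuser", password = "mypassword")

# Pull in only the columns and rows you actually need
res <- dbGetQuery(con, "SELECT id, geom_wkt FROM gis_data LIMIT 100000")

dbDisconnect(con)

That way you never have to read the full 70-million-row file into R's memory at once.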

