64 GB RAM Laptop

I just purchased a new laptop with 64 GB of RAM. I understand that I need to have enough RAM to hold the largest chunk of data. However, if I have 64 GB of RAM, can I immediately load a large data set, for example, a 30 GB data set into memory? Or are there other changes/options I will need to make to the IDE settings so that I can read in such a data set to memory?
I recognize that this is not the best way to handle data, but I'd like to know the answer regardless.
Thanks


Hi, and welcome!

Congratulations! That's definitely heavy-duty.

Some things to keep in mind about RAM and R.

  1. RAM has demands on it other than the needs of any one program. If you look at your system's usage "at rest," with no applications loaded, you'll see that free RAM is already somewhat lower than the total.

  2. R runs in memory, holding not only your data but all of the libraries it loads for a particular program. One way to minimize this is to refrain from loading a meta-package like tidyverse in favor of the specific packages you'll be using. Even those can be trimmed further by considering which libraries you call for only a few specific functions. For example, at the beginning of a session you might be loading csv files with readr, and you can do that without attaching the whole package:

readr::read_csv("somefile.csv")

Occasionally you will get an error that a function is not exported; in that case the triple-colon operator, which reaches a package's internal functions, often works:

readr:::read_csv("somefile.csv")

(note the triple ':')

You can also detach libraries and rm objects once they are no longer needed.
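For instance (a minimal sketch; the package and object names are just stand-ins for whatever you actually have loaded):

detach("package:readr", unload = TRUE)  # unload a package you are done with
rm(my_df)                               # drop a large object you no longer need
gc()                                    # run garbage collection and report memory use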

For objects created that will be used later,

save(my_object, file = "my_object.RData")
rm(my_object)  # remove the in-memory copy once it is safely on disk

which frees that memory for other work, and then

load("my_object.RData")

when you need it back again. (For a single object, saveRDS(my_object, "my_object.Rds") and my_object <- readRDS("my_object.Rds") do the same job.)

  3. R is good at garbage collection, despite what I once thought and what you may have heard. However, the operating system may not be.

  4. Unix-like systems, including macOS, have a limit on how much RAM can be allocated to a single process, and you'll sometimes run up against it. The ulimit shell built-in lets you inspect and raise that limit from the terminal. I'm sorry that I can't tell you about Windows--I gave up on it when I got tired of waiting for Vista.

  5. Hadley Wickham's Advanced R has a very helpful chapter on profiling and memory management.
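In that spirit, even base R can tell you how much memory an object occupies (a minimal sketch; the vector is just an illustration):

x <- rnorm(1e7)                      # ten million doubles
print(object.size(x), units = "Mb")  # about 76 Mb: 8 bytes each plus a small header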

  6. R supports calls to compiled programs that may be more memory efficient. Of course, they use memory too, but they don't necessarily need to run on the local system; they can be dispatched for execution on the network or in the cloud.
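As a local illustration of calling compiled code from R (a minimal sketch using Rcpp, assuming a C++ toolchain is installed):

library(Rcpp)

# compile and load a small C++ function on the fly
cppFunction("
double sum_cpp(NumericVector x) {
  double total = 0;
  for (int i = 0; i < x.size(); ++i) total += x[i];
  return total;
}")

sum_cpp(c(1, 2, 3))  # 6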

  7. See the CRAN Task View on High Performance Computing for tools including the ff package that

... provides data structures that are stored on disk but behave (almost) as if they were in RAM by transparently mapping only a section (pagesize) in main memory
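A minimal sketch of what that looks like in practice (the file name here is hypothetical):

library(ff)

# read a csv into an on-disk ffdf; only small pages of it sit in RAM at a time
big <- read.csv.ffdf(file = "somefile.csv")
dim(big)  # behaves much like an ordinary data frame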

Good luck with your work!


Wow - thanks for such a wonderful response. (That is a great welcome to the R Community following my first post.)

I'll spend some time synthesizing the info. Thanks again.


I wouldn't worry about tiny issues like the RAM used by loading libraries.

The main issue you will have is when you start manipulating the data you have loaded into RAM. Many operations require the data to be copied in memory, sometimes many times. Hence, depending on what you do, you may require e.g. 3x the RAM of your actual data (this really is just an example and will depend on your code).
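You can watch this copy-on-modify behaviour with base R's tracemem() (a minimal sketch; the vector is just an illustration):

x <- runif(1e6)
y <- x        # no copy yet: x and y share the same memory
tracemem(x)   # report whenever this vector gets duplicated
x[1] <- 0     # a full copy is made here, briefly doubling that memory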

If you are tight on RAM, then either consider using a database and pulling in only the data you need, or look at the data.table package, which is very memory-efficient because it modifies data by reference, without copying.
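A minimal sketch of that by-reference style (the table here is just a toy):

library(data.table)

dt <- data.table(x = 1:5, y = letters[1:5])
dt[, z := x * 2]            # := adds a column in place, with no copy of dt
setnames(dt, "y", "label")  # renaming is also done by reference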


Thanks for the info. I am thinking data.table will be a good place to start, although at first glance it seems a lot like base R - I guess the difference is that it's optimized for larger data tables. I'll spend some time using it regardless.


One way of thinking of data.table is that it's base R on steroids. With turbo boosters. And clearer syntax (cannot think of an analogy for that one).

