Process code on external drive without copying data to local machine

I have a lot of data (terabytes) on an external drive that I need to process. The current setup is to copy some of the data to a local machine, process it, write the processed output back to the hard drive, and repeat until all the data is processed. The copying back and forth makes this inefficient.

Is it possible to process the data directly on the hard drive without having to copy?

Yes, it should be straightforward. What is the issue you are having? Can you give more information about your workflow?

Thanks. I am interested in how to set this up. Is it as simple as just creating a project on the external hard drive and just running the code from there?

Yes. R works with in-memory data (in most scenarios), so it doesn't matter much where the file is located: to process it, you have to load it into memory either way.
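As a rough sketch of what that can look like in practice, the code below reads a file straight from a path on the drive, processes it in chunks so only part of it is in RAM at a time, and writes the result back to the same drive. The folder, file names, columns, and chunk size are all made up for illustration, and `tempdir()` stands in for the drive's mount point so the example runs anywhere.

```r
# Hypothetical location: replace with the real mount point of the drive,
# e.g. "/Volumes/MyDrive/data" on macOS or "E:/data" on Windows.
# tempdir() is only a stand-in so this sketch is runnable as-is.
drive_dir <- file.path(tempdir(), "external_drive")
dir.create(drive_dir, showWarnings = FALSE)

# Create a small example input file (stand-in for the real data).
in_path <- file.path(drive_dir, "input.csv")
write.csv(data.frame(id = 1:10, value = (1:10) * 2), in_path, row.names = FALSE)

# Read the input in chunks and append each processed chunk to the output,
# so the whole file never has to fit in memory at once.
out_path <- file.path(drive_dir, "output.csv")
con <- file(in_path, open = "r")
header <- readLines(con, n = 1)   # keep the header to parse each chunk
chunk_size <- 4                   # illustrative; use a much larger value in practice
first <- TRUE
repeat {
  lines <- readLines(con, n = chunk_size)
  if (length(lines) == 0) break
  chunk <- read.csv(text = c(header, lines))
  chunk$value_sq <- chunk$value^2  # placeholder for the real processing step
  write.table(chunk, out_path, sep = ",", row.names = FALSE,
              col.names = first, append = !first)
  first <- FALSE
}
close(con)

result <- read.csv(out_path)
```

Nothing is copied to the local disk here, but every byte still travels over the connection to the drive once on the way in and once on the way out.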

A faster connection between the computer doing the processing and the hard drive is the simple* way to increase input/output speed. You could improve the connection or install R on a computer closer to the data. If the data lives on a database server, do the heavy work in the database (which can safely be assumed to be very close to the data).

* If only it were simple

If possible, consider compressing data into .zip or .gz files before moving them to the external drive. That way, there's less data to transfer back and forth. R has functions for doing this: zip, unz, unzip, gzcon, and gzfile.
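For example, here is a minimal sketch using `gzfile` to write a compressed CSV and read it back without a separate decompression step. The folder, file name, and data are invented for illustration, and `tempdir()` again stands in for the external drive.

```r
# Hypothetical location on the external drive; tempdir() is a stand-in.
drive_dir <- file.path(tempdir(), "external_drive")
dir.create(drive_dir, showWarnings = FALSE)
gz_path <- file.path(drive_dir, "measurements.csv.gz")

# Example data (made up).
df <- data.frame(id = 1:5, value = c(2.5, 3.1, 4.8, 1.2, 9.9))

# Write through a gzip connection so the file is compressed on disk.
con <- gzfile(gz_path, open = "wt")
write.csv(df, con, row.names = FALSE)
close(con)

# Read it back; the connection decompresses on the fly.
df_back <- read.csv(gzfile(gz_path))
```

Compression trades CPU time for less data moved over the connection, so it tends to help most when the link to the drive is the bottleneck.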

Differences between RAM and storage memory in where they're physically located and what they do:

Both RAM and hard drive memory are referred to as memory, which often causes confusion.

RAM stands for Random Access Memory. Physically, it is a series of chips in your computer. When your computer is turned on, it loads data into RAM. Programs that are currently running, and open files, are stored in RAM; anything you are using is running in RAM somewhere...

When you save a document it goes on a hard drive, or another type of media storage device. Typically, this type of storage is magnetic, and does not depend on electricity to remember what is written on it. However, it's much slower than RAM.

https://www.lehigh.edu/~inimr/computer-basics-tutorial/ramvsdiskspacehtm.htm
