As part of a research project, I will have to analyze a dataset of 8 columns and 650 million rows. I can only work on the data for about two and a half months, so I need to make sure I'll be able to run my analyses as soon as I get it (in about a month's time). The analyses will involve calculating effect-size estimates, z/t-statistics, and the corresponding p-values for various hypotheses. Sorry I can't be more specific; part of the project design is that the exact hypotheses are not known in advance.
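To give a rough idea of the shape of each analysis (the file name and column names below are just placeholders, since the actual variables and hypotheses are not fixed yet), I expect each one to look something like this in R:

```r
library(data.table)

dt <- fread("data.csv")             # placeholder file: 650M rows x 8 columns

# Example: two-sample comparison of an outcome between two groups
x <- dt[group == "A", outcome]
y <- dt[group == "B", outcome]

tt <- t.test(x, y)                  # Welch t-statistic and p-value
d  <- (mean(x) - mean(y)) /
      sqrt((var(x) + var(y)) / 2)   # a simple Cohen's-d-style effect size
```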
My question now is what kind of hardware would be suitable for working with this data. I have worked with large datasets before, but none this large. I have a modern workstation with an i7 processor, sufficient disk space, and currently 16 GB of RAM, running Windows 10 and the latest version of RStudio. I assume RAM will be the bottleneck. Does anyone have a suggestion as to how much RAM I should have to work with data of this size?
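For reference, my back-of-the-envelope estimate of the raw in-memory footprint, assuming all eight columns end up stored as 8-byte doubles:

```r
# Rough size of one in-memory copy of the data
rows <- 650e6
cols <- 8
rows * cols * 8 / 1024^3   # ~38.7 GiB
```

Since R routinely makes copies of objects during manipulation, I expect the actual working set to be some multiple of that.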
Best,
Stefan.