This is a general question.
I have to process data that is larger than memory (roughly 30-80 GB). Usually the data fails to load or the system crashes.
Processing involves data cleaning, manipulation, and/or visualization.
Work involved:
Data cleaning = replacing, removing, and editing strings.
Manipulation = creating variables from strings and passing them to a regression model, or basic functions such as word counts.
Visualization = plotting multiple variables in leaflet, or generating bar or pie charts.
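
To make the cleaning and word-count part concrete, here is a minimal sketch of the kind of work involved, written so the file is read one line at a time instead of being loaded whole. Python is only an assumption for illustration, and the function names (`clean_lines`, `count_words`, `count_words_in_file`) and the cleaning rules are hypothetical, not what I actually run.

```python
import re
from collections import Counter
from typing import Iterable, Iterator


def clean_lines(lines: Iterable[str]) -> Iterator[str]:
    """Lazily clean one line at a time (hypothetical cleaning rules)."""
    for line in lines:
        line = line.lower()
        # Replace anything that is not a letter, digit, or whitespace with a space.
        line = re.sub(r"[^a-z0-9\s]", " ", line)
        # Collapse runs of whitespace into single spaces.
        yield " ".join(line.split())


def count_words(lines: Iterable[str]) -> Counter:
    """Count words across all lines without keeping the lines in memory."""
    counts = Counter()
    for line in clean_lines(lines):
        counts.update(line.split())
    return counts


def count_words_in_file(path: str) -> Counter:
    """Stream a text file from disk; only one line is held at a time."""
    with open(path, "r", encoding="utf-8", errors="replace") as f:
        return count_words(f)


if __name__ == "__main__":
    sample = ["Hello, World!", "hello again;  world?"]
    print(count_words(sample).most_common(2))
```

Memory use here grows with the number of distinct words rather than with the file size, which is the property I am after; whether something similar is practical for the regression and plotting steps is part of what I am asking.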
I generally use AWS for these problems,
but out of curiosity: if someone faces this data size problem on a low-end machine, what method(s) would they use to tackle it? (Mine is an i5 with 8 GB RAM and a typical HDD.)
Note: Speed is a constraint; it should not take days to get a solution. And, obviously, the system should not crash.
Thanks for your time and opinion.