I have over 6 million rows and around 30 columns in my dataset. I wrote code, tested on a small subsample of the data, that:
- uses split() to put every unique combination of the grouping variables passed as its argument into a list item
- iterates through that list using lapply() to do the manipulation.
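
A simplified sketch of that pattern, on made-up data. The column names g1, g2 and value and the per-group summary are placeholders for illustration, not the real columns or the real manipulation:

```r
# Toy stand-in for the real data; g1, g2 and value are hypothetical columns.
df <- data.frame(
  g1 = c("a", "a", "b", "b", "b"),
  g2 = c("x", "y", "x", "x", "y"),
  value = c(1, 2, 3, 4, 5),
  stringsAsFactors = FALSE
)

# One list element per unique (g1, g2) combination.
# drop = TRUE skips combinations that never occur in the data.
pieces <- split(df, list(df$g1, df$g2), drop = TRUE)

# Apply the per-group manipulation to each piece (a placeholder summary here).
results <- lapply(pieces, function(piece) {
  data.frame(g1 = piece$g1[1], g2 = piece$g2[1], total = sum(piece$value))
})

# Stack the per-group results back into one data frame.
combined <- do.call(rbind, results)
```

Note that split() on a data frame materialises a full copy of every group's rows as separate list elements before lapply() ever runs.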
The problem is that the split() step explodes to 90 GB of RAM and crashes my server.
What else can I use instead of split()? Do I need to move away from lapply() as well? I actually use parLapply(), which makes things much faster.