I am working with large tables. I figured I could use Arrow to keep my RAM usage relatively low and do all the computation out of memory.
I have spent quite a bit of time reading about Arrow for R and watched a few YouTube videos. However, I'm still struggling to understand what is happening. Here is what I do:
- open the `.parquet` files with `arrow::open_dataset()`
- write dplyr-like code to join 4 tables
- add `%>% compute()` at the end of the dplyr pipeline
- RAM usage jumps from 100 MB to 60 GB
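Here is a minimal, self-contained sketch of the steps above. The file names, the shared `id` key and the `val_*` columns are made up for illustration (my real tables are much larger), and it writes tiny toy tables to a temporary directory so it can run on its own.

```r
library(arrow)
library(dplyr)

# Hypothetical toy stand-ins for the four large tables, written as Parquet.
dir <- tempfile("arrow_demo")
dir.create(dir)
for (nm in c("t1", "t2", "t3", "t4")) {
  df <- data.frame(id = 1:5)
  df[[paste0("val_", nm)]] <- seq_len(5) * 10
  write_parquet(df, file.path(dir, paste0(nm, ".parquet")))
}

# Step 1: open each file lazily; the data is not read into R yet.
t1 <- open_dataset(file.path(dir, "t1.parquet"))
t2 <- open_dataset(file.path(dir, "t2.parquet"))
t3 <- open_dataset(file.path(dir, "t3.parquet"))
t4 <- open_dataset(file.path(dir, "t4.parquet"))

# Step 2: describe the joins with dplyr verbs; this only builds a query.
query <- t1 %>%
  left_join(t2, by = "id") %>%
  left_join(t3, by = "id") %>%
  left_join(t4, by = "id")

# Step 3: compute() runs the query and returns an Arrow Table,
# which holds the full joined result in memory.
result <- compute(query)
print(class(result))
print(dim(result))
```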
If I do the same thing with plain dplyr (without Arrow), RAM usage jumps to 50 GB.
I'm referring to the memory usage report in RStudio.
Am I missing a step to lower RAM usage while using Arrow?