Hi all
I'm doing modelling (curve fitting, outlier detection) on a biggish data set that sits in Snowflake. Currently I use DBI::dbGetQuery to pull subsets of the data, which I then save to Feather files with the arrow package and analyse using data.table and ggplot.
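For context, this is roughly the shape of the pipeline. It's a minimal sketch only: the DSN, table name and column names below are placeholders, not my real ones.

```r
library(DBI)
library(odbc)
library(arrow)
library(data.table)

# Placeholder Snowflake connection (real one uses our corporate DSN)
con <- dbConnect(odbc::odbc(), dsn = "snowflake_dsn")

# Pull one subset (~1% of the table) into R
dt <- as.data.table(dbGetQuery(con, "
  SELECT device_id, timestamp, value
  FROM   measurements
  WHERE  batch_id = 42
"))

# Cache it locally in Feather so I don't have to re-query Snowflake
write_feather(dt, "subset_batch42.feather")

# Later: reload and summarise / fit / plot with data.table + ggplot
dt <- as.data.table(read_feather("subset_batch42.feather"))
summ <- dt[, .(mean_value = mean(value), n = .N), by = device_id]
```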
This works for a small subset of the data (about 1%), but it is already slow to process. Reading/writing Feather and plotting with ggplot are acceptably fast; the bottleneck is summarising and doing calculations in data.table. Now I need to work with a much bigger subset, and I wonder whether R is the right tool for the job. Both memory and CPU seem to be constraints.
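The slow part is grouped calculations along these lines, run over a very large number of groups (illustrative only, using the dt from the sketch above; the real fit and outlier rule are more involved):

```r
# Per-group curve fit plus a simple residual-based outlier count
slow_summary <- dt[, {
  fit      <- lm(value ~ timestamp)          # placeholder for the real curve fit
  res      <- residuals(fit)
  resid_sd <- sd(res)
  .(slope      = coef(fit)[["timestamp"]],
    n_outliers = sum(abs(res) > 3 * resid_sd))
}, by = device_id]
```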
Has anyone out there tried doing this and ended up moving to another analysis platform that handles big tables (~0.5B rows) and heavy calculations much faster? Or am I simply limited by my PC's memory and CPU and need to move to some kind of cloud processing (which I have no experience with)?
I'll check out this other thread too:
Thanks!