Model training error

Hello, I tried to train a random forest model on a training dataset of 200k+ rows, but got the following message in my console: Error: cannot allocate vector of size 2.3 GB.

Have you experienced this before? If so, how did you solve it? I know it has to do with memory, but is there a package you would recommend that could help? Thanks as always.

Machine learning on big datasets is memory intensive; I don't think you are going to get meaningful reductions in memory allocation regardless of which package you use.

I found a Python package for random forests that is supposed to partition the process so it fits in less memory, but it is just the implementation accompanying a paper, so it is not well documented, tested, or even maintained.

Most ML tools aimed at practical applications assume that large computational resources are available, because that is what makes sense for real-world scenarios.

I would say you should prototype with a smaller subset (a random sample) of your data, and use cloud computing to train the final model if that turns out to be needed or worthwhile; see the sketch below.
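As a rough illustration (the data frame name train_df and the outcome column target below are placeholders, not taken from your post), prototyping on a sample could look something like this:

```r
library(randomForest)

set.seed(42)

# Train on a random 10% sample of the rows while prototyping;
# scale up (or move to the cloud) once the workflow is settled.
idx       <- sample(nrow(train_df), size = floor(0.1 * nrow(train_df)))
train_sub <- train_df[idx, ]

rf_fit <- randomForest(target ~ ., data = train_sub, ntree = 200)
print(rf_fit)
```

Memory use grows roughly with the number of rows and trees, so a sample that fits comfortably in RAM lets you iterate on features and hyperparameters before committing to a full training run.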


Thanks. Somebody suggested using memory.limit() to increase the available memory. I don't know whether that would help; otherwise, I will consider using a cloud computing option to train my model.

As far as I know, memory.limit() defaults to the total memory available on the system, so unless you have manually set the limit lower, raising it is not going to make any difference. A quick way to check is to run memory.limit() and see whether the output matches the physical memory installed in the machine.
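For what it's worth, memory.limit() only applies on Windows builds of R (elsewhere it just returns Inf with a warning). A quick check could look like this; the output shown is only an example:

```r
# Query the current limit (in MB) without changing it
memory.limit()
#> [1] 16384   # example output: matches 16 GB of installed RAM

# The limit can only be increased, e.g. memory.limit(size = 32000),
# but pushing it above the physical RAM just makes the system swap.
```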
