Hello, I tried to train a random forest model on a training dataset of 200k+ rows, but got the following message in my console: "Error: cannot allocate vector of size 2.3GB".
Have you experienced this before? If so, how did you solve it? I know it has to do with memory, but is there a package you would recommend that could help? Thanks as always.
Machine learning on big datasets is memory intensive; I don't think you will get a meaningful reduction in memory allocation regardless of which package you use.
I did find a Python package for random forests that is supposed to partition the process so it fits in less memory, but it is just the implementation accompanying a paper, so it is not well documented, tested, or even maintained.
Most ML tools aimed at practical applications assume large computational resources are available, because that is what makes sense for real-world scenarios.
I would suggest testing things on a smaller subset (sample) of your data, and using cloud computing to train the final model if that turns out to be necessary or worthwhile; see the sketch below.
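To illustrate, here is a minimal sketch of the subsetting approach, assuming your data is in a data frame called `df` with an outcome column `y` and that you are using the randomForest package (those names are illustrative, not taken from your code):

```r
# Minimal sketch: prototype on a random sample before committing to the full run.
# Assumes a data frame `df` with an outcome column `y` and the randomForest package.
library(randomForest)

set.seed(42)                              # reproducible sampling
idx <- sample(nrow(df), size = 20000)     # e.g. ~10% of a 200k-row dataset
sub <- df[idx, ]

fit <- randomForest(y ~ ., data = sub, ntree = 200)
print(fit)                                # inspect OOB error before scaling up
```

Once the sampled model looks reasonable, you can estimate how much memory and time the full run would need and decide whether a cloud instance is worth it.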
Thanks. Somebody suggested using memory.limit() to increase the memory limit. I don't know whether that would help; otherwise, I will consider a cloud computing option to train my model.
As far as I know, memory.limit() defaults to the total available memory on the system, so unless you have manually set your limit lower, raising it is not going to make any difference. A quick way to check is to run memory.limit() and see whether the output matches the physically installed memory on the machine.
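For reference, this is roughly what that check looks like; note that memory.limit() is Windows-only, and I believe it has been removed in recent R versions, so on other platforms or a newer R it will not help at all:

```r
# Windows, older R versions only: memory.limit() reports the current limit in MB.
memory.limit()                  # compare this to the RAM physically installed
# memory.limit(size = 16000)    # only useful if the current limit is below installed RAM
```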