I am working on a very unbalanced dataset (90% / 10%) with around 350,000 records, and I am trying various classification methods. I began with SMOTE, which was quite fast and improved performance on tree classifiers (CART), but made it worse with all the others (Bayes, SVM). It also made the classifiers much slower to train (for instance, I could not run the Random Forest algorithm because of how long it took). Because of this, I thought it would be better to undersample. I tried both CNN and ENN, but they have now been running for three days on two different, reasonably powerful machines. Any suggestions?
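For context, plain random undersampling is a much cheaper option than CNN/ENN, since it avoids the repeated nearest-neighbour searches that make those methods slow at ~350k rows. A minimal sketch in NumPy (the function name and the 1:1 balancing ratio are illustrative choices, not something from the post):

```python
import numpy as np

def random_undersample(X, y, minority_label, rng=None):
    """Randomly drop majority-class rows until both classes are the same size.

    Runs in roughly O(n) time, unlike CNN/ENN, which rely on
    nearest-neighbour searches and scale poorly to large datasets.
    """
    rng = np.random.default_rng(rng)
    minority_idx = np.flatnonzero(y == minority_label)
    majority_idx = np.flatnonzero(y != minority_label)
    # Keep all minority rows; sample an equal number of majority rows.
    keep = rng.choice(majority_idx, size=minority_idx.size, replace=False)
    idx = np.concatenate([minority_idx, keep])
    rng.shuffle(idx)
    return X[idx], y[idx]

# Tiny demo mirroring the 90/10 imbalance described above.
X = np.arange(100).reshape(100, 1)
y = np.array([0] * 90 + [1] * 10)
X_res, y_res = random_undersample(X, y, minority_label=1, rng=0)
```

After resampling, `y_res` contains 10 rows of each class. The imbalanced-learn library offers an equivalent `RandomUnderSampler` if you prefer a ready-made implementation.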