Please, what is the best way to handle the class imbalance of a large dataset? I have a dataset of over 300k rows, whose target variable has imbalanced classes. I have tried using ROSE to balance out the training dataset, after an 80/20 split, but it keeps returning an empty table of classes. This is my code:
There are more "No" than "Yes" cases, so I want to balance the training data. But table(trainUp$HeartDisease) prints < table of extent 0 > in my console instead of the adjusted class counts. I would appreciate your help, thank you.
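For context on that message: table() prints < table of extent 0 > when it is handed NULL, and `$` returns NULL for a column that does not exist in the object. The sketch below reproduces this in base R with made-up data. The data frame `balanced` and its `Class` column are assumptions for illustration only, not the poster's data or the actual output of any balancing function. It only shows that checking names() on the balanced object is a reasonable first step.

```r
# Hypothetical balanced data whose outcome column is NOT called HeartDisease.
balanced <- data.frame(
  BMI   = c(16.6, 20.3, 26.6, 24.2),
  Class = factor(c("No", "Yes", "No", "Yes"))
)

# `$` on a missing column returns NULL, and table(NULL) has no entries,
# which is what the "table of extent 0" message reports.
missing_col <- balanced$HeartDisease
is.null(missing_col)
empty_tab <- table(missing_col)
length(empty_tab)

# Listing the column names shows where the outcome actually lives.
names(balanced)
class_tab <- table(balanced$Class)
class_tab
```

If names() shows the outcome under a different name, or the balanced rows sit inside a nested element of the returned object, tabulating that column instead should give the expected counts.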
Hello, this is not quite a reprex, as it seems to rely both on unshared data (heart_df) and on functions not provided by the listed library calls (createDataPartition). Could you review these elements?
I'm sure you shared this image with the best intentions, but perhaps you didn't realise what it implies.
If someone wished to use your example data to test code against, they would have to type it out from your screenshot.
This is very unlikely to happen, and so it reduces the likelihood that you will receive the help you desire.
Therefore, please see this guide on how to reprex data. Key to this is the use of either datapasta or dput() to share your data as code.
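As a rough illustration of the dput() route, the sketch below uses a small made-up data frame. The name `heart_sample` and its columns are hypothetical stand-ins, not the real heart_df. dput() prints R code that recreates the object, so the printed text can be pasted straight into a post.

```r
# Hypothetical stand-in for the real data; names and values are illustrative.
heart_sample <- data.frame(
  HeartDisease = factor(c("No", "No", "Yes", "No", "No", "Yes")),
  BMI          = c(16.6, 20.3, 26.6, 24.2, 23.7, 28.9)
)

# Keep only a few rows so the shared code stays short.
sample_rows <- head(heart_sample, 5)

# Prints a structure(...) call that rebuilds sample_rows when run.
dput(sample_rows)

# Round trip: deparse() yields the same code as text; evaluating it
# should give back an equivalent data frame.
code_text <- paste(deparse(sample_rows), collapse = "\n")
rebuilt   <- eval(parse(text = code_text))
all.equal(rebuilt, sample_rows)
```

A reader can then assign the pasted structure(...) call to a variable and run the rest of the posted code against it.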