Hey guys, I am trying to learn a few things in data science. I want to tackle an imbalanced dataset, fit it with CART, for example, and run a few sampling techniques over it to improve the results.
My problems now are:
- As a beginner, I read that I should use a binary (2-class) dataset with one y variable and three or more x variables. Where do I find such datasets? I searched the web, but I couldn't come up with good ones.
- Let's say that I found a good dataset. The next step would be splitting it into training and test sets, correct? How do I do that in R?
- Which packages would you suggest for plotting the dataset as points, where the majority class is red, for example, and the minority class blue?
- Which packages should I use to calculate recall/precision/accuracy and create the confusion matrix?
- Then I would use CART on my training set and run sampling techniques like ROSE and SMOTE over it, correct? How do I recalculate the recall/precision/accuracy afterwards? If the values are higher, it worked, correct?
- In which context could I use ggplot2 here?
Help is much appreciated!