Help on imbalanced datasets

Hey guys, I am trying to learn a few things in data science. I want to tackle an imbalanced dataset, fit it with CART, for example, and run a few sampling techniques over it to improve the results.
My questions are:

  1. As a beginner, I read that I should use a binary (2-class) dataset with one y variable and three or more x variables. Where do I find such datasets? I searched the web but couldn't come up with good ones.
  2. Let's say I found a good dataset. The next step would be splitting it into train and test sets, correct? How do I do that in R?
  3. Which packages would you suggest for plotting the data as points, where the majority class is red, for example, and the minority class blue?
  4. Which packages should I use to calculate recall/precision/accuracy and create the confusion matrix?
  5. Then I would use CART on my training set and run sampling techniques like ROSE and SMOTE over it, correct? How do I recalculate the recall/precision/accuracy?
    If the values are higher, it worked, correct?
  6. In which context could I use ggplot2 here?

Help is much appreciated!

I've parked a dataset of roughly 10K x 20. To give it a grouping variable, just cbind a class vector with one value per row, such as

classes <- sample(c(1,2),10000,replace = TRUE)
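Since the question is about imbalanced data, the same call could be given unequal probabilities. This is only a sketch: the 90/10 ratio, the seed, and the three made-up predictor columns `x1` to `x3` are my assumptions, standing in for whatever dataset you end up using.

```r
set.seed(42)  # make the random draw reproducible
n <- 10000

# Hypothetical stand-in for the real predictors: three numeric columns
x <- data.frame(x1 = rnorm(n), x2 = rnorm(n), x3 = rnorm(n))

# Class 1 is drawn about 90% of the time (majority), class 2 about 10% (minority)
classes <- sample(c(1, 2), n, replace = TRUE, prob = c(0.9, 0.1))

# Attach the class labels as a factor column
dat <- cbind(x, classes = factor(classes))

# Inspect the class counts and proportions
table(dat$classes)
prop.table(table(dat$classes))
```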

See the caret package to do the split.
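A minimal sketch of the split with `caret::createDataPartition`, which samples within each class so both parts keep roughly the same class ratio. The simulated data frame, the 70/30 ratio, and the names `train` and `test` are assumptions for illustration.

```r
library(caret)

set.seed(42)
n <- 1000

# Simulated imbalanced data (assumption): about 90% majority, 10% minority
dat <- data.frame(
  x1 = rnorm(n),
  x2 = rnorm(n),
  x3 = rnorm(n),
  classes = factor(sample(c("majority", "minority"), n,
                          replace = TRUE, prob = c(0.9, 0.1)))
)

# Stratified sample of row indices: about 70% of each class goes to training
idx <- createDataPartition(dat$classes, p = 0.7, list = FALSE)

train <- dat[idx, ]
test  <- dat[-idx, ]

# The class proportions should be similar in both parts
prop.table(table(train$classes))
prop.table(table(test$classes))
```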

ggplot2 can do the plotting easily for x,y pairs with `fill = classes` (for plain points, `colour = classes` is the aesthetic that takes effect).
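Something along these lines should give the red/blue scatter plot asked for in question 3. The simulated data and the column names are assumptions; I map the class to `colour` here because the default point shape ignores `fill`.

```r
library(ggplot2)

set.seed(42)
n <- 500

# Simulated two-predictor data with an imbalanced class label (assumption)
dat <- data.frame(
  x1 = rnorm(n),
  x2 = rnorm(n),
  classes = factor(sample(c("majority", "minority"), n,
                          replace = TRUE, prob = c(0.9, 0.1)))
)

# Scatter plot: majority class in red, minority class in blue
p <- ggplot(dat, aes(x = x1, y = x2, colour = classes)) +
  geom_point(alpha = 0.6) +
  scale_colour_manual(values = c(majority = "red", minority = "blue")) +
  labs(title = "Class distribution", colour = "Class")

print(p)
```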

For the confusion matrix and the related metrics, use `caret::confusionMatrix`.
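As a rough sketch of how that fits together with CART, here is an `rpart` tree evaluated on a held-out test set. The simulated data, the choice of `rpart` as the CART implementation, and treating "minority" as the positive class are all my assumptions.

```r
library(caret)
library(rpart)

set.seed(42)
n <- 2000

# Simulated predictors; the minority class becomes more likely as x1 + x2 grows
x1 <- rnorm(n)
x2 <- rnorm(n)
x3 <- rnorm(n)
p_minority <- plogis(-3 + 1.5 * x1 + 1.5 * x2)
classes <- factor(ifelse(runif(n) < p_minority, "minority", "majority"),
                  levels = c("majority", "minority"))
dat <- data.frame(x1, x2, x3, classes)

# Stratified 70/30 train/test split
idx <- createDataPartition(dat$classes, p = 0.7, list = FALSE)
train <- dat[idx, ]
test  <- dat[-idx, ]

# Fit a classification tree on the training set only
fit <- rpart(classes ~ ., data = train, method = "class")

# Predict class labels for the held-out test set
pred <- predict(fit, newdata = test, type = "class")

# Confusion matrix plus metrics, with the minority class as "positive"
cm <- confusionMatrix(data = pred, reference = test$classes,
                      positive = "minority", mode = "everything")

cm$table
cm$overall["Accuracy"]
cm$byClass[c("Precision", "Recall")]
```

On imbalanced data, accuracy alone can look good even when the minority class is mostly missed, so precision and recall for the minority class are usually the numbers to watch.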

How to train a model is simpler than deciding what model to train. Come back with a more directed question for #5?
