What is the best way to handle class imbalance in a large dataset? I have a dataset of over 300k rows whose target variable has imbalanced classes. After an 80/20 split, I tried using ROSE to balance the training set, but tabulating the result gives me an empty table of classes. This is my code:
library(ROSE)
library(DMwR)
library(caret)
ind <- createDataPartition(heart_df$HeartDisease, p = 0.8, list = FALSE)
train_heart <- heart_df[ind,]
test_heart <- heart_df[-ind,]
nrow(train_heart)
nrow(test_heart)
set.seed(111)
trainUp <- ROSE(HeartDisease ~.,data = train_heart)$heart_df
table(trainUp$HeartDisease)
Here is a screenshot of the data:
There are more "No" than "Yes" values, so I want to balance the training data. However, table(trainUp$HeartDisease) prints the following in my console: < table of extent 0 >
instead of the adjusted class counts. I would appreciate any help, thank you.
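
One likely cause, sketched under assumptions: as far as I know, ROSE() returns a list whose resampled data frame is stored in a component named data, not in a component named after the input data frame. If so, $heart_df evaluates to NULL, and table() on NULL prints < table of extent 0 >. The example below uses a small made-up data frame (toy, with hypothetical BMI and Smoking columns) in place of the real train_heart, which is not available here.

```r
library(ROSE)

set.seed(111)

# Hypothetical stand-in for train_heart: an imbalanced two-level target
# plus one numeric and one categorical predictor (names are illustrative only).
toy <- data.frame(
  HeartDisease = factor(c(rep("No", 180), rep("Yes", 20))),
  BMI          = rnorm(200, mean = 28, sd = 5),
  Smoking      = factor(sample(c("No", "Yes"), 200, replace = TRUE))
)

# Keep the whole returned object so its components can be inspected.
rose_fit <- ROSE(HeartDisease ~ ., data = toy, seed = 111)

names(rose_fit)             # shows which components the object actually has
is.null(rose_fit$heart_df)  # TRUE: there is no component with that name

# The resampled data frame is in the `data` component.
trainUp <- rose_fit$data
table(trainUp$HeartDisease)
```

Applied to the code in the question, this would mean changing the ROSE line to ROSE(HeartDisease ~ ., data = train_heart)$data, assuming HeartDisease is a two-level factor. Calling set.seed() before createDataPartition() as well would also make the train/test split itself reproducible.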