I am working on a classification problem with imbalanced data. It was suggested that I try SMOTE to oversample the minority class. I found several online references to SMOTE in R, and the most popular implementation seems to be the one in the DMwR package. I also found a reference to the 'unbalanced' package.
On my real data, I am receiving the message in the title:
Error in T[i, ] : subscript out of bounds
In addition: There were 20 warnings (use warnings() to see them)
warnings()
Warning messages:
1: In FUN(newX[, i], ...) : no non-missing arguments to max; returning -Inf
2: In FUN(newX[, i], ...) : no non-missing arguments to max; returning -Inf
3: In FUN(newX[, i], ...) : no non-missing arguments to max; returning -Inf
...
I tried to create a reprex using the diamonds dataset. That failed because I ran into a different error. The setup should be the same, though: in both my real data and the example data, the data frame I pass to SMOTE() has an imbalanced target that is a factor with values 0 or 1. So I wanted to post anyway in case the errors are related, or in case I've just misunderstood how to use DMwR::SMOTE().
library(tidyverse)

# make a dummy target variable
diamonds$cut %>% table # 'Fair' is the smallest, use this as an example
my_diamonds <- diamonds %>%
  mutate(target_var = factor(ifelse(cut == "Fair", 1, 0)))
my_diamonds$target_var %>% table # imbalanced

# Goal: balanced target_var
library(DMwR) # also saw the library 'unbalanced' elsewhere online but looks like DMwR has a larger presence
balanced.diamonds <- SMOTE(target_var ~ carat + color, my_diamonds, perc.over = 100)
If I run that block I get:
Error in matrix(if (is.null(value)) logical() else value, nrow = nr, dimnames = list(rn, :
length of 'dimnames'  not equal to array extent
How can I use SMOTE to create new samples so that my_diamonds$target_var %>% table shows an equal number of both labels?
Any tips on the other error (the "subscript out of bounds" message from my real data) would be much appreciated too.