First time using SMOTE, error "Error in T[i, ] : subscript out of bounds In addition: There were 20 warnings (use warnings() to see them)"

I am working on a classification problem with imbalanced data. It was suggested that I try SMOTE to oversample the minority class. I found several online references to SMOTE in R, and the most popular implementation seems to be the one in DMwR. I also found a reference to the 'unbalanced' package.

On my real data, I am receiving the message in the title:

```
Error in T[i, ] : subscript out of bounds
In addition: There were 20 warnings (use warnings() to see them)
warnings()
Warning messages:
1: In FUN(newX[, i], ...) : no non-missing arguments to max; returning -Inf
2: In FUN(newX[, i], ...) : no non-missing arguments to max; returning -Inf
3: In FUN(newX[, i], ...) : no non-missing arguments to max; returning -Inf...
```
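
Both messages can be reproduced in base R with a zero-row matrix: `apply(..., 2, max)` warns once per empty column and returns `-Inf`, and asking for row 1 then fails. That suggests (an assumption, not verified against the DMwR source) that the set of minority-class rows reaching `SMOTE()` is empty. Here `m` is a hypothetical stand-in for the matrix named `T` in the error:

```r
# Assumption: a 0-row matrix mimics what SMOTE's internals see when no
# minority-class rows are selected. `m` stands in for `T` from the error.
m <- matrix(numeric(0), nrow = 0, ncol = 3)

# max() over each zero-length column warns "no non-missing arguments to max;
# returning -Inf" (the warning is raised from FUN(newX[, i], ...) inside apply).
col_max <- suppressWarnings(apply(m, 2, max))
print(col_max)

# Indexing the first row of a 0-row matrix fails with "subscript out of bounds".
msg <- tryCatch(m[1, ], error = function(e) conditionMessage(e))
print(msg)
```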

I tried to create a reprex using the diamonds dataset, but that failed with a different error. Still, the setup 'should' be the same: in both my real data and the example, the data frame I pass to SMOTE has an imbalanced target that is a factor with values 0 and 1. So I'm posting anyway, in case the two errors are related or I've simply misunderstood how to use DMwR::SMOTE().

```r
library(tidyverse)

# make a dummy target variable
diamonds$cut %>% table # 'Fair' is the smallest level, use it as the minority class
my_diamonds <- diamonds %>% mutate(target_var = factor(ifelse(cut == "Fair", 1, 0)))
my_diamonds$target_var %>% table # imbalanced

# Goal: balanced target_var
library(DMwR) # also saw the 'unbalanced' package mentioned online, but DMwR seems more widely used
balanced.diamonds <- SMOTE(target_var ~ carat + color,
                           my_diamonds, perc.over = 100)
```
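
As I read the DMwR documentation (treat this as an assumption), `perc.over = 100` creates one synthetic case per minority case, and `perc.under` (default 200, as I understand it) keeps `perc.under / 100` majority cases per synthetic case. The hypothetical helper `smote_counts()` only does that arithmetic, for `perc.over` values that are multiples of 100:

```r
# Hypothetical helper: expected class sizes after DMwR::SMOTE(), based on the
# documented meaning of perc.over / perc.under (assumes perc.over is a multiple of 100).
smote_counts <- function(n_minority, perc.over = 200, perc.under = 200) {
  n_new <- n_minority * (perc.over %/% 100)   # synthetic minority cases
  n_majority <- (perc.under / 100) * n_new    # majority cases sampled
  c(minority = n_minority + n_new,            # originals + synthetic
    majority = n_majority)
}

counts_100     <- smote_counts(100, perc.over = 100)  # settings used in the question
counts_default <- smote_counts(100)                   # assumed DMwR defaults
print(counts_100)
print(counts_default)
```

With `perc.over = 100` and the default `perc.under = 200`, the minority class doubles and the sampled majority class ends up the same size, so the call above should already give a 50/50 split once it runs without error.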

If I run that block I get:

```
Error in matrix(if (is.null(value)) logical() else value, nrow = nr, dimnames = list(rn, :
length of 'dimnames' [2] not equal to array extent
```
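
`diamonds`, and anything built from it with dplyr, is a tibble. Unlike a `data.frame`, a tibble returns a one-column tibble for `x[, j]` instead of dropping to a vector. If `SMOTE()` looks up the target with `data[, tgt]`-style indexing (an assumption about DMwR's internals, not checked here), it would find no factor levels, which could explain both errors. The difference is easy to see:

```r
library(tibble)

df <- data.frame(target_var = factor(c(0, 0, 0, 1)), carat = c(0.2, 0.3, 0.4, 0.5))
tb <- as_tibble(df)

# data.frame: single-column `[` drops to a factor, so levels() works
lv_df <- levels(df[, "target_var"])
# tibble: `[` never drops, so this is a 1-column tibble and levels() is NULL
lv_tb <- levels(tb[, "target_var"])
print(lv_df)
print(lv_tb)

# Converting back to a plain data.frame restores the base-R behaviour
lv_fixed <- levels(as.data.frame(tb)[, "target_var"])
```

If that is the cause, passing `as.data.frame(my_diamonds)` to `SMOTE()` would be the first thing to try. As far as I understand, `SMOTE()` also uses every column of `data` rather than only those in the formula, so selecting `carat`, `color` and `target_var` beforehand may help as well.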

How can I use SMOTE to create new samples of my_diamonds$target_var so that my_diamonds$target_var %>% table shows an equal number of both labels?
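
To make the mechanics concrete without depending on DMwR, here is a minimal base-R sketch of the SMOTE idea, assuming numeric predictors only (so `carat` would work, the `color` factor would not). `smote_sketch()` is a hypothetical helper, not DMwR's API: it oversamples the minority class until it matches the majority by interpolating between a random minority row and one of its `k` nearest minority neighbours.

```r
# Hypothetical helper, NOT DMwR's API: a minimal SMOTE-style oversampler.
# Assumes all predictors are numeric and that there are exactly two classes.
smote_sketch <- function(x, y, k = 5) {
  x <- as.matrix(x)
  y <- as.factor(y)
  counts <- table(y)
  min_cl <- names(counts)[which.min(counts)]
  n_new <- max(counts) - min(counts)          # synthetic rows needed to balance
  xm <- x[y == min_cl, , drop = FALSE]        # minority-class rows only
  if (nrow(xm) < 2) stop("need at least two minority rows")
  k <- min(k, nrow(xm) - 1)
  d <- as.matrix(dist(xm))                    # pairwise Euclidean distances
  diag(d) <- Inf                              # a row is not its own neighbour
  synth <- matrix(NA_real_, nrow = n_new, ncol = ncol(xm),
                  dimnames = list(NULL, colnames(xm)))
  for (i in seq_len(n_new)) {
    a <- sample.int(nrow(xm), 1)              # random minority row
    b <- order(d[a, ])[sample.int(k, 1)]      # one of its k nearest minority neighbours
    gap <- runif(1)                           # interpolation weight in [0, 1)
    synth[i, ] <- xm[a, ] + gap * (xm[b, ] - xm[a, ])
  }
  list(x = rbind(x, synth),
       y = factor(c(as.character(y), rep(min_cl, n_new)), levels = levels(y)))
}

# Toy usage: 90 majority rows vs 10 minority rows
set.seed(42)
x <- data.frame(carat = c(rnorm(90, 1), rnorm(10, 3)),
                depth = c(rnorm(90, 60), rnorm(10, 65)))
y <- factor(c(rep(0, 90), rep(1, 10)))
out <- smote_sketch(x, y, k = 5)
print(table(out$y))
```

On the question's data this would be called roughly as `smote_sketch(my_diamonds[, "carat"], my_diamonds$target_var)`. Unlike `DMwR::SMOTE()`, the sketch only adds minority rows and never subsamples the majority class.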

Any tips on the other error would be much appreciated too.
