Trying to create a balanced training set with SMOTE


#1

I am trying to generate new data so that I get a balanced training set for classification with a decision tree. Whenever I use the SMOTE function, I get the same error:

Error in names(dn) <- dnn : attempt to set an attribute on NULL
In addition: Warning message:
In names(data) == as.character(form[[2]]) :
  longer object length is not a multiple of shorter object length

I converted every column to a factor with as.factor() and removed the NAs:

train <- na.omit(train)



> str(train)
Classes ‘tbl_df’, ‘tbl’ and 'data.frame':	11526 obs. of  5 variables:
 $ number: Factor w/ 2 levels "problem",..: 2 2 2 2 2 2 2 2 2 2 ...
 $ Land: Factor w/ 29 levels "Australien","Belgien",..: 3 3 3 3 3 3 3 3 9 3 ...
 $ direction: Factor w/ 2 levels "LL","RL": 1 1 1 1 1 1 1 1 2 1 ...
 $ transmission: Factor w/ 2 levels "AUT","SCH": 1 1 1 1 1 1 1 1 1 1 ...
 $ range: Factor w/ 4 levels "1","2","3","4": 3 3 3 2 1 3 2 4 3 2 ...
 - attr(*, "na.action")= 'omit' Named int  6500 9748
  ..- attr(*, "names")= chr  "6500" "9748"
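The str() output shows that train is still a tibble ('tbl_df'). A minimal sketch of converting every column to a factor and dropping incomplete rows in one go; going through lapply() and as.data.frame() also returns a plain data.frame. That is a common precaution with older modelling code such as DMwR's SMOTE, which predates tibbles and may behave differently when data[, j] returns a one-column tibble instead of a vector. The toy data below is made up for illustration and is not the poster's data.

```r
# Made-up toy data standing in for the real training set (hypothetical values)
toy_train <- data.frame(
  number    = c("reference", "problem", "reference", NA),
  direction = c("LL", "LL", "RL", "LL"),
  range     = c(3, 3, 2, 4),
  stringsAsFactors = FALSE
)

# Convert every column to a factor; as.data.frame() on the resulting list
# yields a plain data.frame (any tibble class is dropped)
toy_train <- as.data.frame(lapply(toy_train, as.factor))

# Drop rows containing at least one NA (here: the fourth row)
toy_train <- na.omit(toy_train)

str(toy_train)
```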

The first rows of my training set look like this:

> head(train,10)
     number          Land           direction  transmission       range 
1  reference Bundesrep. Deutschland      LL         AUT             3
2  reference Bundesrep. Deutschland      LL         AUT             3
3  reference Bundesrep. Deutschland      LL         AUT             3
4  reference Bundesrep. Deutschland      LL         AUT             2
5  reference Bundesrep. Deutschland      LL         AUT             1
6  reference Bundesrep. Deutschland      LL         AUT             3
7  reference Bundesrep. Deutschland      LL         AUT             2
8  problem                   Taiwan      LL         AUT             3
9  reference Bundesrep. Deutschland      LL         AUT             4
10 reference        Grossbritannien      RL         AUT             3
11 reference Bundesrep. Deutschland      LL         SCH             2

And this is my code:

smote_train <- SMOTE(train$number ~ ., data = train, perc.over = 500, k = 5, learner = NULL)
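For orientation, a rough sketch of the class sizes a call like this should produce, assuming the perc.over/perc.under semantics described in the DMwR documentation: with perc.over of 100 or more, each minority case yields perc.over/100 synthetic cases, and perc.under (left at its default of 200 in the call above) sets how many majority cases are sampled, as a percentage of the synthetic cases. smote_class_counts() is a hypothetical helper for illustration, not part of DMwR.

```r
# Hypothetical helper (not part of DMwR): expected class sizes after
# DMwR::SMOTE(), following the perc.over / perc.under description in its docs.
smote_class_counts <- function(n_minority, perc.over = 200, perc.under = 200) {
  n_new <- if (perc.over >= 100) {
    # each minority case yields perc.over / 100 synthetic cases (truncated)
    n_minority * as.integer(perc.over / 100)
  } else {
    # below 100, one synthetic case for a random share of the minority cases
    as.integer(n_minority * perc.over / 100)
  }
  # perc.under: majority cases sampled, as a percentage of the synthetic cases
  n_majority <- as.integer(perc.under / 100 * n_new)
  # the original minority cases are kept alongside the synthetic ones
  c(minority = n_minority + n_new, majority = n_majority)
}

smote_class_counts(1000, perc.over = 500)  # minority 6000, majority 10000
```

With a made-up 1,000 minority rows, perc.over = 500 would give 6,000 minority and 10,000 majority rows, i.e. about 37.5% minority rather than an exact 50/50 split.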

#2

Hi there, and welcome to community.rstudio.com!

Personally, I'm not familiar with SMOTE, but to help you get the right help for your question from someone who knows SMOTE better than me, can you please turn it into a reprex (reproducible example)? This will ensure we're all looking at the same data and code. A guide for creating a reprex can be found here.


#3

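Try removing the data frame name from the formula. SMOTE looks up the target by name among the columns of data, so the left-hand side should be just the column name: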

smote_train <- SMOTE(number ~ ., data = train, perc.over = 500, k = 5, learner = NULL)
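The warning in the question hints at why this works: SMOTE locates the target column by evaluating names(data) == as.character(form[[2]]). A small sketch, using the column names from the str() output and no package at all, shows what each formula's left-hand side turns into:

```r
# Column names as in the str() output of the question
col_names <- c("number", "Land", "direction", "transmission", "range")

bad_form  <- train$number ~ .   # ~ quotes its arguments, so train need not exist
good_form <- number ~ .

# form[[2]] is the left-hand side of the formula
as.character(bad_form[[2]])    # "$" "train" "number": a call split into 3 pieces
as.character(good_form[[2]])   # "number"

# 5 names vs 3 pieces: recycling warning and no matching column
suppressWarnings(which(col_names == as.character(bad_form[[2]])))  # integer(0)
which(col_names == as.character(good_form[[2]]))                    # 1
```

With train$number ~ . the comparison matches no column, so SMOTE proceeds without a target column, which is presumably where the later names(dn) <- dnn error comes from.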