Should AdaBoost be this slow?

I am starting out and working my way through machine learning algorithms. I am trying AdaBoost, but the processing time seems endless, i.e. it is still running after 10-15 minutes. The code is below. Should it take this long? It is a small dataset (around 350 samples with 7 predictor variables).

library(adabag)
library(caret)

set.seed(444)
train_sample <- sample(302, 210)   # indices for the training split
BF <- Analysis29_12_20
BF$inflammatory <- as.factor(BF$inflammatory)
BF_train <- BF[train_sample, ]
BF_test <- BF[-train_sample, ]
model <- boosting(inflammatory ~ ., data = BF_train, boos = TRUE)

If I try it with cross-validation, I get the following error:

model<-boosting.cv(inflammatory~.,data=BF_train,boos=TRUE,mfinal=10,v=5)
Error in boosting.cv(inflammatory ~ ., data = BF_train, boos = TRUE, mfinal = 10, :
v should be in [2, n]

Can anyone help? Thanks!

I don't think the package is built for training speed; compared with fastAdaboost it is about 100x slower.


set.seed(42)
fakedata <- data.frame(
  X = c(rnorm(100, 0, 1), rnorm(100, 1, 1)),   # two Gaussian clusters
  Y = c(rep(0, 100), rep(1, 100))              # binary class label
)
fakedata$Y <- factor(fakedata$Y)

library(adabag)

library(microbenchmark)
microbenchmark(
  adab <- boosting(Y ~ ., data = fakedata, boos = TRUE),   # default mfinal = 100 trees
  times = 10L
)
# ~20 seconds per fit
adab
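
Much of that time is simply the default ensemble size: adabag's boosting() builds mfinal = 100 trees per fit. As a rough sanity check (a sketch on the same fakedata, not something from the original post), shrinking mfinal should shrink the runtime roughly in proportion:

# hedged sketch: a smaller ensemble (mfinal = 10 instead of the default 100)
# should fit roughly ten times faster
system.time(
  adab_small <- boosting(Y ~ ., data = fakedata, boos = TRUE, mfinal = 10)
)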



library(fastAdaboost)

microbenchmark(
  fadab <- adaboost(Y ~ ., data = fakedata, 100),   # third argument is nIter = 100
  times = 10L
)
# ~0.2 seconds per fit
fadab
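
Both fits can then be scored the same way. This is only a usage sketch: the $class and $error components are what I would expect from adabag's predict.boosting() and fastAdaboost's predict.adaboost(), not something shown in the thread.

# hedged sketch: score both models on the training data and compare labels
pred_adab  <- predict(adab,  newdata = fakedata)   # adabag::predict.boosting
pred_fadab <- predict(fadab, newdata = fakedata)   # fastAdaboost::predict.adaboost

table(adabag = pred_adab$class, fastAdaboost = pred_fadab$class)
c(adabag = pred_adab$error, fastAdaboost = pred_fadab$error)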

Also, I don't understand your CV issue; the only way I can trigger that error is to set v higher than the number of observations in the input data.


model_2 <- boosting.cv(Y ~ ., data = fakedata, boos = TRUE, mfinal = 10, v = 3)   # works
# errors if v exceeds the number of observations (here n = 200)
model_201 <- boosting.cv(Y ~ ., data = fakedata, boos = TRUE, mfinal = 10, v = 201)

Thanks, really helpful. I found that this solved all the issues:

data <- as.data.frame(data)
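
For anyone hitting the same thing: my guess (an assumption, not stated above) is that the data had been imported as a tibble, e.g. with readr or readxl, and coercing it back to a plain data.frame before splitting is what fixed it. A sketch of how that slots into the original workflow:

# hedged sketch: coerce to a plain data.frame first (Analysis29_12_20 is the
# dataset from the original post; the tibble origin is an assumption)
BF <- as.data.frame(Analysis29_12_20)
BF$inflammatory <- as.factor(BF$inflammatory)
BF_train <- BF[train_sample, ]
model <- boosting(inflammatory ~ ., data = BF_train, boos = TRUE, mfinal = 10)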
