Family distribution for categorical data for gbm package


#1

Hi all,

I am new to R, and I would like to fit a generalized boosted regression model to my data with the gbm package. The challenge is that my response variable is categorical (I want to model land cover change). Which distribution family should I use for categorical data? The available options, such as “bernoulli” and “poisson”, do not seem applicable. Can anyone help?

Regards


#2

If it is binary, then use “bernoulli”. If there are more than two categories, use “multinomial”. Keep in mind that, for the latter, I’ve had gbm hang indefinitely for some data sets. Alternatively, use C50 or xgboost, which are IMO better.
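To make that concrete, here is a minimal sketch of a multi-class fit that calls gbm() directly. It uses the built-in iris data as a stand-in for a land cover data set, and the tuning values are placeholders chosen for illustration, not recommendations. Note that recent gbm releases may warn that the multinomial distribution is kept only for backwards compatibility.

```r
library(gbm)

set.seed(1)

# The response must be a factor for a multi-class fit; iris$Species
# (three levels) stands in for a land cover class column here.
fit <- gbm(
  Species ~ .,
  data = iris,
  distribution = "multinomial",
  n.trees = 100,            # placeholder values, tune for real data
  interaction.depth = 2,
  shrinkage = 0.05,
  bag.fraction = 0.5
)

# For a multinomial fit, type = "response" returns class probabilities
# as an array of dimension (rows, classes, length(n.trees)).
pred <- predict(fit, newdata = iris, n.trees = 100, type = "response")

# Pick the most probable class for each row.
prob <- pred[, , 1]
pred_class <- dimnames(pred)[[2]][apply(prob, 1, which.max)]

# Confusion table on the training data (optimistic, for illustration only).
table(observed = iris$Species, predicted = pred_class)
```

For a two-class response, the same call with distribution = "bernoulli" expects the response coded as 0/1 rather than as a factor.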


#3

Also, be aware that there is a “machine learning and modeling” group that might get you more responses.


#4

Thank you for your assistance.

I have fitted the following model using the “multinomial” distribution:

fit.step <- gbm.step(
  data = data.gbm,
  gbm.x = 2:num.col.data.gbm,
  gbm.y = 1,
  family = "multinomial",
  tree.complexity = 5,
  #n.tree = 100,
  #max.trees = 1000,
  learning.rate = 0.005,
  bag.fraction = 0.5
)

But I keep getting this error message: “Error in total.deviance/n.cases: non-numeric argument to binary operator”

What could be the problem?


#5

I posted this question a few days ago on the “machine learning and modeling” group but I have not yet received any response.