Boosted regression trees

Hi all,

I am trying to fit a boosted regression tree model using the following settings:

fit.step<- gbm.step(
  data=data.gbm, 
  gbm.x = 2:num.col.data.gbm,
  gbm.y = 1,
  family = "multinomial",
  tree.complexity = 5,
  #n.tree = 100,
  #max.trees = 1000,
  learning.rate = 0.005,
  bag.fraction = 0.5
)

But I receive the error message "Error in total.deviance/n.cases : non-numeric argument to binary operator". Can anyone assist me with this?

Hi Edward - can you provide part of your data.frame data.gbm?

A reprex would be great: If you've never heard of a reprex before, start by reading "What is a reprex", and follow the advice further down that page.

Also, consider using Cubist; it does rule-based ensembles with linear models in the terminal nodes. Some slides are here.

  head(data.gbm)
   Y altitude frp avg_rain  pop_den so_type dist_road dist_settle dist_river
1  2     1136   3      525 2.247755       1      2305   5344.4789   38439.70
2  2     1131   3      525 2.247755       1      2306   5400.0556   38043.63
3 13     1129   1      525 2.247755       1      1637   3095.3570   37323.73
4 13     1126   1      525 2.247755       1      1637    414.0876   36834.82
5  3     1136   0      525 2.247755       1      1637   4989.1892   40934.57
6 13     1131   2      525 2.247755       1      1637   2032.1340   40266.40

Y is categorical, with values such as 2, 13 and 3, going up to 27. The sample size is 2400.

Thanks, but that doesn't really help :grimacing:

Can you simulate (or blind) a small set that has the same issue?

Perhaps can I send the .csv file? For now I will try to fit the model on a small data set and see what happens.

As @max mentioned earlier, you should use a reprex. If you can prune your data down to a small set, then you can build a data.frame with that data in a reprex, along with an example that shows the issue you are running into.
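One way to get a small data set into a reprex is base R's dput(), which prints code that rebuilds an object. A minimal sketch, using three rows and three columns taken from the head(data.gbm) output above (the subset is only for illustration):

```r
# Small illustrative subset of the data shown earlier in the thread
data.gbm <- data.frame(
  Y        = c(2, 2, 13),
  altitude = c(1136, 1131, 1129),
  frp      = c(3, 3, 1)
)

# dput() prints R code that recreates the object; paste that output
# into your post so others can rebuild the same data.frame
dput(data.gbm)

# Round trip: deparse() gives the same code as text, and evaluating
# it rebuilds the data.frame
snippet <- paste(deparse(data.gbm), collapse = "\n")
rebuilt <- eval(parse(text = snippet))
```

For a large data set, dput(head(data.gbm, 20)) keeps the output short enough to paste.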

A prose description isn't sufficient; you also need to make a simple reprex that includes:

  1. Code that builds the input data you are using.
  2. The function you are trying to write, even if it doesn't work.
  3. Usage of that function, even if it doesn't work.
  4. Code that builds the output data you want the function to produce.

You can learn more about reprexes here:

Right now there is an issue with the version of reprex that is on CRAN, so you should download it directly from GitHub.

Until CRAN catches up with the latest version, install reprex with

devtools::install_github("tidyverse/reprex")

The reason we ask for a reprex is that it is the easiest and quickest way for someone to understand the issue you are running into and answer it.

Nearly everyone here who is answering questions is doing it on their own time and would really appreciate anything you can do to minimize that time.

For example, here is a reprex of the sample data you provided earlier.

suppressPackageStartupMessages(library(tidyverse))
tbl <- tribble(
~Y, ~altitude, ~frp, ~avg_rain, ~pop_den, ~so_type, ~dist_road, ~dist_settle, ~dist_river,
 2, 1136, 3, 525, 2.247755, 1, 2305, 5344.4789, 38439.70, 
 2, 1131, 3, 525, 2.247755, 1, 2306, 5400.0556, 38043.63,
 13, 1129, 1, 525, 2.247755, 1, 1637, 3095.3570, 37323.73,
 13, 1126, 1, 525, 2.247755, 1, 1637, 414.0876, 36834.82, 
 3, 1136, 0, 525, 2.247755, 1, 1637, 4989.1892, 40934.57, 
 13, 1131, 2, 525, 2.247755, 1, 1637, 2032.1340, 40266.40
)

tbl
#> # A tibble: 6 x 9
#>       Y altitude   frp avg_rain pop_den so_type dist_road dist_settle
#>   <dbl>    <dbl> <dbl>    <dbl>   <dbl>   <dbl>     <dbl>       <dbl>
#> 1    2.    1136.    3.     525.    2.25      1.     2305.       5344.
#> 2    2.    1131.    3.     525.    2.25      1.     2306.       5400.
#> 3   13.    1129.    1.     525.    2.25      1.     1637.       3095.
#> 4   13.    1126.    1.     525.    2.25      1.     1637.        414.
#> 5    3.    1136.    0.     525.    2.25      1.     1637.       4989.
#> 6   13.    1131.    2.     525.    2.25      1.     1637.       2032.
#> # ... with 1 more variable: dist_river <dbl>

Created on 2018-03-11 by the reprex package (v0.2.0).

Dear danr,

I am really grateful that you have taken time out of your busy schedule to assist me with this problem. I downloaded the reprex package from https://github.com/tidyverse/reprex, but it fails to install in RStudio. I tried to install it manually by going to Tools > Install Packages > Browse..., but that failed as well.

How do I install it with devtools::install_github(“tidyverse/reprex”)? Or could I send you the data and code via email?

Dear Max,

I downloaded the reprex package from https://github.com/tidyverse/reprex, but it fails to install in RStudio. I tried to install it manually by going to Tools > Install Packages > Browse..., but that failed as well.

I also tried the command install.packages("reprex"), and that failed too. How can I install the reprex package so that I can share my data with you effectively?

I fitted the model on a sample size of 700 (down from 2400) but I still got the same error message.

What happens when you enter:

devtools::install_github("tidyverse/reprex")

on the command line or in the console of R Studio?

I get the message: devtools::install_github(“tidyverse/reprex”)
Error: unexpected input in "devtools::install_github(“"
Error in if (file.exists(dest) && file.mtime(dest) > file.mtime(lib) && :
missing value where TRUE/FALSE needed

I'm not sure why that didn't install reprex from github. We'll have to wait for someone else to join the thread who can diagnose why it failed.

I don't know if this relates to the issue you are running into, but the docs for gbm.step do not list "multinomial" as a supported family. Here is a reprex showing an error when running gbm.step with family = "multinomial".

gbm.step docs at:

https://www.rdocumentation.org/packages/dismo/versions/1.1-4/topics/gbm.step

family error example:

suppressPackageStartupMessages(library(gbm))
suppressPackageStartupMessages(library(dismo))

data.gbm <- tibble::tribble(
  ~Y, ~altitude, ~frp, ~avg_rain, ~pop_den, ~so_type, ~dist_road, ~dist_settle, ~dist_river,
  2, 1136, 3, 525, 2.247755, 1, 2305, 5344.4789, 38439.70,
  2, 1131, 3, 525, 2.247755, 1, 2306, 5400.0556, 38043.63,
  13, 1129, 1, 525, 2.247755, 1, 1637, 3095.3570, 37323.73,
  13, 1126, 1, 525, 2.247755, 1, 1637, 414.0876, 36834.82,
  3, 1136, 0, 525, 2.247755, 1, 1637, 4989.1892, 40934.57,
  13, 1131, 2, 525, 2.247755, 1, 1637, 2032.1340, 40266.40
)

fit.step<- gbm.step(
    data=data.gbm,
    gbm.x = 2:9,
    gbm.y = 1,
    family = "multinomial",
    tree.complexity = 5,
    #n.tree = 100,
    #max.trees = 1000,
    learning.rate = 0.005,
    bag.fraction = 0.5
)
#> 
#>  
#>  GBM STEP - version 2.9 
#>  
#> Performing cross-validation optimisation of a boosted regression tree model 
#> for Y and using a family of multinomial 
#> Using 6 observations and 8 predictors
#> Error in calc.deviance(y_i, u_i, weights = site.weights, family = family, : unknown family, should be one of: "binomial", "bernoulli", "poisson", "laplace", "gaussian"

Created on 2018-03-11 by the reprex package (v0.2.0).
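The list of families in that error message can be turned into a quick check before calling gbm.step. This is only a sketch: the helper name family_ok is hypothetical, and the list is copied from the error output above (dismo 1.1-4), so other versions may accept a different set.

```r
# Families named in the calc.deviance error message above;
# assumed to be specific to the dismo version used in this thread
supported_families <- c("binomial", "bernoulli", "poisson", "laplace", "gaussian")

# Hypothetical helper: TRUE if the family appears in the list above
family_ok <- function(family) {
  family %in% supported_families
}

family <- "multinomial"
if (!family_ok(family)) {
  message("family '", family, "' is not in the supported list: ",
          paste(supported_families, collapse = ", "))
}
```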

That is probably the reason. It means gbm.step has no family for a response variable with more than two categories, which is a bit surprising. Nevertheless, I really appreciate your assistance.

Regards

The multinomial functionality is relatively recent. The package got a new maintainer in the last few years but seems to be defunct again AFAICT. xgboost and C50 do a much better job for me.
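For anyone wanting to try xgboost on a multi-class response, here is a minimal sketch on simulated data (not the data from this thread). It assumes the xgboost package is installed. The class labels 2, 3 and 13 are recoded to 0, 1, 2 because xgboost expects zero-based integer labels, and eta, max_depth and subsample are only loosely analogous to learning.rate, tree.complexity and bag.fraction in gbm.step.

```r
library(xgboost)

set.seed(42)

# Simulated predictors and a three-class response (illustration only)
n <- 150
x <- matrix(rnorm(n * 4), ncol = 4,
            dimnames = list(NULL, paste0("x", 1:4)))
y_raw <- sample(c(2, 3, 13), n, replace = TRUE)

# Recode the class labels to 0, 1, ..., num_class - 1
y_fac <- factor(y_raw)
y_idx <- as.integer(y_fac) - 1L
num_class <- nlevels(y_fac)

dtrain <- xgb.DMatrix(data = x, label = y_idx)

params <- list(
  objective = "multi:softprob",  # per-class probabilities
  num_class = num_class,
  eta       = 0.1,               # learning rate
  max_depth = 5,                 # maximum tree depth
  subsample = 0.5                # fraction of rows sampled per tree
)

fit <- xgb.train(params = params, data = dtrain, nrounds = 50)

# Depending on the xgboost version, predict() returns either a flat
# vector (row by row) or an n x num_class matrix; handle both cases
pred <- predict(fit, dtrain)
if (is.null(dim(pred))) {
  pred <- matrix(pred, ncol = num_class, byrow = TRUE)
}

# Map the most probable column back to the original class labels
pred_class <- levels(y_fac)[max.col(pred)]
```

These predictions are made on the training data, so they say nothing about out-of-sample accuracy; xgb.cv or a held-out set would be needed for that.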

It's because there are curly quotes in there! :slightly_smiling_face:

@Edward, try:
devtools::install_github("tidyverse/reprex")
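Curly quotes usually sneak in when code is copied from a web page or word processor. A small base R sketch for spotting and replacing them in pasted text (the variable names are just for illustration):

```r
# A command pasted with curly double quotes (U+201C and U+201D)
pasted <- "devtools::install_github(\u201Ctidyverse/reprex\u201D)"

# Detect curly double or single quotes
has_curly <- grepl("[\u201C\u201D\u2018\u2019]", pasted)

# Replace them with the straight quotes that R can parse
fixed <- gsub("[\u201C\u201D]", "\"", pasted)
fixed <- gsub("[\u2018\u2019]", "'", fixed)

cat(fixed, "\n")
```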

It's not working either. Error message: devtools::install_github("tidyverse/reprex")
Error in loadNamespace(name) : there is no package called ‘devtools’
Error in if (file.exists(dest) && file.mtime(dest) > file.mtime(lib) && :
missing value where TRUE/FALSE needed

Do you have code to fit a model using xgboost?

You will need to install devtools. It's a standard CRAN package so

install.packages("devtools")

will get it installed for you.
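If you are not sure whether a package is already installed, a check along these lines avoids reinstalling it. This is a sketch; the vector name needed is just for illustration, and the install step only runs in an interactive session.

```r
# Packages this thread relies on for installing from GitHub
needed <- c("devtools")

# requireNamespace() returns FALSE (rather than erroring) when a
# package is not installed
is_installed <- vapply(needed, requireNamespace, logical(1), quietly = TRUE)
to_install <- needed[!is_installed]

# Install only what is missing, and only when run interactively
if (length(to_install) > 0 && interactive()) {
  install.packages(to_install)
}
```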
