Case weights in Ranger/Caret

I am using the ranger package in caret to develop a random forest model to predict the risk of dying.

I am more interested in the model doing well at predicting those who end up dying, rather than being good at predicting those who live.

Therefore, I am trying to add a case.weights argument to my model, but I cannot work out how to implement it, as I am very new to R.

So far, my code looks like this:

set.seed(40)

control.data <- trainControl(method = "cv", number = 5, sampling = "up", verboseIter = TRUE, classProbs = TRUE)

rfGrid <- expand.grid(
.mtry = 2:6, 
.splitrule = "gini",
.min.node.size = c(250,500))

fit.dataup <- train(mort_30 ~ C_SEX + V_AGE + Hemoglobin + Thrombocytes + Leukocytes + CRP,
                                      data = data.train,
                                      method = "ranger",
                                      max.depth = 5,
                                      num.trees= 1000,
                                      trControl = control.data,
                                      tuneGrid = rfGrid,
                                      importance = "impurity",
                                      verbose = TRUE)

I have tried using both 'case.weights' and 'weights' in my train() call, but no matter how I write it, I can't get it to work.
Which syntax do I have to use? Let's say I want the "dead" cases to be weighted 2:1 relative to my "alive" cases.

Thank you so much in advance!

The formula method for train() has an argument called weights. I believe you pass that argument a numeric vector with one weight per row of the data.

library(caret)
#> Loading required package: ggplot2
#> Loading required package: lattice

set.seed(1)
wts <- runif(150)
set.seed(2)
mod <- train(Species ~ ., data = iris, weights = wts, method = "ranger")

Created on 2021-11-02 by the reprex package (v2.0.0)


Thank you so much for your reply, but I still can't get it to work, even if I copy your code directly.
I keep getting an error saying "variable lengths differ (found for '(weights)')".
I also found an old GitHub issue from 2016 (https://github.com/topepo/caret/issues/414), where I believe the solution you suggested would be

mod <- train(Species ~ ., method = "ranger", data = iris, weights = (1:100)/100)

But I get the same error message. Do you know how to fix the error?
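
For reference, here is a minimal sketch of one way this message can arise, using base R only. It assumes that train()'s formula method builds its data through stats::model.frame(), which expects a weights vector with exactly one entry per row of data. The names bad_wts and good_wts are made up for illustration.

```r
# iris has 150 rows, so a weights vector needs 150 entries.
n_rows <- nrow(iris)

# A vector with only 100 entries does not line up with the 150 rows,
# so building the model frame fails.
bad_wts <- (1:100) / 100
err <- tryCatch(
  {
    model.frame(Species ~ ., data = iris, weights = bad_wts)
    "no error"
  },
  error = function(e) conditionMessage(e)
)
print(err)  # the message mentions "variable lengths differ"

# With one weight per row, the model frame is built without complaint.
good_wts <- rep(1, n_rows)
mf <- model.frame(Species ~ ., data = iris, weights = good_wts)
nrow(mf)
```

If this is what is happening, comparing length(weights) with nrow(data) before calling train() is a quick check.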

Update:

I figured out the problem, but I am still searching for a solution. The reason I am using caret is that I want to upsample my minority class during cross-validation rather than before it, to avoid an overoptimistic model.
I figured out how to use the weights argument, like so:

model_weights <- ifelse(data.train$mort_30 == "Dead",  20, 1)

fit.data <- train(mort_30 ~ C_SEX + V_AGE + Hemoglobin + Thrombocytes + Leukocytes + CRP,
                                      data = data.train,
                                      method = "ranger",
                                      max.depth = 5,
                                      num.trees= 1000,
                                      trControl = control.data,
                                      tuneGrid = rfGrid,
                                      weights = model_weights,
                                      importance = "impurity",
                                      verbose = TRUE)

but it doesn't seem to work when I also use upsampling during CV, because then the length of model_weights and the number of samples are no longer the same (I get the same error message as before).

Does anyone know whether there is a workaround, so that the two (upsampling + weights) can work together?
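
One possible workaround, offered as an untested sketch rather than a confirmed fix: weight by class instead of by case. ranger::ranger() has a class.weights argument that takes one weight per outcome class (in factor-level order), so its length does not depend on how many rows each resample ends up with. The sketch assumes caret forwards extra arguments in train()'s `...` to ranger::ranger(). The data frame toy, the object cls_wts and the tuning values are hypothetical stand-ins, not the original data.

```r
# Hypothetical stand-in for data.train: a two-class outcome with few "Dead" cases.
set.seed(40)
toy <- data.frame(
  mort_30 = factor(
    sample(c("Alive", "Dead"), 300, replace = TRUE, prob = c(0.85, 0.15)),
    levels = c("Alive", "Dead")
  ),
  V_AGE = rnorm(300, mean = 70, sd = 10),
  CRP = rexp(300, rate = 1 / 50)
)

# One weight per class, in factor-level order: Alive = 1, Dead = 2.
cls_wts <- ifelse(levels(toy$mort_30) == "Dead", 2, 1)
names(cls_wts) <- levels(toy$mort_30)
print(cls_wts)

# Only fit if the packages caret needs for method = "ranger" are installed.
pkgs <- c("caret", "ranger", "e1071", "dplyr")
if (all(vapply(pkgs, requireNamespace, logical(1), quietly = TRUE))) {
  ctrl <- caret::trainControl(method = "cv", number = 5,
                              sampling = "up", classProbs = TRUE)
  fit <- caret::train(
    mort_30 ~ V_AGE + CRP,
    data = toy,
    method = "ranger",
    trControl = ctrl,
    tuneGrid = expand.grid(mtry = 1:2, splitrule = "gini", min.node.size = 10),
    num.trees = 200,
    # Assumption: passed through `...` to ranger::ranger().
    class.weights = unname(cls_wts)
  )
  print(fit$results)
}
```

Note that upsampling already rebalances the classes within each resample, so adding class weights on top shifts the emphasis towards "Dead" a second time. Dropping sampling = "up" and relying on weights alone would be a simpler alternative.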
