Error: Can't subset columns that don't exist. x Column `hairpinTrain` doesn't exist.

Hello!
I am trying to build a model using caret. I split the data using the following code:
set.seed(3456)
trainIndex_hairpin <- createDataPartition(data_hairpin$subcellular_location, p = .8,
                                          list = FALSE,
                                          times = 1)
hairpinTrain <- data_hairpin[ trainIndex_hairpin,]
hairpinTest <- data_hairpin[-trainIndex_hairpin,]
After that I started to create the model using the following code:
fitControl <- trainControl(## 10-fold CV
                           method = "repeatedcv",
                           number = 10,
                           ## repeated ten times
                           repeats = 10)
training <- hairpinTrain
testing <- hairpinTest
set.seed(825)
gbmFit2 <- train(hairpinTrain$subcellular_location ~ ., data = hairpinTrain,
                 method = "gbm",
                 trControl = fitControl,
                 ## This last option is actually one
                 ## for gbm() that passes through
                 verbose = FALSE)
gbmFit2
But it always gives me this error:
Error: Can't subset columns that don't exist.
x Column hairpinTrain doesn't exist.

This is your private data, so your code isn't runnable by us forumites unless you take steps to make a reprex.
That said,

train(hairpinTrain$subcellular_location ~ ., data = hairpinTrain,

is a red flag to me. Unless you are an expert doing something very clever, you should not repeat the name of your dataset anywhere in a function call when you are already passing that dataset through an explicit data argument.
That is, subcellular_location is assumed to be a column of the hairpinTrain that you declared you are using:

train(subcellular_location ~ ., data = hairpinTrain,
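
To see why the repeated dataset name matters, here is a minimal base R sketch. It assumes iris as a stand-in for the private data (with Species renamed to subcellular_location), and it assumes caret selects columns from the data by the variable names in the formula, which is what a call like `data[, all.vars(Terms), drop = FALSE]` would do. This is an illustration of the mechanism, not caret's actual code.

```r
# Stand-in data: iris with the outcome renamed (an assumption for illustration).
hairpinTrain <- iris
names(hairpinTrain)[names(hairpinTrain) == "Species"] <- "subcellular_location"

# Variable names each formula refers to once `.` is expanded against the data.
bad_vars  <- all.vars(terms(hairpinTrain$subcellular_location ~ ., data = hairpinTrain))
good_vars <- all.vars(terms(subcellular_location ~ ., data = hairpinTrain))

# Names that are NOT columns of the data frame.
missing_bad  <- setdiff(bad_vars, names(hairpinTrain))
missing_good <- setdiff(good_vars, names(hairpinTrain))

print(missing_bad)   # names the first formula asks for that are not columns
print(missing_good)  # same check for the second formula
```

With the first formula, hairpinTrain itself shows up as a "variable", so any code that then tries to pick those names out as columns of the data frame will complain that a column called hairpinTrain doesn't exist. With the second formula, every name is a real column.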

Thanks for your consideration.
I also tried this, and it gave me the same error!

It seems to work fine for me. Perhaps there is some eccentricity with your data, though. You can make a reprex; here is one that uses iris as a stand-in for your data:


library(caret)
library(tidyverse)
data_hairpin <- iris %>% rename(subcellular_location=Species) 

trainIndex_hairpin <- createDataPartition(data_hairpin$subcellular_location, p = .8,
                                          list = FALSE,
                                          times = 1)


hairpinTrain <- data_hairpin[ trainIndex_hairpin,]
hairpinTest <- data_hairpin[-trainIndex_hairpin,]

fitControl <- trainControl(
  method = "repeatedcv",
  number = 10,
  repeats = 10)

# Fails: the formula repeats the data frame name
gbmFit2 <- train(hairpinTrain$subcellular_location ~ ., data = hairpinTrain,
                 method = "gbm",
                 trControl = fitControl,
                 verbose = FALSE)
# Error in `[.data.frame`(data, , all.vars(Terms), drop = FALSE) :
#   undefined columns selected

# Gives a result: the formula refers to the column by name only
gbmFit2 <- train(subcellular_location ~ ., data = hairpinTrain,
                 method = "gbm",
                 trControl = fitControl,
                 verbose = FALSE)
gbmFit2

Thanks very much, nirgrahamuk. It works!
