What are the data size limits of the caret package?


#1

Hi everyone,

I am wondering how much data I can use for classification with the caret package. Is there an upper limit on data size for the classification methods? I am able to use only svm methods (with non-interface code), whereas I cannot use other methods, for instance C5, J48, pam, gpls, lda, etc.

Best regards.

FYI:

My data dimensions are 211 x 242323.

sessionInfo()
R version 3.4.1 (2017-06-30)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows >= 8 x64 (build 9200)

Matrix products: default

attached base packages:
[1] stats graphics grDevices utils datasets methods base

loaded via a namespace (and not attached):
[1] compiler_3.4.1 tools_3.4.1 yaml_2.1.18

and

My system has 12 GB RAM, an i7-4500U at 2.00 GHz (4 CPUs, ~2.6 GHz), and a 250 GB SATA SSD.


#2

I don't have an actual answer here, but this seems like a good case for some sort of feature selection or dimensionality reduction (PCA, an autoencoder with keras, etc.). 240,000 variables for 211 observations is a LOT.
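
To make the PCA suggestion concrete, here is a minimal sketch in base R. The data are simulated (20 samples by 500 predictors, standing in for the real matrix), and the choice of 10 components is an arbitrary assumption for illustration, not a recommendation for SNP data.

```r
# Simulated stand-in for a wide data set: 20 samples, 500 numeric predictors.
# (Hypothetical data; the real matrix in this thread is 211 x 242323.)
set.seed(42)
n <- 20
p <- 500
x <- matrix(rnorm(n * p), nrow = n, ncol = p)

# Keep only the first 10 principal components.
# The rank. argument requires R >= 3.4.0.
pca <- prcomp(x, center = TRUE, scale. = FALSE, rank. = 10)

# The component scores replace the original predictors:
# one row per sample, one column per retained component.
scores <- pca$x
dim(scores)
```

With more predictors than samples, PCA can return at most as many components as there are samples, so the reduced matrix is always far narrower than the original.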


#3

It depends somewhat on the nature of the data (are they continuous? factors? etc.). That said, that variables-to-samples ratio is pretty pathological and would probably benefit from an initial variable filter for high correlations and near-zero variance predictors. These filters can be applied using train's preProc argument or using a recipe.
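
A minimal sketch of the two filters mentioned above, using caret's preProcess() on a small simulated data frame. The column names, the sample size, and the 0.9 correlation cutoff are assumptions made for illustration only.

```r
library(caret)

# Hypothetical toy data: one constant column and one near-duplicate column.
set.seed(1)
n <- 50
a <- rnorm(n)
x <- data.frame(
  a      = a,
  a_copy = a + rnorm(n, sd = 0.01),  # almost perfectly correlated with a
  const  = rep(1, n),                # zero variance
  b      = rnorm(n)
)

# "nzv" drops near-zero variance predictors; "corr" drops one predictor
# from each pair whose absolute correlation exceeds the cutoff.
pp <- preProcess(x, method = c("nzv", "corr"), cutoff = 0.9)
x_filtered <- predict(pp, x)

names(x_filtered)
```

The same method names can be passed to train's preProcess argument so that the filters are re-estimated inside each resample. One caveat at the scale discussed in this thread: a full correlation matrix for 242,323 predictors would hold about 5.9e10 doubles (roughly 470 GB), so the correlation filter is unlikely to fit in 12 GB of RAM unless the predictor set is reduced first.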


#4

Dear eoppe1022 and Max,

Thank you both very much. Dear Max, I would like to mention that I have tried the preProc methods. My data type is numeric (SNP data). I am able to use the svm method (with different kernel types) and some glmnet methods as well.

Best wishes.