What are the limitations of the caret package?

R.Onur · April 5, 2018, 11:33pm

Hi everyone,

I am wondering how much data I can use for classification in the caret package. In other words, what is the upper limit on data size for its classification methods? I can only use the svm methods (with the non-formula interface), whereas I cannot use other methods such as C5.0, J48, pam, gpls, or lda.

Best regards.

FYI:

My data dimensions are 211 x 242,323 (211 samples, 242,323 predictors).

sessionInfo()
R version 3.4.1 (2017-06-30)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows >= 8 x64 (build 9200)

Matrix products: default

attached base packages:
[1] stats graphics grDevices utils datasets methods base

loaded via a namespace (and not attached):
[1] compiler_3.4.1 tools_3.4.1 yaml_2.1.18

and

My system has 12 GB of RAM, an Intel i7-4500U CPU at 2.00 GHz (4 CPUs, ~2.6 GHz), and a 250 GB SATA SSD.
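A rough back-of-the-envelope sketch (in R, the language of the thread) of why the number of predictors, rather than the number of samples, is the likely bottleneck. It assumes the data are held as a dense double-precision matrix (8 bytes per value) and ignores R's per-object overhead; the variable names are illustrative only.

```r
# Rough memory estimate for dense numeric (double) matrices in R.
# Assumption: 8 bytes per value, R object overhead ignored.
n <- 211      # samples (rows)
p <- 242323   # predictors (columns)
bytes_per_double <- 8

data_bytes <- n * p * bytes_per_double  # the n x p predictor matrix itself
pxp_bytes  <- p^2 * bytes_per_double    # any p x p matrix, e.g. a full correlation matrix

cat(sprintf("n x p matrix: %.1f MB\n", data_bytes / 1e6))  # n x p matrix: 409.0 MB
cat(sprintf("p x p matrix: %.1f GB\n", pxp_bytes / 1e9))   # p x p matrix: 469.8 GB
```

The data matrix itself fits comfortably in 12 GB, so raw size alone is probably not the limit. A method that builds a p x p matrix internally, or that makes several copies of the data along the way, could run out of memory long before that. The thread does not establish which of the listed methods behave this way.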

eoppe1022 · April 6, 2018, 1:53am

I don't have an actual answer here, but this seems like a good case for some sort of feature selection or dimensionality reduction (PCA, an autoencoder with keras, etc.). 240,000 variables for 211 observations is a LOT.
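A minimal sketch of the PCA idea in base R, assuming a simulated genotype-style matrix (values coded 0/1/2) in place of the real data; the dimensions, seed, and number of components kept are arbitrary choices for illustration. `prcomp()` works from the n x p data via a thin SVD rather than forming a p x p covariance matrix, so it stays feasible when p is much larger than n. In a real analysis the PCA should be estimated inside each resampling fold (for example with `preProcess = "pca"` in caret's `train()`) rather than once on the full data set, otherwise performance estimates will be optimistic.

```r
# Sketch: PCA on a wide, SNP-like matrix (simulated, not the poster's data).
set.seed(1)
n <- 50     # samples
p <- 2000   # predictors, p >> n as in the thread
x <- matrix(sample(0:2, n * p, replace = TRUE), nrow = n)

# prcomp(scale. = TRUE) cannot rescale constant columns, so drop them first
x <- x[, apply(x, 2, var) > 0, drop = FALSE]

# Keep only the first 10 principal components
pc <- prcomp(x, center = TRUE, scale. = TRUE, rank. = 10)

scores <- pc$x   # n x 10 matrix of component scores, usable as predictors
dim(scores)
```

The component scores in `scores` could then be passed to a classifier in place of the original columns.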

Max · April 10, 2018, 12:34am

It depends somewhat on the nature of the data (are they continuous? factors? etc.). That said, that variables-to-samples ratio is pretty pathological, and the data would probably benefit from an initial filter that removes highly correlated and near-zero-variance predictors. Both filters can be applied using train's preProcess argument or using a recipe.
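A minimal sketch of the filtering described above, calling caret's `preProcess()` directly so the effect is visible before any model is fitted. It assumes caret is installed and uses a small simulated matrix in which the first 50 columns are deliberately constant (standing in for monomorphic SNPs) and ten columns are exact duplicates of others; the sizes, seed, and `cutoff` value are illustrative. The `"corr"` step computes a full p x p correlation matrix, so at the width discussed in this thread it is probably only practical once the column count has already been reduced. Inside `train()`, the same methods would be passed as `preProcess = c("nzv", "corr")` so that they are re-estimated within each resample.

```r
library(caret)

set.seed(42)
n <- 60
p <- 500
# Simulated SNP-like predictors coded 0/1/2 (illustrative only)
x <- matrix(sample(0:2, n * p, replace = TRUE), nrow = n,
            dimnames = list(NULL, paste0("snp", seq_len(p))))
x[, 1:50] <- 0             # constant columns: zero (hence near-zero) variance
x[, 51:60] <- x[, 61:70]   # exact duplicates: pairwise correlation of 1

# Estimate the filters: drop near-zero-variance columns, then drop one
# column from each pair whose absolute correlation exceeds the cutoff
pp <- preProcess(x, method = c("nzv", "corr"), cutoff = 0.9)
print(pp)                  # summary of which columns were removed

x_filtered <- predict(pp, x)
dim(x_filtered)
```

The recipes package offers the same two filters as `step_nzv()` and `step_corr()`.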

R.Onur · April 10, 2018, 12:56am

Dear eoppe1022 and Max,

Thank you both very much. Dear Max, I should mention that I have already tried the preProcess methods. My data are numeric (SNP data). I am able to use the svm method (with different kernel types) and some glmnet methods as well.

Best wishes.