[R] PLS component selection for GPLS question
Torsten Schindler
Torsten.Schindler at chello.at
Fri Jul 29 13:21:05 CEST 2005
How should I select the number of PLS components for GPLS on data sets
with few samples?
Concrete problem:
My data set has 9 samples of class A and 37 of class B, with 254
descriptors.
In the paper "Classification Using Generalized Partial Least
Squares" by Beiying Ding and Robert Gentleman (Bioconductor
Project Working Papers, 2004, paper 5), Section 2.6, "Assessing
Prediction", says:
"The optimal number of PLS components is selected by choosing
that value of K which minimizes LOOCV error rate for the training
set."
In Section 3.1.3, "Colon data", subsection "Random splitting", it says:
"Due to the instability of LOOCV error rates for data with few
samples and many covariates, comparison of various classifiers
based solely on LOOCV classification errors may not be reliable."
There the authors use random splitting to determine the number of PLS
components in GPLS, but I am still not sure how to choose the right
number of PLS components for my data set.
I used the function errorest() from the package ipred to estimate the
error rates, and gpls() with the Firth procedure switched on.
The attached PDF graphic illustrates the problem for my data set.
S_n is the model sensitivity and S_p the model specificity.
With 4 components I get the best cross-validation error rate, 17%, and
with 5 components the best bootstrap error rate, 9%, but
the sensitivity of the model is only 11%!
If one chooses 13 components, one gets 100% sensitivity and 100%
specificity, but the CV error is 34% and the bootstrap error is 40%,
and the risk that the model is overtrained is higher.
How many components should I choose to get the best GPLS model?
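
To make the tension between the overall error rate and the sensitivity concrete, here is a small arithmetic sketch in Python. It uses only the class sizes given above (9 samples of class A, 37 of class B). Treating class A as the "positive" class for S_n is an assumption, the confusion counts are hypothetical and not taken from the attached PDF, and the function name rates() is made up for the illustration. The balanced error rate is shown only as one possible complementary criterion, not as a recommendation from the cited paper.

```python
# Class sizes from the post. Assumption: class A is the "positive" class.
N_A, N_B = 9, 37


def rates(tp, tn, n_pos=N_A, n_neg=N_B):
    """Return (sensitivity, specificity, error rate, balanced error rate).

    tp: correctly classified class A samples
    tn: correctly classified class B samples
    """
    sens = tp / n_pos
    spec = tn / n_neg
    err = 1 - (tp + tn) / (n_pos + n_neg)
    ber = 1 - (sens + spec) / 2
    return sens, spec, err, ber


# Hypothetical outcomes, chosen for illustration only.
cases = {
    "always predict B": (0, 37),
    "1 of 9 A found, all B correct": (1, 37),
    "6 of 9 A found, 30 of 37 B correct": (6, 30),
}

for label, (tp, tn) in cases.items():
    sens, spec, err, ber = rates(tp, tn)
    print(f"{label:36s} S_n={sens:.2f} S_p={spec:.2f} "
          f"error={err:.2f} balanced error={ber:.2f}")
```

With these class sizes, a rule that assigns every sample to class B already has an overall error of 9/46, about 20%, and finding a single class A sample (S_n = 1/9, about 11%) gives 8/46, about 17%. An overall error rate in that range can therefore coexist with a very low sensitivity.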
Attachment: GPLS_component_selection.pdf (application/pdf, 11328 bytes)
https://stat.ethz.ch/pipermail/r-help/attachments/20050729/c2f5fb54/GPLS_component_selection.pdf