[R] pls package - validation

Wed Feb 8 06:56:19 CET 2017

I think this calls for a statistical discussion, which is off-topic here.
stats.stackexchange.com would be a better place to post for that.

However, if I understand correctly, using pls or anything else to try
to fit (some combination of) 501 variables to 16 data points -- and
then validate on 6 held-out data points -- is utter nonsense. You just
have a fancy random number generator!
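
To make that concrete, here is a minimal sketch in R, assuming the 'pls' package is installed. The data are pure simulated noise with the same shape as the training set in the post (16 x 501). The names X, y, d and fit, the seed, and the choice of 10 components are illustrative assumptions, not anything taken from the original data.

```r
library(pls)

set.seed(1)                          # arbitrary seed, for reproducibility only
n <- 16                              # same number of training samples as in the post
p <- 501                             # same number of predictors as in the post

X <- matrix(rnorm(n * p), n, p)      # predictors: pure noise
y <- rnorm(n)                        # response: pure noise, unrelated to X
d <- data.frame(y = y, X = I(X))     # I() keeps X as a single matrix column

fit <- plsr(y ~ X, ncomp = 10, data = d, validation = "LOO")

# Training R2 per component (no intercept-only entry).
train_r2 <- drop(R2(fit, estimate = "train", intercept = FALSE)$val)

# Leave-one-out RMSEP, starting with the intercept-only model.
cv_rmsep <- drop(RMSEP(fit, estimate = "CV")$val)

print(round(train_r2, 3))
print(round(cv_rmsep, 3))
```

With this many predictors the training R2 climbs towards 1 even though y has nothing to do with X, while the cross-validated RMSEP typically fails to improve on the intercept-only model. That is the same pattern as in the summary() output quoted below.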

As I said, I think it better to follow up or complain about me on
stackexchange rather than here.

Cheers,
Bert

Bert Gunter

"The trouble with having an open mind is that people keep coming along
and sticking things into it."
-- Opus (aka Berkeley Breathed in his "Bloom County" comic strip)

On Tue, Feb 7, 2017 at 4:49 PM, Ladislav Rozkošný
<ladarozkosny at seznam.cz> wrote:
> Hi,
>
> I'm trying to fit a PLSR model in R with the 'pls' package, using 22 samples
> (16 train, 6 test). I know that the basis for choosing the number of
> components is cross-validation (in my case 'LOO'), and that I should choose
> the number of components at the minimum of RMSEP (or the first minimum). The
> problem is that the RMSEP values increase with the number of components
> rather than decrease. Should I choose only 1 component?
>
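
The minimum-RMSEP rule can be applied directly to the cross-validated values reported in the summary() output quoted further down. This is only a sketch in base R. The vector cv is typed in by hand from that output, and the names cv, best_ncomp and one_comp_worse are illustrative.

```r
# Cross-validated RMSEP values copied from the "CV" row of the summary() output.
cv <- c(0.5343, 0.5435, 0.5506, 1.629, 1.617, 1.742, 1.921,
        1.979, 1.977, 1.971, 1.972, 1.972, 1.972, 1.972, 1.972)
names(cv) <- c("(Intercept)", paste(1:14, "comps"))

# Position 1 is the intercept-only model, so subtract 1 to get a component count.
best_ncomp <- unname(which.min(cv)) - 1

# TRUE if one component has a higher CV error than using no components at all.
one_comp_worse <- cv[["1 comps"]] > cv[["(Intercept)"]]

print(best_ncomp)       # 0
print(one_comp_worse)   # TRUE
```

By this rule the minimum sits at the intercept-only model, so even a single component does no better in cross-validation than predicting the mean pH.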
> Then I tried to compute R2 on my test dataset (6 samples) and got
> nonsensical values (below 0, bigger than 1).
>
> Do you have any idea what may have caused this? Is it a problem with my
> fitting or a problem with the datasets?
>
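
A negative test-set R2 is not impossible in itself. Assuming R2 is computed as 1 - SSE/SST, with SST taken around the mean of the observations being predicted, it drops below 0 whenever the predictions are worse than simply predicting that mean, and it can never exceed 1. The sketch below is base R. The helper r2_test and the numbers are made up for illustration and are not the poster's data.

```r
# Hypothetical helper: R2 as 1 - SSE/SST, with SST around the mean of 'obs'.
r2_test <- function(obs, pred) {
  sse <- sum((obs - pred)^2)        # squared prediction error
  sst <- sum((obs - mean(obs))^2)   # spread of the observed values
  1 - sse / sst
}

obs <- c(5.0, 5.5, 6.0, 6.5)        # made-up observed pH values

r2_mean <- r2_test(obs, rep(mean(obs), 4))   # predict the observed mean
r2_bad  <- r2_test(obs, rep(7, 4))           # predict a constant far from the data
r2_good <- r2_test(obs, obs)                 # perfect predictions

print(r2_mean)   # 0
print(r2_bad)    # -5
print(r2_good)   # 1
```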
> Below, you can see my results:
>
> > pH.spec <- plsr(pH ~ spec, data = soil.train, validation = "LOO")
> > summary(pH.spec)
> Data:   X dimension: 16 501
>         Y dimension: 16 1
> Fit method: kernelpls
> Number of components considered: 14
>
> VALIDATION: RMSEP
> Cross-validated using 16 leave-one-out segments.
>        (Intercept)  1 comps  2 comps  3 comps  4 comps  5 comps  6 comps
> CV          0.5343   0.5435   0.5506    1.629    1.617    1.742    1.921
> adjCV       0.5343   0.5419   0.5486    1.587    1.570    1.688    1.860
>        7 comps  8 comps  9 comps  10 comps  11 comps  12 comps  13 comps
> CV       1.979    1.977    1.971     1.972     1.972     1.972     1.972
> adjCV    1.916    1.914    1.908     1.910     1.909     1.909     1.909
>        14 comps
> CV        1.972
> adjCV     1.909
>
> TRAINING: % variance explained
>     1 comps  2 comps  3 comps  4 comps  5 comps  6 comps  7 comps  8 comps
> X    96.410   99.655    99.87    99.90    99.93    99.94    99.95    99.96
> pH    3.649    8.342    19.41    67.48    88.96    97.19    99.69    99.94
>     9 comps  10 comps  11 comps  12 comps  13 comps  14 comps
> X     99.96     99.97     99.98     99.99     99.99       100
> pH    99.99    100.00    100.00    100.00    100.00       100
>
> > R2(pH.spec, newdata = soil.test)
> (Intercept)   1 comps   2 comps   3 comps   4 comps
>    -1.65763  -0.60849  -0.05253  -0.72870  -2.84718
>     5 comps   6 comps   7 comps   8 comps   9 comps
>    -2.34102  -3.28201  -3.68611  -3.69817  -3.77271
>    10 comps  11 comps  12 comps  13 comps  14 comps
>    -3.74585  -3.76342  -3.76074  -3.76110  -3.76115
>
> Thank you in advance for your help