[R] pls package - validation
Bert Gunter
bgunter.4567 at gmail.com
Wed Feb 8 06:56:19 CET 2017
I think this wants a statistical discussion, which is OT here.
stats.stackexchange.com would be a better place to post for that.
However, if I understand correctly, using pls or anything else to try
to fit (some combination of) 501 variables to 16 data points -- and
then validate on 6 held-out data points -- is utter nonsense. You just
have a fancy random number generator!
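To make that concrete, here is a minimal base-R sketch on simulated noise, not the poster's data. It swaps in a minimum-norm least-squares fit for PLS; like the 14-component fit in the summary below, it reproduces the training response exactly. All names (X_train, r2, and so on) are made up for illustration, and R2 is taken as 1 - SSE/SST with SST about the mean of the responses being scored, which is an assumption about the convention.

```r
# Simulated noise with the same shape as the question:
# 16 training rows, 6 test rows, 501 predictors. Not the real soil data.
set.seed(1)
n_train <- 16; n_test <- 6; p <- 501

X_train <- cbind(1, matrix(rnorm(n_train * p), n_train, p))  # leading 1s = intercept
X_test  <- cbind(1, matrix(rnorm(n_test  * p), n_test,  p))
y_train <- rnorm(n_train)  # response is unrelated to X by construction
y_test  <- rnorm(n_test)

# Minimum-norm least squares: b = X' (X X')^{-1} y.
# X X' is 16 x 16 and invertible here because there are far more columns than rows.
b <- crossprod(X_train, solve(tcrossprod(X_train), y_train))

# R2 as 1 - SSE/SST, with SST about the mean of the responses being scored.
r2 <- function(y, y_hat) 1 - sum((y - y_hat)^2) / sum((y - mean(y))^2)

r2_train <- r2(y_train, drop(X_train %*% b))
r2_test  <- r2(y_test,  drop(X_test  %*% b))

cat("training R2:", r2_train, "\n")  # essentially 1: noise is fitted exactly
cat("test R2:    ", r2_test,  "\n")  # nothing constrains this to be above 0
```

Rerunning with other seeds leaves the training R2 at 1 while the test R2 moves around and can land below zero, even though there is no signal at all to find.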
As I said, I think it better to follow up or complain about me on
stackexchange rather than here.
Cheers,
Bert
Bert Gunter
"The trouble with having an open mind is that people keep coming along
and sticking things into it."
-- Opus (aka Berkeley Breathed in his "Bloom County" comic strip)
On Tue, Feb 7, 2017 at 4:49 PM, Ladislav Rozkošný
<ladarozkosny at seznam.cz> wrote:
> Hi,
>
> I'm trying to fit a PLSR model in R with the 'pls' package, using 22 samples
> (16 train, 6 test). I know that the number of components is usually chosen
> by cross-validation (in my case 'LOO'), picking the number of components at
> the minimum (or the first minimum) of RMSEP. The problem is that RMSEP
> increases with the number of components instead of decreasing. Should I
> choose only 1 component?
>
> I then tried to compute R2 on my test dataset (6 samples) and got
> nonsensical values (below 0, bigger than 1).
>
> Do you have any idea what may be causing this? Is it a problem with my
> fitting or a problem with the datasets?
>
> Below, you can see my results:
>
> > pH.spec <- plsr(pH ~ spec, data = soil.train, validation = "LOO")
> > summary(pH.spec)
>
> Data: X dimension: 16 501
> Y dimension: 16 1
> Fit method: kernelpls
> Number of components considered: 14
>
> VALIDATION: RMSEP
> Cross-validated using 16 leave-one-out segments.
>        (Intercept)   1 comps   2 comps   3 comps   4 comps   5 comps   6 comps   7 comps
> CV          0.5343    0.5435    0.5506     1.629     1.617     1.742     1.921     1.979
> adjCV       0.5343    0.5419    0.5486     1.587     1.570     1.688     1.860     1.916
>          8 comps   9 comps  10 comps  11 comps  12 comps  13 comps  14 comps
> CV         1.977     1.971     1.972     1.972     1.972     1.972     1.972
> adjCV      1.914     1.908     1.910     1.909     1.909     1.909     1.909
>
> TRAINING: % variance explained
>       1 comps   2 comps   3 comps   4 comps   5 comps   6 comps   7 comps
> X      96.410    99.655     99.87     99.90     99.93     99.94     99.95
> pH      3.649     8.342     19.41     67.48     88.96     97.19     99.69
>       8 comps   9 comps  10 comps  11 comps  12 comps  13 comps  14 comps
> X       99.96     99.96     99.97     99.98     99.99     99.99       100
> pH      99.94     99.99    100.00    100.00    100.00    100.00       100
>
> > R2(pH.spec, newdata = soil.test)
> (Intercept)   1 comps   2 comps   3 comps   4 comps   5 comps   6 comps   7 comps
>    -1.65763  -0.60849  -0.05253  -0.72870  -2.84718  -2.34102  -3.28201  -3.68611
>     8 comps   9 comps  10 comps  11 comps  12 comps  13 comps  14 comps
>    -3.69817  -3.77271  -3.74585  -3.76342  -3.76074  -3.76110  -3.76115
>
> Thank you in advance for your help