[R] data structure for plsr

Andris Jankevics andza at osi.lv
Wed Oct 17 09:34:12 CEST 2007


Hi, Jim.

Actually, you need a "data.frame" object with the X and Y variables defined 
in it.
DATA$X must contain all the data for the X variables (DATA$NIR in the example).
DATA$Y holds the response variable(s) for your X matrix (DATA$density in the example).

For example, suppose you have the X variables in a matrix and one Y variable as a vector:

> X <- matrix(rnorm(50),10,5)
> Y <- c(1:10)

Now create the data.frame (the SampN variable isn't essential for the plsr model, 
but it makes the data frame easier to build):

> plsDATA <- data.frame(SampN=c(1:nrow(X)))
> plsDATA$X <- X
> plsDATA$Y <- as.matrix(Y)
> str (plsDATA)
'data.frame':   10 obs. of  3 variables:
 $ SampN: int  1 2 3 4 5 6 7 8 9 10
 $ X    : num [1:10, 1:5]  1.330  1.025 -1.931  0.552  0.126 ...
 $ Y    : int  1 2 3 4 5 6 7 8 9 10

> dim (plsDATA)
[1] 10  3
> dim (plsDATA$X)
[1] 10  5
> dim (plsDATA$Y)
[1] 10  1
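
As a side note, the same structure can probably be built in one call by wrapping the matrix in I(), which stops data.frame() from splitting it into separate columns. This is only a sketch: the name plsDATA2 and the set.seed() call are my own additions for illustration.

```r
# Sketch: build the matrix-column data frame in a single call.
set.seed(1)                        # only to make the example reproducible
X <- matrix(rnorm(50), 10, 5)      # 10 samples, 5 X variables
Y <- 1:10                          # one response value per sample

# I() protects the matrix, so it stays one column named X
# instead of being expanded into X.1 ... X.5
plsDATA2 <- data.frame(Y = Y, X = I(X))

dim(plsDATA2)      # rows and columns of the data frame
dim(plsDATA2$X)    # rows and columns of the matrix stored in the X column
```

With this variant the SampN helper column is not needed, because the data frame gets its row count from Y.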
        
Then fit a model. The response goes on the left-hand side of the formula and 
the X matrix on the right:
> library (pls)
> plsr (Y~X,data=plsDATA)
Partial least squares regression, fitted with the kernel algorithm.
Call:
plsr(Y ~ X, data = plsDATA)
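
If your own data starts out as an ordinary data frame with one column per X variable, the columns can be gathered into a single matrix column in much the same way. The sketch below uses made-up numbers, and the names raw, w1, w2, w3 and newDATA are hypothetical, chosen only to mirror the layout of the yarn data.

```r
# Hypothetical raw data: the response plus one ordinary column per X variable
raw <- data.frame(density = c(10.2, 11.5, 9.8, 12.1),
                  w1 = c(0.11, 0.15, 0.09, 0.18),
                  w2 = c(0.21, 0.24, 0.19, 0.27),
                  w3 = c(0.35, 0.31, 0.38, 0.30))

# Names of the columns that hold the X variables
xcols <- c("w1", "w2", "w3")

# Start from the response, then attach all X columns as one matrix
newDATA <- data.frame(density = raw$density)
newDATA$NIR <- as.matrix(raw[, xcols])

str(newDATA)   # density is a numeric vector, NIR is a numeric matrix
```

A model on this layout would then be specified as plsr(density ~ NIR, data = newDATA), with the pls package loaded.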



On Tuesday 16 October 2007 21:23:21 Bricklemyer, Ross S wrote:
> Jim,
>
> I tried str(yarn).  I received the following output:
>
> 'data.frame':   28 obs. of  3 variables:
>  $ NIR    : num [1:28, 1:268] 3.07 3.07 3.08 3.08 3.10 ...
>   ..- attr(*, "dimnames")=List of 2
>   .. ..$ : NULL
>   .. ..$ : NULL
>  $ density: num  100.0  80.2  79.5  60.8  60.0 ...
>  $ train  : logi  TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE
> TRUE ...
>
> I think the important structure for my application is in the NIR line.
> Now that I "know" what the structure is, what does it mean, and how do I
> get my data into the same structure?
>
> Ross


-- 
Andris Jankevics
Assistant
Department of Medicinal Chemistry
Latvian Institute of Organic Synthesis
Aizkraukles 21, LV-1006, Riga, Latvia


