[R] How to formulate quadratic function with interaction terms for the PLS fitting model?

Sun Jul 16 18:52:34 CEST 2017

> On Jul 16, 2017, at 8:47 AM, Bert Gunter <bgunter.4567 at gmail.com> wrote:
> 
> ??
> If I haven't misunderstood, they are completely different!
> 
> 1) NIR must be a matrix, or poly(NIR,...) will fail.
> 2) Due to the previously identified bug in poly, degree must be
> explicitly given as poly(NIR, degree =2,raw = TRUE).
> 
> Now consider the following example:
> 
>> df <-matrix(runif(60),ncol=3)
>> y <- runif(20)
>> mdl1 <-lm(y~df*I(df^2))
>> mdl2 <-lm(y~df*poly(df,degree=2,raw=TRUE))
>> length(coef(mdl1))
> [1] 16
>> length(coef(mdl2))
> [1] 40
> 
> Explanation:
> In mdl1, I(df^2) gives the squared values of the 3 columns of df. The
> formula df*I(df^2) gives the 3 (linear) terms of df, the 3 pure
> quadratics of I(df^2), the 9 cubic terms obtained by crossing these,
> and the constant coefficient = 16 coefs.
> 
> In mdl2,  the poly() expression gives 9 variiables: 3 linear, 3 pure
> quadratic, 3 interactions (1.2, 1.3, 2.3) of these.  The df*poly()
> term would then give the 3 linear terms of df, the 9 terms of poly(),
> the crossings between these, and the constant coef = 40 coefs. Many of
> these will be NA since terms are repeated (e.g. the 3 linear terms of
> poly() and df) and therefore cannot be estimated.
> 
> Have I totally misunderstood what you meant or committed some other blunder?

I was thinking about different model specifications, but clearly I had failed to test my assumptions and will need to do more study:

> df <-matrix(runif(60),ncol=3)
> y <- runif(20)
> mdl1 <-lm(y~ df +I(df^2) )
> mdl2 <-lm(y~  poly(df,degree=2,raw=TRUE))
> mdl1

Call:
lm(formula = y ~ df + I(df^2))

Coefficients:
(Intercept)          df1          df2          df3     I(df^2)1     I(df^2)2     I(df^2)3  
     1.3382      -1.1431      -1.7894      -1.2675       0.9686       1.6605       1.0411  

> mdl2

Call:
lm(formula = y ~ poly(df, degree = 2, raw = TRUE))

Coefficients:
                          (Intercept)  poly(df, degree = 2, raw = TRUE)1.0.0  
                              1.28217                               -0.98032  
poly(df, degree = 2, raw = TRUE)2.0.0  poly(df, degree = 2, raw = TRUE)0.1.0  
                              0.89955                               -1.89019  
poly(df, degree = 2, raw = TRUE)1.1.0  poly(df, degree = 2, raw = TRUE)0.2.0  
                              0.09528                                1.63065  
poly(df, degree = 2, raw = TRUE)0.0.1  poly(df, degree = 2, raw = TRUE)1.0.1  
                             -1.03744                               -0.29368  
poly(df, degree = 2, raw = TRUE)0.1.1  poly(df, degree = 2, raw = TRUE)0.0.2  
                              0.09357                                0.93400  

> length(coef(mdl2))
[1] 10

I had been reason from my experience with atomic vectors. Clearly poly() handles matrices differently than the combination of `+.formula` with `I`. Thanks for furthering my education:

> df <-data.frame(y = runif(20), x=runif(20) )
> 
> (mdl1 <-lm(y~x +I(x^2) ,data=df) )

Call:
lm(formula = y ~ x + I(x^2), data = df)

Coefficients:
(Intercept)            x       I(x^2)  
     0.6435      -1.4477       1.8282  

> (mdl2 <-lm(y~  poly(x,degree=2,raw=TRUE), data=df) )

Call:
lm(formula = y ~ poly(x, degree = 2, raw = TRUE), data = df)

Coefficients:
                     (Intercept)  poly(x, degree = 2, raw = TRUE)1  
                          0.6435                           -1.4477  
poly(x, degree = 2, raw = TRUE)2  
                          1.8282  

Best;
David.

> 
> Cheers,
> Bert
> Bert Gunter
> 
> "The trouble with having an open mind is that people keep coming along
> and sticking things into it."
> -- Opus (aka Berkeley Breathed in his "Bloom County" comic strip )
> 
> 
> On Sun, Jul 16, 2017 at 7:36 AM, David Winsemius <dwinsemius at comcast.net> wrote:
>> 
>>> On Jul 13, 2017, at 7:43 AM, Bert Gunter <bgunter.4567 at gmail.com> wrote:
>>> 
>>> Below.
>>> 
>>> -- Bert
>>> Bert Gunter
>>> 
>>> 
>>> 
>>> On Thu, Jul 13, 2017 at 3:07 AM, Luigi Biagini <luigi.biagini at gmail.com> wrote:
>>>> I have two ideas about it.
>>>> 
>>>> 1-
>>>> i) Entering variables in quadratic form is done with the command I
>>>> (variable ^ 2) -
>>>> plsr (octane ~ NIR + I (nir ^ 2), ncomp = 10, data = gasTrain, validation =
>>>> "LOO"
>>>> You could also use a new variable NIR_sq <- (NIR) ^ 2
>>>> 
>>>> ii) To insert a square variable, use syntax I (x ^ 2) - it is very
>>>> important to insert I before the parentheses.
>>> 
>>> True, but better I believe: see ?poly.
>>> e.g. poly(cbind(x1,x2,x3), degree = 2, raw = TRUE) is a full quadratic
>>> polynomial in x1,x2,x3 .
>>> 
>> 
>> Is there any real difference between
>> 
>> octane ~ NIR * I(NIR^2)
>> octane ~ NIR * poly(NIR, degree=2, raw=TRUE)
>> 
>> ?
>> (I though that adding raw = TRUE prevented the beneficial process of centering the second degree terms.)
>> __
>> David
>>> 
>>>> 
>>>> iii) If you want to make the interaction between x and x ^ 2 use the
>>>> command ":" -> x: I(x ^ 2)
>>>> 
>>>> iv) For multiple interactions between x and x ^ 2 use the command "*" -> x
>>>> *I (x ^ 2)
>>>> 
>>>> i) plsr (octane ~ NIR + NIR_sq, ncomp = 10, data = gasTrain, validation =
>>>> "LOO") I (x ^ 2)
>>>> ii)p lsr (octane ~ NIR + I(NIR^2), ncomp = 10, data = gasTrain, validation
>>>> = "LOO") I (x ^ 2)
>>>> iii)p lsr (octane ~ NIR : I(NIR^2), ncomp = 10, data = gasTrain, validation
>>>> = "LOO") I (x ^ 2)
>>>> iv)p lsr (octane ~ NIR * I(NIR^2), ncomp = 10, data = gasTrain, validation
>>>> = "LOO") I (x ^ 2)
>>>> 
>>>> 2 - For your regression, did you plan to use MARS instead of PLS?
>>>> 
>>>> 
>>>> 
>>>> 
>>>> Dear all,
>>>>> I am using the pls package of R to perform partial least square on a set of
>>>>> multivariate data.  Instead of fitting a linear model, I want to fit my
>>>>> data with a quadratic function with interaction terms.  But I am not sure
>>>>> how.  I will use an example to illustrate my problem:
>>>>> Following the example in the PLS manual:
>>>>> ## Read data
>>>>> data(gasoline)
>>>>> gasTrain <- gasoline[1:50,]
>>>>> ## Perform PLS
>>>>> gas1 <- plsr(octane ~ NIR, ncomp = 10, data = gasTrain, validation = "LOO")
>>>>> where octane ~ NIR is the model that this example is fitting with.
>>>>> NIR is a collective of variables, i.e. NIR spectra consists of 401 diffuse
>>>>> reflectance measurements from 900 to 1700 nm.
>>>>> Instead of fitting with octane[i] = a[0] * NIR[0,i] + a[1] * NIR[1,i] + ...
>>>>> I want to fit the data with:
>>>>> octane[i] = a[0] * NIR[0,i] + a[1] * NIR[1,i] + ... +
>>>>> b[0]*NIR[0,i]*NIR[0,i] + b[1] * NIR[0,i]*NIR[1,i] + ...
>>>>> i.e. quadratic with interaction terms.
>>>>> But I don't know how to formulate this.
>>>>> May I have some help please?
>>>>> Thanks,
>>>>> Kelvin
>>>> 
>>>>       [[alternative HTML version deleted]]
>>>> 
>>>> ______________________________________________
>>>> R-help at r-project.org mailing list -- To UNSUBSCRIBE and more, see
>>>> https://stat.ethz.ch/mailman/listinfo/r-help
>>>> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
>>>> and provide commented, minimal, self-contained, reproducible code.
>>> 
>>> ______________________________________________
>>> R-help at r-project.org mailing list -- To UNSUBSCRIBE and more, see
>>> https://stat.ethz.ch/mailman/listinfo/r-help
>>> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
>>> and provide commented, minimal, self-contained, reproducible code.
>> 
>> David Winsemius
>> Alameda, CA, USA
>> 

David Winsemius
Alameda, CA, USA