[R] when dimensionality is larger than the number of observations?

Gabor Grothendieck ggrothendieck at gmail.com
Tue May 30 18:48:48 CEST 2006


On 5/30/06, Weiwei Shi <helprhelp at gmail.com> wrote:
> Hi, there:
>
> Can anyone here kindly point some good reference or links on this topic?
> Esp. some solutions from BioConductor or R, when dealing with
> microarray-like, "fat" data?


When there are more coefficients than observations, the least-squares
coefficients are not unique: an entire (affine) subspace of coefficient
vectors gives the same fitted values.

Let's take 3 rows of the iris data set and regress column 1 on the
rest. The model matrix then has 6 columns but only 3 rows, so the
fitted values are unique (here they reproduce y exactly) while the
coefficients are not. We can get one of those coefficient vectors,
the one of minimum norm, using the generalized inverse:

> # test data
> iris3 <- iris[c(1, 51, 101),]
> y <- iris3[,1]
> y
[1] 5.1 7.0 6.3
> X <- model.matrix(~., iris3[,2:5])
> X
    (Intercept) Sepal.Width Petal.Length Petal.Width Speciesversicolor Speciesvirginica
1             1         3.5          1.4         0.2                 0                0
51            1         3.2          4.7         1.4                 1                0
101           1         3.3          6.0         2.5                 0                1
attr(,"assign")
[1] 0 1 2 3 4 4
attr(,"contrasts")
attr(,"contrasts")$Species
[1] "contr.treatment"

>
> library(MASS) # needed for ginv
> coefs <- c(ginv(crossprod(X)) %*% crossprod(X, y))
> names(coefs) <- colnames(X)
> coefs
      (Intercept)       Sepal.Width      Petal.Length       Petal.Width Speciesversicolor  Speciesvirginica
        0.3619361         1.1497417         0.5443438        -0.2405670         0.7372685        -0.5207289
> X %*% coefs # fitted values
    [,1]
1    5.1
51   7.0
101  6.3
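
To see the subspace explicitly, here is a minimal sketch (not part of
the session above). It assumes Null() from MASS, which returns a basis
for the left null space of its argument, so Null(t(X)) should span the
vectors v with X %*% v = 0. The names N and alt are only illustrative.
Adding any combination of the columns of N to coefs changes the
coefficients but should leave the fitted values alone:

```r
library(MASS)  # ginv() and Null()

# same test data as above
iris3 <- iris[c(1, 51, 101), ]
y <- iris3[, 1]
X <- model.matrix(~ ., iris3[, 2:5])

# one solution: the generalized-inverse coefficients
coefs <- c(ginv(crossprod(X)) %*% crossprod(X, y))

# basis of the null space of X (X is 3 x 6 with rank 3, so N is 6 x 3)
N <- Null(t(X))

# another solution: shift coefs by an arbitrary null-space vector
set.seed(1)
alt <- coefs + c(N %*% rnorm(ncol(N)))

# different coefficients, same fitted values
print(cbind(coefs, alt))
print(all.equal(c(X %*% alt), y))

# the ginv solution has the smaller sum of squared coefficients
print(c(ginv = sum(coefs^2), alt = sum(alt^2)))
```

Among all the coefficient vectors that reproduce the fitted values,
the generalized-inverse one is the shortest, which is what the last
line compares.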


