[R] meaning of error message about collinearity

Liaw, Andy andy_liaw at merck.com
Mon Jul 15 14:46:09 CEST 2002


You are using a method that needs to estimate the covariance matrix of all
the variables.  If you have 80 variables, there are (80+1)*80/2 = 3240
variances and covariances to estimate.  How many data points do you think
you need to do that?
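
To make the count concrete, here is a minimal sketch in R. The data are
simulated purely for illustration (they are not the poster's data), and the
sizes n = 40 and p = 80 are assumptions taken from the numbers in this thread.
It shows the number of distinct covariance entries and checks that the sample
covariance matrix cannot have full rank when there are fewer observations
than variables.

```r
# Assumed sizes, matching the numbers discussed in this thread.
p <- 80   # number of variables
n <- 40   # number of observations

# Distinct entries in a symmetric p x p covariance matrix:
# p variances plus p * (p - 1) / 2 covariances.
n_params <- p * (p + 1) / 2
n_params   # 3240

# Simulated data, for illustration only.
set.seed(1)
X <- matrix(rnorm(n * p), nrow = n, ncol = p)

# The sample covariance matrix is p x p, but its rank cannot exceed n - 1,
# so it is singular whenever n <= p.
S <- cov(X)
rank_S <- qr(S)$rank
rank_S
```

A singular covariance estimate cannot be inverted, which is the step a
covariance-based discriminant rule depends on.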

Some people assume the covariance matrix is diagonal (i.e., assuming all the
variables are uncorrelated).  Even then you still have 80 variances to
estimate.  Besides, I don't think lda() does that.

If you really have to discriminate 40 data points with 80 variables, use
methods that do not rely on estimating the covariance matrix.
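
As one possible illustration (my choice of method here is an assumption, not
a specific recommendation), a nearest-centroid rule with plain Euclidean
distance only needs one mean vector per class and never forms a covariance
matrix. The function names nearest_centroid_fit and nearest_centroid_predict
are made up for this sketch, and the data are simulated.

```r
# Simulated stand-in data: 40 observations, 80 variables, two classes
# whose means differ in the first five variables.
set.seed(1)
n <- 40
p <- 80
y <- factor(rep(c("a", "b"), each = n / 2))
X <- matrix(rnorm(n * p), nrow = n, ncol = p)
X[y == "b", 1:5] <- X[y == "b", 1:5] + 2

# Fit: one mean vector per class, returned as a (classes x p) matrix.
# No covariance matrix is estimated.
nearest_centroid_fit <- function(X, y) {
  t(sapply(levels(y), function(k) colMeans(X[y == k, , drop = FALSE])))
}

# Predict: assign each row of newX to the class with the closest centroid.
nearest_centroid_predict <- function(centroids, newX) {
  # Squared Euclidean distance from every row of newX to each centroid.
  d2 <- sapply(seq_len(nrow(centroids)), function(k) {
    rowSums(sweep(newX, 2, centroids[k, ])^2)
  })
  d2 <- matrix(d2, nrow = nrow(newX))  # keep matrix shape for a single row
  factor(rownames(centroids)[max.col(-d2, ties.method = "first")],
         levels = rownames(centroids))
}

# Hold out 5 observations per class for a rough check.
train <- c(1:15, 21:35)
fit <- nearest_centroid_fit(X[train, ], y[train])
pred <- nearest_centroid_predict(fit, X[-train, ])
table(predicted = pred, actual = y[-train])
```

With so few observations, any error estimate from a split like this is very
noisy, so treat the table as a sanity check rather than a performance figure.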

Andy

> -----Original Message-----
> From: Adaikalavan Ramasamy [mailto:ramasamy at stats.ox.ac.uk]
> Sent: Friday, July 12, 2002 4:08 PM
> To: r-help at r-project.org; allstat at jiscmail.ac.uk
> Subject: [R] meaning of error message about collinearity
> 
> 
> Just a quick question. I am trying to fit an LDA model with a restricted
> subsample. My X is a numerical matrix and Y is a factor response vector.
> 
> fit <- lda( Y[1:50]  ~ X[1:50, ]  )
> 
> gives the following error message: variables are collinear in:
> lda.default(x, grouping, ...)
> 
> I am guessing this is a problem of rank deficiency, as I have about 80
> variables [lda works with subsamples of size 80 and above].
> 
> Q1: Is my interpretation of the error message correct?
> Q2: I am using the fit for prediction purposes etc. Is this likely to be
> affected, i.e., how serious is this problem?
> Q3: Is there a good website on sound statistical theory/practice for
> overcoming rank deficiency, if this is indeed the source of the error?
> 
> Many thanks, Adai.
> 
> -.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-
> r-help mailing list -- Read http://www.ci.tuwien.ac.at/~hornik/R/R-FAQ.html
> Send "info", "help", or "[un]subscribe"
> (in the "body", not the subject !)  To: r-help-request at stat.math.ethz.ch
> _._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._
> 

