[R] error with princomp

Peter Dalgaard BSA p.dalgaard at biostat.ku.dk
Sun May 21 22:01:44 CEST 2000


Faheem Mitha <faheem at email.unc.edu> writes:

> I have a data set, of 2061 rows and 99 columns originally. Now I guess it
> is going to be 97 columns, since the first column was all zeros (even
> Splus choked on this and I deleted it earlier) and the second one was all
> ones. Anyway, the first 64 (was 66) columns are binary data. The last 33
> are numeric data. Now, I thought that a reasonable thing to do (in fact,
> the only thing I could think of) was to treat the first 64 columns as
> numeric zeros and ones, and then use the cor=TRUE flag (i.e. use the
> correlation matrix instead of the covariance matrix). This is advertised
> as a way of handling cases when the data is not all of the same scale. So
> that is what I did. Any comments/suggestions?

Whatever the software, you're likely to get in trouble trying to
interpret the result of a PCA on binary data (using correlations or
not). It can be tricky enough with continuous data....

Anyway, I bet some of those 64 binary columns will turn out to be
linearly dependent, either overtly by some of them summing to a
constant or more subtly because of some combinations being absent.

A QR decomposition of your data matrix might be enlightening. Look at
the rank and the pivoting information. 
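
Something along these lines is what I have in mind. This is only a sketch
on made-up toy data (the names X, g, a, b, c, z are invented for the
example, not taken from your data set). The columns are centred first,
because indicators that sum to a constant only become exactly dependent
once the mean is removed, which is what happens inside a PCA.

```r
# Toy data, invented for illustration: three indicator columns that always
# sum to 1, plus one continuous column.
set.seed(1)
g <- rep(1:3, length.out = 50)
X <- cbind(a = as.numeric(g == 1),
           b = as.numeric(g == 2),
           c = as.numeric(g == 3),
           z = rnorm(50))

# Centre the columns; a + b + c is constant, so after centring the three
# indicator columns are linearly dependent.
Xc <- scale(X, center = TRUE, scale = FALSE)

q <- qr(Xc)      # base R QR decomposition
q$rank           # numerical rank; less than ncol(Xc) signals dependence
q$pivot          # column order used; dependent columns are moved to the end

# Names of the columns that were pivoted out as (numerically) dependent
aliased <- colnames(Xc)[q$pivot[seq_len(ncol(Xc)) > q$rank]]
aliased
```

On the real data, comparing q$rank with the number of columns and looking
at which columns end up last in q$pivot should point to the offenders.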

It does seem that we're handling the singular case suboptimally,
though. Ideally, one should apply a fuzz factor (a small numerical
tolerance) before declaring that the matrix isn't non-negative
definite (NND), and I don't think there's a problem with factoring
out a null space in the PCA.
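
To illustrate the idea (and only the idea: this is not what princomp
does internally, and the tolerance below is an arbitrary choice made for
the example), one could work from the eigendecomposition directly, treat
eigenvalues within a tolerance of zero as zero, and keep only the
remaining components. The toy data and all names are again invented.

```r
# Toy data, invented for illustration: three dependent indicator columns
# plus one continuous column.
set.seed(2)
g <- rep(1:3, length.out = 60)
X <- cbind(a = as.numeric(g == 1),
           b = as.numeric(g == 2),
           c = as.numeric(g == 3),
           z = rnorm(60))

R <- cor(X)                       # singular correlation matrix
e <- eigen(R, symmetric = TRUE)

# Assumed tolerance, relative to the largest eigenvalue.
tol <- sqrt(.Machine$double.eps) * max(abs(e$values))

# Only complain if an eigenvalue is negative beyond the tolerance;
# tiny negative values are rounding error, not evidence against NND.
stopifnot(all(e$values > -tol))

# Factor out the null space: keep components with non-negligible variance.
keep     <- e$values > tol
sdev     <- sqrt(e$values[keep])
loadings <- e$vectors[, keep, drop = FALSE]
rownames(loadings) <- colnames(X)

# Scores on the retained components, from the standardized data.
scores <- scale(X) %*% loadings
```

Here one of the four eigenvalues is numerically zero, so three
components are retained and the analysis proceeds on those.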

-- 
   O__  ---- Peter Dalgaard             Blegdamsvej 3  
  c/ /'_ --- Dept. of Biostatistics     2200 Cph. N   
 (*) \(*) -- University of Copenhagen   Denmark      Ph: (+45) 35327918
~~~~~~~~~~ - (p.dalgaard at biostat.ku.dk)             FAX: (+45) 35327907