[R] PCA for Binary data

Simon Blomberg s.blomberg1 at uq.edu.au
Wed Jun 13 05:37:03 CEST 2007


You might try (detrended) correspondence analysis, which is designed for
"count" data, if it makes sense to treat your binary data that way.
I've used ade4 and also vegan, and they are both good packages for these
types of ordinations. You could also look at non-metric multidimensional
scaling. There seem to be two "schools" of ordination. The Europeans like
eigenanalysis methods (PCA, correspondence analysis, multiple
correspondence analysis, coinertia analysis etc.). The Americans seem to
prefer MDS.
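
To make the two approaches concrete, here is a minimal sketch on made-up presence/absence data. It uses only base R and MASS rather than ade4 or vegan, so the correspondence analysis is done by hand via an SVD and is plain (not detrended) CA; the data, the object names, and the choice of Jaccard distance for the MDS are assumptions for illustration. As far as I know, dudi.coa() in ade4 and decorana() and metaMDS() in vegan cover the same ground with more options.

```r
# Sketch only: base R plus MASS (a recommended package shipped with R).
library(MASS)

set.seed(1)
# Hypothetical data: 30 sites (rows) by 20 species (columns), 0/1 entries.
x <- matrix(rbinom(30 * 20, size = 1, prob = 0.4), nrow = 30)
x <- unique(x)                                        # isoMDS() rejects zero distances
x <- x[rowSums(x) > 0, colSums(x) > 0, drop = FALSE]  # CA needs non-empty margins

## Correspondence analysis: SVD of the standardized residuals
P      <- x / sum(x)                  # correspondence matrix
r_mass <- rowSums(P)                  # row masses
c_mass <- colSums(P)                  # column masses
E      <- outer(r_mass, c_mass)       # expected values under independence
S      <- (P - E) / sqrt(E)
sv     <- svd(S)
ca_rows    <- (sv$u / sqrt(r_mass)) %*% diag(sv$d)  # principal row coordinates
ca_inertia <- sv$d^2 / sum(sv$d^2)                  # share of inertia per axis

## Non-metric MDS on Jaccard distance (dist() calls it "binary")
d    <- dist(x, method = "binary")
nmds <- isoMDS(d, k = 2, trace = FALSE)

# First two axes of each ordination, side by side
op <- par(mfrow = c(1, 2))
plot(ca_rows[, 1:2], xlab = "CA1", ylab = "CA2", main = "Correspondence analysis")
plot(nmds$points, xlab = "NMDS1", ylab = "NMDS2", main = "Non-metric MDS")
par(op)
```

The two plots need not agree: CA weights sites and species by their totals, while the NMDS only tries to preserve the rank order of the Jaccard distances.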

Cheers,

Simon.

On Tue, 2007-06-12 at 20:17 -0700, Spencer Graves wrote:
> The problem with applying prcomp to binary data is that it's not 
> clear what problem you are solving. 
> 
>       The standard principal components and factor analysis models 
> assume that the observations are linear combinations of unobserved 
> "common" factors (shared variability), normally distributed, plus normal 
> noise, independent between observations and variables.  Those 
> assumptions are clearly violated for binary data. 
> 
>       RSiteSearch("PCA for binary data") produced references to 'ade4' 
> and 'FactoMineR'.  Have you considered these?  I have not used them, but 
> FactoMineR includes functions for 'Multiple Factor Analysis for Mixed 
> [quantitative and qualitative] Data'.
>   
>       Hope this helps. 
>       Spencer Graves
> 
> Josh Gilbert wrote:
> > I don't understand, what's wrong with using prcomp in this situation?
> >
> > On Sunday 10 June 2007 12:50 pm, Ranga Chandra Gudivada wrote:
> >   
> >> Hi,
> >>
> >>     I was wondering whether there is any package implementing Principal
> >> Component Analysis for Binary data
> >>
> >>                                               Thanks chandra
> >>
> >>
> >> ______________________________________________
> >> R-help at stat.math.ethz.ch mailing list
> >> https://stat.ethz.ch/mailman/listinfo/r-help
> >> PLEASE do read the posting guide
> >> http://www.R-project.org/posting-guide.html and provide commented, minimal,
> >> self-contained, reproducible code.
-- 
Simon Blomberg, BSc (Hons), PhD, MAppStat. 
Lecturer and Consultant Statistician 
Faculty of Biological and Chemical Sciences 
The University of Queensland 
St. Lucia Queensland 4072 
Australia

Room 320, Goddard Building (8)
T: +61 7 3365 2506 
email: S.Blomberg1_at_uq.edu.au 

The combination of some data and an aching desire for 
an answer does not ensure that a reasonable answer can 
be extracted from a given body of data. - John Tukey.


