[R] Comparing multiple distributions

Bert Gunter gunter.berton at gene.com
Thu May 31 18:56:23 CEST 2007

While Ravi's suggestion of the "compositions" package is certainly
appropriate, I suspect that the complex and extensive statistical "homework"
you would need to do to use it might be overwhelming (the geometry of
compositions is a simplex, and this makes things hard). As a simple and
perhaps useful alternative, use pairs() or splom() to plot your 5-D data,
distinguishing the different treatments via color and/or symbol.
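
A minimal sketch of the pairs() suggestion follows, using made-up data. The names props and treatment, the simulated abundances, and the colour and symbol choices are all assumptions for illustration; substitute your own matrix of proportions and treatment factor.

```r
# Hypothetical example data: 20 observations per treatment, 5 proportions each.
set.seed(1)
n <- 20
raw <- matrix(rexp(2 * n * 5), ncol = 5)   # made-up positive abundances
raw[1:n, 1:2] <- raw[1:n, 1:2] * 3         # shift treatment A toward the first two bins
props <- raw / rowSums(raw)                # normalise so each row sums to 1
colnames(props) <- paste0("bin", 1:5)
treatment <- factor(rep(c("A", "B"), each = n))

# Scatterplot matrix with one colour and one symbol per treatment.
pairs(props,
      col = c("blue", "red")[treatment],
      pch = c(1, 17)[treatment],
      main = "Proportions per depth bin, by treatment")
```

splom() in the lattice package takes a groups argument that serves the same purpose as col and pch here.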

In addition, it might be useful to do the same sort of plot on the first two
principal components (?prcomp) of the first 4 dimensions of your 5-component
vectors (since the 5th is determined by the first 4). Because of the
simplicial geometry, this PCA approach is not right, but it may nevertheless
be revealing. The same plotting ideas are done properly (in the correct
geometry) in the compositions package, so if you are motivated to do so, you
can do these things there. Even if you don't dig into the details, using the
compositions package version of the plots may be relatively easy to do,
interpretable, and revealing, more so than my "simple but wrong"
suggestions. You can decide.
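
The "simple but wrong" PCA plot could look roughly like this. Again the data are simulated and the names props, treatment, pca and scores are only illustrative. This sketch uses base R on the raw proportions and does not use the compositions package, so it ignores the simplex geometry.

```r
# Hypothetical example data, as in a typical two-treatment design.
set.seed(1)
n <- 20
raw <- matrix(rexp(2 * n * 5), ncol = 5)   # made-up positive abundances
props <- raw / rowSums(raw)                # each row sums to 1
treatment <- factor(rep(c("A", "B"), each = n))

# PCA on the first 4 components only; the 5th is 1 minus their sum.
pca <- prcomp(props[, 1:4])
scores <- pca$x[, 1:2]                     # scores on PC1 and PC2

# Plot the first two principal components, marked by treatment.
plot(scores,
     col = c("blue", "red")[treatment],
     pch = c(1, 17)[treatment],
     xlab = "PC1", ylab = "PC2",
     main = "First two principal components (raw proportions)")
legend("topright", legend = levels(treatment),
       col = c("blue", "red"), pch = c(1, 17))
```

summary(pca) reports how much of the variance the first two components carry, which helps judge how much the 2-D plot can be trusted as a picture of the data.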

I would not trust inference using ad hoc approaches on the untransformed
data. That's what the package is for. But plotting the data should always be
at least the first thing you do anyway. I often find it to be sufficient.

Bert Gunter
Genentech Nonclinical Statistics

-----Original Message-----
From: r-help-bounces at stat.math.ethz.ch
[mailto:r-help-bounces at stat.math.ethz.ch] On Behalf Of jiho
Sent: Thursday, May 31, 2007 8:37 AM
To: R-help
Subject: Re: [R] Comparing multiple distributions

Nobody answered my first request. I am sorry if I did not explain my
problem clearly. English is not my native language and statistical
English is even more difficult. I'll try to summarize my issue in
more appropriate statistical terms:

Each of my observations is not a single number but a vector of 5  
proportions (which add up to 1 for each observation). I want to  
compare the "shape" of those vectors between two treatments (i.e. how  
the quantities are distributed between the 5 values in treatment A  
with respect to treatment B).

I was pointed to Hotelling's T-squared. Does it seem appropriate? Are
there other possibilities (I read many discussions about Hotelling
vs. MANOVA but I could not see how any of those related to my
particular case)?

Thank you very much in advance for your insights. See below for my  
earlier, more detailed, e-mail.

On 2007-May-21, at 19:26, jiho wrote:
> I am studying the vertical distribution of plankton and want to
> study its variations relative to several factors (time of day,
> species, water column structure etc.). So my data is special in
> that, at each sampling site (each observation), I don't have *one*  
> number, I have *several* numbers (abundance of organisms in each  
> depth bin, I sample 5 depth bins) which describe a vertical  
> distribution.
> Then let's say I want to compare speciesA with speciesB, I would end
> up trying to compare a group of several distributions with another  
> group of several distributions (where a "distribution" is a vector  
> of 5 numbers: an abundance for each depth bin). Does anyone know  
> how I could do this (with R obviously ;) )?
> Currently I kind of get around the problem and:
> - compute mean abundance per depth bin within each group and  
> compare the two mean distributions with a ks.test but this  
> obviously diminishes the power of the test (I only compare 5*2  
> "observations")
> - restrict the information at each sampling site to the mean depth
> weighted by the abundance of the species of interest. This way I
> have one observation per station but I reduce the information to
> the mean depths while the actual distribution is also important.
> I know this is probably not directly R related but I have already  
> searched around for solutions and solicited my local statistics  
> expert... to no avail. So I hope that the stats experts on this
> list will help me.
> Thank you very much in advance.
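
For reference, the conversion from abundances to proportions and the abundance-weighted mean depth described in the quoted message can be sketched as follows. The abundances and the bin mid-depths in depth_mid are invented for illustration, not taken from the original data.

```r
# Hypothetical abundances: rows are sampling sites, columns are 5 depth bins.
abund <- rbind(c(10, 30, 40, 15,  5),
               c( 0, 20, 20, 50, 10))
depth_mid <- c(5, 15, 25, 35, 45)   # assumed mid-depth of each bin, in metres

# Vector of 5 proportions per site (each row sums to 1).
props <- abund / rowSums(abund)

# Mean depth weighted by abundance: one number per site.
mean_depth <- as.vector(props %*% depth_mid)
print(mean_depth)   # 22.5 30.0
```

The rows of props are the 5-part compositions discussed above, while mean_depth keeps only one summary number per site.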

