[R] Comparing multiple distributions

Ravi Varadhan rvaradhan at jhmi.edu
Thu May 31 18:09:33 CEST 2007


Your data is "compositional data". The R package "compositions" might be
useful. You might also want to consult J. Aitchison's book "The Statistical
Analysis of Compositional Data".
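To illustrate what Aitchison's approach involves, here is a minimal sketch in Python. It is not R, and it is not the API of the "compositions" package. Compositions are usually mapped out of the simplex with a log-ratio transform before standard multivariate tools are applied. The helper names `closure`, `clr` and `alr` below are hypothetical. They assume strictly positive parts, because a zero abundance makes the logarithm undefined and needs separate treatment.

```python
import math

def closure(x):
    """Rescale a vector of positive parts so that they sum to 1."""
    total = sum(x)
    return [v / total for v in x]

def clr(x):
    """Centred log-ratio: log of each part minus the mean log (log of the geometric mean)."""
    logs = [math.log(v) for v in x]  # assumes every part is > 0
    mean_log = sum(logs) / len(logs)
    return [l - mean_log for l in logs]

def alr(x):
    """Additive log-ratio: log of each part over the last part, giving D-1 coordinates."""
    ref = math.log(x[-1])  # the last part is used as the reference (an arbitrary choice)
    return [math.log(v) - ref for v in x[:-1]]

comp = closure([10, 20, 30, 25, 15])  # hypothetical abundances in 5 depth bins
print(comp)       # proportions summing to 1
print(clr(comp))  # 5 coordinates summing to 0
print(alr(comp))  # 4 unconstrained coordinates
```

Because both transforms use only ratios of parts, they give the same result for raw abundances as for the closed proportions. The clr coordinates always sum to zero. The alr coordinates drop one dimension, which removes the sum-to-one constraint.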

Ravi.

-----------------------------------------------------------------------------------

Ravi Varadhan, Ph.D.

Assistant Professor, The Center on Aging and Health

Division of Geriatric Medicine and Gerontology 

Johns Hopkins University

Ph: (410) 502-2619

Fax: (410) 614-9625

Email: rvaradhan at jhmi.edu

Webpage:  http://www.jhsph.edu/agingandhealth/People/Faculty/Varadhan.html

 

------------------------------------------------------------------------------------

-----Original Message-----
From: r-help-bounces at stat.math.ethz.ch
[mailto:r-help-bounces at stat.math.ethz.ch] On Behalf Of jiho
Sent: Thursday, May 31, 2007 11:37 AM
To: R-help
Subject: Re: [R] Comparing multiple distributions

Nobody answered my first request. I am sorry if I did not explain my
problem clearly. English is not my native language, and statistical
English is even more difficult. I'll try to summarize my issue in
more appropriate statistical terms:

Each of my observations is not a single number but a vector of 5
proportions (which add up to 1 for each observation). I want to
compare the "shape" of those vectors between two treatments (i.e., how
the total is distributed among the 5 components in treatment A
compared with treatment B).

I was pointed to Hotelling's T-squared test. Does it seem appropriate? Are
there other possibilities? (I read many discussions about Hotelling's
T-squared vs. MANOVA, but I could not see how any of them related to my
particular case.)
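One caveat applies to Hotelling's T-squared here. Five proportions that always sum to 1 have a singular covariance matrix (rank at most 4), so the test cannot be applied to the raw proportions directly. A common workaround in compositional data analysis is to run it on D-1 log-ratio coordinates. The following is a minimal, dependency-free Python sketch of the two-sample test on additive log-ratio (alr) coordinates. It is an illustration under those assumptions, not an R function. All function names are hypothetical. It assumes strictly positive parts, no more than n1 + n2 - 2 coordinates, and a non-singular pooled covariance.

```python
import math

def alr(x):
    """Additive log-ratio coordinates (last part as reference); parts must be > 0."""
    ref = math.log(x[-1])
    return [math.log(v) - ref for v in x[:-1]]

def mean_vec(rows):
    n = len(rows)
    return [sum(col) / n for col in zip(*rows)]

def scatter(rows, m):
    """Sum of outer products of deviations from the mean vector m."""
    p = len(m)
    s = [[0.0] * p for _ in range(p)]
    for r in rows:
        d = [a - b for a, b in zip(r, m)]
        for i in range(p):
            for j in range(p):
                s[i][j] += d[i] * d[j]
    return s

def solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x

def hotelling_two_sample(group_a, group_b):
    """Return (T2, F, (df1, df2)) comparing two groups of compositions."""
    xa = [alr(c) for c in group_a]
    xb = [alr(c) for c in group_b]
    n1, n2, p = len(xa), len(xb), len(xa[0])
    ma, mb = mean_vec(xa), mean_vec(xb)
    sa, sb = scatter(xa, ma), scatter(xb, mb)
    # Pooled covariance: ((n1-1)S1 + (n2-1)S2) / (n1+n2-2)
    pooled = [[(sa[i][j] + sb[i][j]) / (n1 + n2 - 2) for j in range(p)] for i in range(p)]
    d = [a - b for a, b in zip(ma, mb)]
    w = solve(pooled, d)  # w = pooled^{-1} d
    t2 = n1 * n2 / (n1 + n2) * sum(di * wi for di, wi in zip(d, w))
    # Under multivariate normality, F follows an F(p, n1+n2-p-1) distribution
    f = (n1 + n2 - p - 1) / (p * (n1 + n2 - 2)) * t2
    return t2, f, (p, n1 + n2 - p - 1)
```

A p-value can then be read from the upper tail of the F distribution, for example with `scipy.stats.f.sf(f, *df)` if SciPy is available. Hotelling's T-squared is invariant under non-singular linear transformations, so the statistic does not depend on which part is used as the alr reference.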

Thank you very much in advance for your insights. See below for my  
earlier, more detailed, e-mail.

On 2007-May-21, at 19:26, jiho wrote:
> I am studying the vertical distribution of plankton and want to
> study its variations relative to several factors (time of day,
> species, water column structure, etc.). My data are special in
> that, at each sampling site (each observation), I don't have *one*
> number but *several* numbers (the abundance of organisms in each
> depth bin; I sample 5 depth bins) which together describe a vertical
> distribution.
>
> Now say I want to compare species A with species B: I would end
> up trying to compare a group of several distributions with another
> group of several distributions (where a "distribution" is a vector
> of 5 numbers: an abundance for each depth bin). Does anyone know
> how I could do this (with R obviously ;) )?
>
> Currently I work around the problem in two ways:
> - compute the mean abundance per depth bin within each group and
> compare the two mean distributions with a ks.test, but this
> obviously reduces the power of the test (I only compare 5*2
> "observations");
> - reduce the information at each sampling site to the mean depth
> weighted by the abundance of the species of interest. This way I
> have one observation per station, but I reduce the information to
> the mean depth, while the actual distribution across depths also matters.
>
> I know this is probably not directly R-related, but I have already
> searched around for solutions and asked my local statistics
> expert... to no avail. So I hope that the statistics experts on this
> list will help me.
>
> Thank you very much in advance.

JiHO
---
http://jo.irisson.free.fr/
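The second workaround in the quoted message, the abundance-weighted mean depth, can be sketched in a few lines of Python. The sketch also shows why it loses information: two very different vertical profiles can share the same mean depth. The depth-bin midpoints (5 to 45 m) and the abundances are hypothetical values chosen for illustration, and `mean_depth` is a hypothetical helper.

```python
def mean_depth(depths, abundances):
    """Abundance-weighted mean depth of one sampling station."""
    total = sum(abundances)
    return sum(d * a for d, a in zip(depths, abundances)) / total

depths = [5, 15, 25, 35, 45]          # hypothetical midpoints of the 5 depth bins (m)
concentrated = [0, 0, 100, 0, 0]      # everything in the middle bin
split = [50, 0, 0, 0, 50]             # half at the surface, half at the bottom

print(mean_depth(depths, concentrated))  # 25.0
print(mean_depth(depths, split))         # 25.0: same summary, very different profile
```

This is the loss of information the message describes. A method that works on the whole 5-part profile, such as the log-ratio approach to compositional data, keeps the difference between these two stations.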





