[R] Comparing distributions

Thu Jun 24 18:07:17 CEST 2010

If you want a more objective eye-ball test, look at:

   Buja, A., Cook, D., Hofmann, H., Lawrence, M., Lee, E.-K., Swayne,
     D.F. and Wickham, H. (2009) Statistical inference for exploratory
     data analysis and model diagnostics. Phil. Trans. R. Soc. A
     367, 4361-4383. doi: 10.1098/rsta.2009.0120

One implementation of this procedure is the vis.test function in the TeachingDemos package.
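
The lineup idea from that paper can also be sketched in base R without relying on vis.test itself. What follows is a minimal sketch, not the TeachingDemos implementation: the simulated data, the 20-panel layout, and the names plot_panel and real_pos are assumptions made for illustration. The real grouping is hidden among panels whose group labels have been permuted. Under the null hypothesis that both samples come from the same distribution, a viewer who has not seen the data picks the real panel with probability 1/20.

```r
# Lineup sketch in base R: hide the real comparison among null comparisons
# generated by permuting the group labels.
set.seed(1)

# Hypothetical stand-in data: two bimodal samples, the second shifted by 50 units.
n <- 1000
x <- c(rnorm(n / 2, 200, 30), rnorm(n / 2, 400, 40))
y <- c(rnorm(n / 2, 200, 30), rnorm(n / 2, 400, 40)) + 50

pooled <- c(x, y)
labels <- rep(c("x", "y"), times = c(length(x), length(y)))

n_panels <- 20                    # 1 real panel + 19 null panels
real_pos <- sample(n_panels, 1)   # position of the real panel; keep it hidden

# Draw one panel: density of each group under the given labelling.
plot_panel <- function(values, groups, title) {
  dx <- density(values[groups == "x"])
  dy <- density(values[groups == "y"])
  plot(dx, main = title, xlab = "", ylab = "",
       xlim = range(dx$x, dy$x), ylim = range(0, dx$y, dy$y))
  lines(dy, lty = 2)
}

old_par <- par(mfrow = c(4, 5), mar = c(2, 2, 2, 1))
for (i in seq_len(n_panels)) {
  # The real panel keeps the true labels; every other panel shuffles them.
  groups <- if (i == real_pos) labels else sample(labels)
  plot_panel(pooled, groups, title = as.character(i))
}
par(old_par)

# Print real_pos only after the viewer has picked a panel.
```

Showing the grid to several independent viewers and counting how many pick the real panel gives a stronger statement than a single viewing.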

-- 
Gregory (Greg) L. Snow Ph.D.
Statistical Data Center
Intermountain Healthcare
greg.snow at imail.org
801.408.8111

> -----Original Message-----
> From: r-help-bounces at r-project.org [mailto:r-help-bounces at r-project.org] On Behalf Of Ralf B
> Sent: Wednesday, June 23, 2010 8:53 PM
> To: Robert A LaBudde
> Cc: r-help at r-project.org
> Subject: Re: [R] Comparing distributions
> 
> The diagram only serves as a rough example to give you an idea.
> 
> To be more precise I would like to give more detail: The data
> represents movements from two types of pointing device (e.g. mouse,
> pointer) along an axis. The data has different parameters -- such as
> different pointing devices, different axes, split by different
> experiment conditions etc. -- but the problem is always the same: I would
> like to find out if their distributions correlate and would like to have
> some kind of 'objective' (Yes, I know -- nothing is objective -- but
> eye-balling isn't either.) measure, test, etc. These would be
> accompanied by Q-Q plots and density plots to get a general feeling of
> what is going on and become part of the discussion. I don't expect a
> solution from here, but perhaps a general direction where I could find
> my kind of problem being understood.
> 
> Ralf
> 
> 
> 
> On Wed, Jun 23, 2010 at 10:07 PM, Robert A LaBudde <ral at lcfltd.com> wrote:
> > Your "*" curve apparently dominates your "+" curve.
> >
> > If they have the same total number of data points each, as you say,
> > they cannot both sum to the same value (e.g., N = 10000 or 1.000).
> >
> > So there is something going on that you aren't mentioning.
> >
> > Try comparing CDFs instead of pdfs.
> >
> > At 03:33 PM 6/23/2010, Ralf B wrote:
> >>
> >> I am trying to do something in R and would appreciate a push in the
> >> right direction. I hope some of you experts can help.
> >>
> >> I have two distributions obtained from about 10000 data points each
> >> (non-normal with a multi-modal shape when eye-balling the densities,
> >> but other than that I know little about their distributions). When
> >> plotting the two distributions together I can see that the two
> >> densities are alike, with a certain distance to each other (e.g. 50
> >> units on the X axis). I tried to plot a simplified picture of the
> >> density plot below:
> >>
> >>
> >>
> >>
> >> |
> >> |                                                         *
> >> |                                                      *     *
> >> |                                                   *    +   *
> >> |                                              *     +     +  *
> >> |                     *        +           *   +            +  *
> >> |                 *        +*     +   *  +                   + *
> >> |              *       +       *     +                           +*
> >> |           *       +                                             +*
> >> |        *       +                                                  +*
> >> |     *      +                                                        + *
> >> |  *      +                                                             + *
> >> |___________________________________________________________________
> >>
> >>
> >> What I would like to do is to formally test their similarity or
> >> otherwise measure it more reliably than just showing and discussing a
> >> plot. Is there a general approach other than using a Mann-Whitney
> >> test, which is very strict and seems to assume a perfect match? Is
> >> there a test that takes in a certain 'band' (e.g. 50, 100, 150 units
> >> on X), or are there any other similarity measures that could give me
> >> a statistic about how close these two distributions are to each
> >> other? All I can say from eye-balling is that they seem to follow
> >> each other and it appears that one distribution is shifted by an
> >> amount from the other. Any ideas?
> >>
> >> Ralf
> >>
> >> ______________________________________________
> >> R-help at r-project.org mailing list
> >> https://stat.ethz.ch/mailman/listinfo/r-help
> >> PLEASE do read the posting guide
> >> http://www.R-project.org/posting-guide.html
> >> and provide commented, minimal, self-contained, reproducible code.
> >
> > ================================================================
> > Robert A. LaBudde, PhD, PAS, Dpl. ACAFS  e-mail: ral at lcfltd.com
> > Least Cost Formulations, Ltd.            URL: http://lcfltd.com/
> > 824 Timberlake Drive                     Tel: 757-467-0954
> > Virginia Beach, VA 23464-3239            Fax: 757-467-2947
> >
> > "Vere scire est per causas scire"
> > ================================================================
> >
> >
> 