[R] compare histograms

Michael Bedward michael.bedward at gmail.com
Fri Oct 15 02:47:40 CEST 2010


Hi Rainer,

Great - many thanks for that.  Yes, I'm using OSX.

I initially tried to use install.packages to get a pre-built
binary of earthmovdist from R-Forge, but it failed with...

In getDependencies(pkgs, dependencies, available, lib) :
  package ‘earthmovdist’ is not available

Installing with type="source" also failed.

However, after reading your post I looked at the error messages
properly and it turned out to be a trivial problem. The .First
function defined in my .Rprofile was printing some text to the console
with cat() which was being incorrectly picked up by the package build
as if it was a makefile argument. When I commented out the call to cat
the package installed successfully. I haven't had this problem
installing other packages from source so I think there must be a
little problem with your setup (?)
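
For anyone hitting the same thing, here is a minimal sketch of one
way to keep startup chatter out of package builds, assuming the stray
output only causes trouble in non-interactive sessions. The helper
name say_if_interactive and the greeting text are hypothetical, not
what was actually in the .Rprofile described above:

```r
# Hypothetical helper: print a startup message only when R is running
# interactively, so nothing is written to stdout during non-interactive
# runs such as a source package install.
say_if_interactive <- function(msg, is_interactive = interactive()) {
  if (is_interactive) {
    cat(msg, "\n", sep = "")
  }
  invisible(is_interactive)
}

# In ~/.Rprofile: .First runs at the start of every session.
.First <- function() {
  say_if_interactive("Welcome back")
}
```

The is_interactive argument is only there so the behaviour can be
checked without starting a real interactive session.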

Now that it's installed I look forward to trying it out shortly.
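
For readers who want to see what the metric does before installing
anything, here is a minimal base-R sketch for the simplest case: two
1-D histograms on identical, equal-width bins, each normalised to
total mass 1. In that case the distance reduces to the summed absolute
difference of the cumulative bin masses, so no solver is needed. This
is a swapped-in shortcut, not the lpSolve formulation mentioned later
in the thread and not the earthmovdist API; emd_1d and its arguments
are hypothetical names:

```r
# Earth mover's distance between two 1-D histograms that share the same
# equal-width bins. Counts are normalised so both histograms have mass 1.
emd_1d <- function(counts_a, counts_b, bin_width = 1) {
  stopifnot(length(counts_a) == length(counts_b),
            all(counts_a >= 0), all(counts_b >= 0),
            sum(counts_a) > 0, sum(counts_b) > 0)
  p <- counts_a / sum(counts_a)
  q <- counts_b / sum(counts_b)
  # Net mass that has to cross each bin boundary, left to right.
  flow <- cumsum(p - q)
  sum(abs(flow)) * bin_width
}

bimodal  <- c(0, 5, 0, 5, 0)
unimodal <- c(0, 0, 10, 0, 0)

emd_1d(bimodal, unimodal)  # 1: each half of the mass moves one bin
emd_1d(bimodal, bimodal)   # 0: identical histograms
```

The shape of the histograms does not matter here, only that the bins
line up, which is the "compatible bins" constraint.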

Thanks again.

Michael




On 15 October 2010 03:17, Rainer M Krug <r.m.krug at gmail.com> wrote:
>
>
> On Thu, Oct 14, 2010 at 3:15 AM, Michael Bedward <michael.bedward at gmail.com>
> wrote:
>>
>> Hi Juan,
>>
>> Yes, you can use EMD to quantify the difference between any pair of
>> histograms regardless of their shape. The only constraint, at least
>> the way that I've done it previously, is to have compatible bins. The
>> original application of EMD was to compare images based on colour
>> histograms which could have all sorts of shapes.
>>
>> I looked at the package that Dennis alerted me to on RForge but
>> unfortunately it seems to be inactive
>
> No - well, it depends how you define inactive: the functionality we wanted
> to include is included, therefore no further development was necessary.
>
>>
>> and the nightly builds are broken. I've downloaded the source code and
>> will have a look at it
>> sometime in the next few days.
>
> Thanks for alerting us - we will look into that. But just don't use the
> nightly builds, as they are no different from the last release. Just download
> the package for your system (I assume Windows or Mac, as I just installed
> from source without problems under Linux).
> Let me know if it doesn't work,
> Cheers,
> Rainer
>
>>
>> Meanwhile, let me know if you want a copy of my own code. It uses the
>> lpSolve package.
>>
>> Michael
>>
>> On 14 October 2010 08:46, Juan Pablo Fededa <jpfededa at gmail.com> wrote:
>> > Hi Michael,
>> >
>> >
>> > I have the same challenge. Can you use this earth mover's distance to
>> > compare bimodal distributions?
>> > Thanks & cheers,
>> >
>> >
>> > Juan
>> >
>> >
>> > On Wed, Oct 13, 2010 at 4:39 AM, Michael Bedward
>> > <michael.bedward at gmail.com>
>> > wrote:
>> >>
>> >> Just to add to Greg's comments: I've previously used 'Earth Movers
>> >> Distance' to compare histograms. Note, this is a distance metric
>> >> rather than a parametric statistic (ie. not a test) but it at least
>> >> provides a consistent way of quantifying similarity.
>> >>
>> >> It's relatively easy to implement the metric in R (formulating it as a
>> >> linear programming problem). Happy to dig out the code if needed.
>> >>
>> >> Michael
>> >>
>> >> On 13 October 2010 02:44, Greg Snow <Greg.Snow at imail.org> wrote:
>> >> > That depends a lot on what you mean by the histograms being
>> >> > equivalent.
>> >> >
>> >> > You could just plot them and compare visually.  It may be easier to
>> >> > compare them if you plot density estimates rather than histograms.
>> >> > Even better would be to do a qqplot comparing the 2 sets of data
>> >> > rather than the histograms.
>> >> >
>> >> > If you want a formal test then the ks.test function can compare 2
>> >> > datasets.  Note that the null hypothesis is that they come from the
>> >> > same distribution, a significant result means that they are likely
>> >> > different (but the difference may not be of practical importance),
>> >> > but a non-significant test could mean they are the same, or that
>> >> > you just do not have enough power to find the difference (or the
>> >> > difference is hard for the ks test to see).  You could also use a
>> >> > chi-squared test to compare this way.
>> >> >
>> >> > Another approach would be to use the vis.test function from the
>> >> > TeachingDemos package.  Write a small function that will either
>> >> > plot your 2 histograms (density plots), or permute the data between
>> >> > the 2 groups and plot the equivalent histograms.  The vis.test
>> >> > function then presents you with an array of plots, one of which is
>> >> > the original data and the rest based on permutations.  If there is
>> >> > a clear meaningful difference in the groups you will be able to
>> >> > spot the plot that does not match the rest, otherwise it will just
>> >> > be guessing (might be best to have a fresh set of eyes that have
>> >> > not seen the data before see if they can pick out the real plot).
>> >> >
>> >> > --
>> >> > Gregory (Greg) L. Snow Ph.D.
>> >> > Statistical Data Center
>> >> > Intermountain Healthcare
>> >> > greg.snow at imail.org
>> >> > 801.408.8111
>> >> >
>> >> >
>> >> >> -----Original Message-----
>> >> >> From: r-help-bounces at r-project.org
>> >> >> [mailto:r-help-bounces at r-project.org] On Behalf Of solafah bh
>> >> >> Sent: Monday, October 11, 2010 4:02 PM
>> >> >> To: R help mailing list
>> >> >> Subject: [R] compare histograms
>> >> >>
>> >> >> Hello
>> >> >> How do I compare two statistical histograms? How can I know if these
>> >> >> histograms are equivalent or not?
>> >> >>
>> >> >> Regards
>> >> >>
>> >> >
>> >> > ______________________________________________
>> >> > R-help at r-project.org mailing list
>> >> > https://stat.ethz.ch/mailman/listinfo/r-help
>> >> > PLEASE do read the posting guide
>> >> > http://www.R-project.org/posting-guide.html
>> >> > and provide commented, minimal, self-contained, reproducible code.
>> >> >
>> >>
>> >
>> >
>>
>
>
>
> --
> NEW GERMAN FAX NUMBER!!!
>
> Rainer M. Krug, PhD (Conservation Ecology, SUN), MSc (Conservation Biology,
> UCT), Dipl. Phys. (Germany)
>
> Centre of Excellence for Invasion Biology
> Natural Sciences Building
> Office Suite 2039
> Stellenbosch University
> Main Campus, Merriman Avenue
> Stellenbosch
> South Africa
>
> Cell:           +27 - (0)83 9479 042
> Fax:            +27 - (0)86 516 2782
> Fax:            +49 - (0)321 2125 2244
> email:          Rainer at krugs.de
>
> Skype:          RMkrug
> Google:         R.M.Krug at gmail.com
>
>


