[R] compare histograms

Michael Bedward michael.bedward at gmail.com
Thu Oct 14 03:15:37 CEST 2010


Hi Juan,

Yes, you can use EMD to quantify the difference between any pair of
histograms regardless of their shape. The only constraint, at least
the way that I've done it previously, is to have compatible bins. The
original application of EMD was to compare images based on colour
histograms which could have all sorts of shapes.
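
For two histograms built on the same equal-width bins, the one-dimensional case has a simple closed form that makes the "compatible bins" requirement concrete. The sketch below is an illustration only and is not code from this thread: the function name emd_1d, the break points and the simulated samples are all assumptions. It relies on the fact that, in one dimension with a cost equal to the distance between bins, the minimum work is the summed absolute difference between the two cumulative distributions.

```r
# Hypothetical helper (not from the thread): EMD between two histograms
# that share the same equal-width bins.
emd_1d <- function(counts_a, counts_b, bin_width = 1) {
  stopifnot(length(counts_a) == length(counts_b))
  # Normalise so that both histograms carry the same total mass.
  p <- counts_a / sum(counts_a)
  q <- counts_b / sum(counts_b)
  # In 1-D the minimum work is the summed absolute difference between
  # the two cumulative distributions, scaled by the bin width.
  sum(abs(cumsum(p) - cumsum(q))) * bin_width
}

# Shared breaks are what make the bins compatible (assumed values).
breaks <- seq(-10, 10, by = 0.5)
set.seed(1)
x <- c(rnorm(500, mean = -2), rnorm(500, mean = 2))  # bimodal sample
y <- rnorm(1000)                                     # unimodal sample
hx <- hist(x, breaks = breaks, plot = FALSE)$counts
hy <- hist(y, breaks = breaks, plot = FALSE)$counts
emd_1d(hx, hy, bin_width = 0.5)
```

Nothing in the calculation depends on the shape of either histogram, so a bimodal sample is handled in the same way as a unimodal one.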

I looked at the package that Dennis alerted me to on R-Forge but
unfortunately it seems to be inactive and the nightly builds are
broken. I've downloaded the source code and will have a look at it
sometime in the next few days.

Meanwhile, let me know if you want a copy of my own code. It uses the
lpSolve package.
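
For readers who want a starting point, here is a minimal sketch of the linear programming formulation using lpSolve. It is not Michael's code, which is not shown in this thread. The function name emd_lp, the use of bin centres for the ground distance and the normalisation to unit mass are assumptions made for illustration. It treats the problem as a transportation problem: bins of the first histogram are supplies, bins of the second are demands, and the cost of moving mass is the distance between bin centres.

```r
library(lpSolve)

# Hypothetical helper (not from the thread): EMD as a transportation problem.
emd_lp <- function(counts_a, counts_b, centres) {
  stopifnot(length(counts_a) == length(centres),
            length(counts_b) == length(centres))
  # Normalise both histograms to unit mass so supply equals demand.
  p <- counts_a / sum(counts_a)
  q <- counts_b / sum(counts_b)
  # Ground distance: absolute difference between bin centres.
  cost <- abs(outer(centres, centres, "-"))
  res <- lp.transport(cost.mat = cost, direction = "min",
                      row.signs = rep("<=", length(p)), row.rhs = p,
                      col.signs = rep(">=", length(q)), col.rhs = q,
                      integers = NULL)  # flows are continuous, not integer
  if (res$status != 0) stop("lpSolve did not find an optimal solution")
  res$objval  # minimum total work
}

# All of the mass moves two bins to the right in this toy example.
emd_lp(c(10, 0, 0), c(0, 0, 10), centres = c(0.5, 1.5, 2.5))
```

Passing integers = NULL matters because lp.transport treats every variable as an integer by default, which would not suit fractional flows. The cost matrix can be replaced by any other ground distance, which is where the LP form is more flexible than a one-dimensional shortcut.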

Michael

On 14 October 2010 08:46, Juan Pablo Fededa <jpfededa at gmail.com> wrote:
> Hi Michael,
>
>
> I have the same challenge. Can you use this Earth Mover's Distance to
> compare bimodal distributions?
> Thanks & cheers,
>
>
> Juan
>
>
> On Wed, Oct 13, 2010 at 4:39 AM, Michael Bedward <michael.bedward at gmail.com>
> wrote:
>>
>> Just to add to Greg's comments: I've previously used Earth Mover's
>> Distance (EMD) to compare histograms. Note that this is a distance metric
>> rather than a parametric statistic (i.e. not a test), but it at least
>> provides a consistent way of quantifying similarity.
>>
>> It's relatively easy to implement the metric in R (formulating it as a
>> linear programming problem). Happy to dig out the code if needed.
>>
>> Michael
>>
>> On 13 October 2010 02:44, Greg Snow <Greg.Snow at imail.org> wrote:
>> > That depends a lot on what you mean by the histograms being equivalent.
>> >
>> > You could just plot them and compare visually.  It may be easier to
>> > compare them if you plot density estimates rather than histograms.  Even
>> > better would be to do a qqplot comparing the 2 sets of data rather than the
>> > histograms.
>> >
>> > If you want a formal test then the ks.test function can compare 2
>> > datasets.  Note that the null hypothesis is that they come from the same
>> > distribution.  A significant result suggests that they differ (though
>> > the difference may not be of practical importance), while a
>> > non-significant result could mean they are the same, or that you just do
>> > not have enough power to find the difference (or the difference is hard
>> > for the KS test to see).  You could also use a chi-squared test on the
>> > binned counts to compare them.
>> >
>> > Another approach would be to use the vis.test function from the
>> > TeachingDemos package.  Write a small function that will either plot your 2
>> > histograms (density plots), or permute the data between the 2 groups and
>> > plot the equivalent histograms.  The vis.test function then presents you
>> > with an array of plots, one of which is the original data and the rest based
>> > on permutations.  If there is a clear meaningful difference in the groups
>> > you will be able to spot the plot that does not match the rest, otherwise it
>> > will just be guessing (might be best to have a fresh set of eyes that have
>> > not seen the data before see if they can pick out the real plot).
>> >
>> > --
>> > Gregory (Greg) L. Snow Ph.D.
>> > Statistical Data Center
>> > Intermountain Healthcare
>> > greg.snow at imail.org
>> > 801.408.8111
>> >
>> >
>> >> -----Original Message-----
>> >> From: r-help-bounces at r-project.org
>> >> [mailto:r-help-bounces at r-project.org] On Behalf Of solafah bh
>> >> Sent: Monday, October 11, 2010 4:02 PM
>> >> To: R help mailing list
>> >> Subject: [R] compare histograms
>> >>
>> >> Hello
>> >> How can I compare two statistical histograms? How can I know whether
>> >> these histograms are equivalent or not?
>> >>
>> >> Regards
>> >>
>> >
>> > ______________________________________________
>> > R-help at r-project.org mailing list
>> > https://stat.ethz.ch/mailman/listinfo/r-help
>> > PLEASE do read the posting guide
>> > http://www.R-project.org/posting-guide.html
>> > and provide commented, minimal, self-contained, reproducible code.
>> >
>>
>
>


