[R] Transport and Earth Mover's Distance

Schuhmacher, Dominic dominic.schuhmacher at mathematik.uni-goettingen.de
Thu Mar 9 17:44:27 CET 2017


> On 08.03.2017 at 11:28, Schuhmacher, Dominic <dominic.schuhmacher at mathematik.uni-goettingen.de> wrote:
> 
> ...
>>> 
>>> If you have no particular need for binning, check out the function
>>> pppdist in the R-package spatstat, which offers a more flexible way
>>> to deal with point patterns of different size.
>> 
>> 
>> Well, this is not clear, but possibly very important for me.
>> My raw data consists of 2 univariate samples of unequal length.
>> 
>> suppose that
>> 
>> x<-rnorm(100)
>> 
>> and
>> 
>> y<-rnorm(90)
>> 
>> Is there a way to define the Wasserstein distance between them without
>> going through the binning procedure?
>> 
> Define, yes: the 1-Wasserstein distance in one dimension is the area between the empirical cumulative distribution functions. If the samples had the same length, this could be computed directly by
> 
> mean(abs(sort(x)-sort(y)))
> 
> Otherwise this needs some lines of code. I will include it in the next version of the transport package (soon).
> 
> Best regards,
> Dominic
> 
> 
Following up on this earlier post: transport 0.8-2, now on CRAN, can compute the Wasserstein distance between univariate samples of differing lengths (more precisely, between their empirical distributions).

library(transport)
x <- rnorm(100)
y <- rnorm(90)
wasserstein1d(x, y)
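
For readers who want to see what is being computed, here is a minimal base-R sketch of the "area between the empirical cumulative distribution functions" for samples of unequal length. It assumes p = 1 and equal weights on all observations. The helper name wasserstein1_ecdf is hypothetical and is not part of the transport package, nor is this meant to reflect how the package implements wasserstein1d.

```r
# Hypothetical helper (not from the transport package):
# 1-Wasserstein distance as the area between the two ECDFs.
wasserstein1_ecdf <- function(x, y) {
  Fx <- ecdf(x)
  Fy <- ecdf(y)
  # Both ECDFs are step functions that are constant between
  # consecutive points of the pooled, sorted sample.
  z <- sort(c(x, y))
  widths <- diff(z)
  # ECDFs are right-continuous, so the value on [z[i], z[i+1])
  # is the value at the left endpoint z[i].
  left <- z[-length(z)]
  sum(abs(Fx(left) - Fy(left)) * widths)
}

x <- rnorm(100)
y <- rnorm(90)
wasserstein1_ecdf(x, y)
```

For samples of equal length this reduces to mean(abs(sort(x)-sort(y))), and for p = 1 it is expected to agree with wasserstein1d(x, y) up to floating-point error.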

Cheers, Dominic


------------------------------------
Dominic Schuhmacher
Professor of Stochastics
University of Goettingen
http://www.dominic.schuhmacher.name


