[R] Transport and Earth Mover's Distance

Lorenzo Isella lorenzo.isella at gmail.com
Tue Mar 7 16:32:04 CET 2017


Dear Dominic,
Thanks a lot for the quick reply.
Just a few questions to make sure I have understood everything correctly (I now
realize that transport, and spatstat in particular, can do much more than I need
right now).
Essentially, I am after the Wasserstein distance between univariate
distributions (and it would be great if I could extend it to the
case of two distributions with different bin structures).

1) Two distributions with the same bins (I identify each bin by its
central point).

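library(transport)

n_bin <- 11 # number of bins

bin_structure <- seq(10, by = 1, len = n_bin) # bin centres 10, 11, ..., 20

set.seed(1234)

x_counts <- rpois(n_bin, 10)
y_counts <- rpois(n_bin, 10)

# pp() reads each row of the matrix as the coordinates of one point,
# so every (bin centre, count) pair becomes a point in 2-D
x <- pp(cbind(bin_structure, x_counts))
y <- pp(cbind(bin_structure, y_counts))

# optimal matching between the two 11-point patterns
match <- transport(x, y, p = 1)
plot(x, y, match)
wasserstein_dist <- wasserstein(x, y, p = 1, match)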
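For comparison, here is a minimal base-R sketch of what I would expect if the counts are instead treated as masses sitting on the bin centres (rather than as a second coordinate). It relies on the one-dimensional identity W1 = integral of |F_x(t) - F_y(t)| dt, where F_x and F_y are the cumulative distributions of the normalized histograms. With shared, sorted bin centres the CDF gap is constant between consecutive centres, so the integral reduces to a weighted sum. The helper name w1_same_bins is hypothetical and not part of transport.

```r
# Hypothetical helper: 1-Wasserstein distance between two histograms that
# share the same sorted bin centres; counts are normalised to probabilities.
w1_same_bins <- function(centres, counts_a, counts_b) {
  stopifnot(length(centres) == length(counts_a),
            length(counts_a) == length(counts_b),
            !is.unsorted(centres))
  pa <- counts_a / sum(counts_a)
  pb <- counts_b / sum(counts_b)
  # F_a - F_b at each centre; the gap stays constant up to the next centre
  cdf_gap <- cumsum(pa - pb)
  sum(abs(head(cdf_gap, -1)) * diff(centres))
}

set.seed(1234)
n_bin <- 11
bin_structure <- seq(10, by = 1, length.out = n_bin)
x_counts <- rpois(n_bin, 10)
y_counts <- rpois(n_bin, 10)
w1_same_bins(bin_structure, x_counts, y_counts)
```

A quick sanity check: moving all the mass from centre 1 to centre 3, w1_same_bins(1:3, c(1, 0, 0), c(0, 0, 1)), should give 2.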


2) Now the two distributions do not share the same bin structure (the bins of
y2 are shifted by 2).


# same counts as y, but on bin centres 12, 13, ..., 22
y2 <- pp(cbind(bin_structure + 2, y_counts))


match <- transport(x, y2, p = 1)
plot(x, y2, match)
wasserstein_dist2 <- wasserstein(x, y2, p = 1, match)
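The same one-dimensional identity, W1 = integral of |F_x(t) - F_y2(t)| dt, should also cover case 2): merge the two sets of bin centres, evaluate both CDFs on the merged grid, and integrate |F_x - F_y2| piecewise. This is again only a sketch under the assumption that the counts are masses on the bin centres; w1_weighted is a hypothetical helper. A useful check is that rigidly shifting a histogram by 2 should give W1 = 2.

```r
# Hypothetical helper: 1-Wasserstein distance between two weighted point sets
# on the real line (e.g. histograms with different bin centres).
w1_weighted <- function(loc_a, w_a, loc_b, w_b) {
  pa <- w_a / sum(w_a)
  pb <- w_b / sum(w_b)
  grid <- sort(unique(c(loc_a, loc_b)))
  # right-continuous CDFs evaluated at every point of the merged grid
  Fa <- vapply(grid, function(t) sum(pa[loc_a <= t]), numeric(1))
  Fb <- vapply(grid, function(t) sum(pb[loc_b <= t]), numeric(1))
  # |Fa - Fb| is constant on [grid[k], grid[k + 1])
  sum(abs(head(Fa - Fb, -1)) * diff(grid))
}

set.seed(1234)
n_bin <- 11
bin_structure <- seq(10, by = 1, length.out = n_bin)
x_counts <- rpois(n_bin, 10)
y_counts <- rpois(n_bin, 10)
w1_weighted(bin_structure, x_counts, bin_structure + 2, y_counts)
```

Because the CDFs are compared on the union of both bin sets, the two histograms need neither the same centres nor the same number of bins.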


Do 1) and 2) make sense?

>
>If you have no particular need for binning, check out the function
>pppdist in the R-package spatstat, which offers a more flexible way
>to deal with point patterns of different size.


Well, this is not entirely clear to me, but it is possibly very important.
My raw data consist of two univariate samples of unequal length.

Suppose that

x <- rnorm(100)

and

y <- rnorm(90)

Is there a way to define the Wasserstein distance between them without
going through the binning procedure?
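One possible route without binning, as a sketch: in one dimension the p-Wasserstein distance can be written through the quantile functions, W_p^p = integral over u in [0, 1] of |F^{-1}(u) - G^{-1}(u)|^p du. For two empirical samples both quantile functions are step functions that jump only at i/n and j/m, so the integral is an exact finite sum and the samples may have different lengths. The function name wasserstein_1d_samples is hypothetical; each observation is assumed to carry weight 1/n (or 1/m).

```r
# Hypothetical helper: p-Wasserstein distance between two univariate samples,
# each point carrying weight 1/length(sample); lengths may differ.
wasserstein_1d_samples <- function(a, b, p = 1) {
  a <- sort(a)
  b <- sort(b)
  n <- length(a)
  m <- length(b)
  # jump points of the two empirical quantile functions on [0, 1]
  u <- sort(unique(c((0:n) / n, (0:m) / m)))
  # both quantile functions are constant on each open interval (u[k], u[k+1]);
  # evaluate them at the interval midpoints to avoid boundary rounding
  mid <- (head(u, -1) + tail(u, -1)) / 2
  qa <- a[ceiling(mid * n)]
  qb <- b[ceiling(mid * m)]
  sum(diff(u) * abs(qa - qb)^p)^(1 / p)
}

set.seed(1234)
x <- rnorm(100)
y <- rnorm(90)
wasserstein_1d_samples(x, y, p = 1)
wasserstein_1d_samples(x, y, p = 2)
```

Sorting dominates the cost, so this stays cheap even for large samples; shifting a sample by a constant c, e.g. wasserstein_1d_samples(x, x + 3), should return c.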



Many thanks!

Lorenzo


