[R] OT: compare several graphs
Jan_Svatos at eurotel.cz
Tue Oct 23 09:50:13 CEST 2001
I am not sure either whether the method I proposed is statistically sound
(probably not, as the way the density is estimated depends on the chosen
settings, and I did not do any "optimization" of this kind).
dist() and Euclidean distance: Yes, I am not sure whether the use of
Euclidean distance is appropriate;
a weighted distance (with weights defined by the theoretical density, if
available) would probably be better.
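
To make the weighted-distance idea concrete, here is a minimal sketch. It is only one possible reading of "weights defined by theoretical density": the helper weighted_dist(), the gamma curves and the exponential weight curve are all made-up stand-ins, not anything from my data. (In current R, dist() lives in the stats package; mva was merged into it.)

```r
# Hypothetical helper: Euclidean distance with nonnegative weights,
# normalised so that the weights sum to 1.
weighted_dist <- function(y1, y2, w) {
  stopifnot(length(y1) == length(y2), length(y1) == length(w), all(w >= 0))
  w <- w / sum(w)
  sqrt(sum(w * (y1 - y2)^2))
}

# Illustrative curves evaluated on a common grid (assumed, not real data)
x     <- seq(0, 10, length.out = 200)
dens1 <- dgamma(x, shape = 2, rate = 1)
dens2 <- dgamma(x, shape = 3, rate = 1)

# Stand-in for a "theoretical density" used as weights (an assumption)
w <- dexp(x, rate = 0.5)

d_w <- weighted_dist(dens1, dens2, w)                  # weighted version
d_u <- weighted_dist(dens1, dens2, rep(1, length(x)))  # equal weights
```

With equal weights the result is just the plain Euclidean distance divided by sqrt(length(x)), so the two versions are on a comparable scale.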
The resampling approach is interesting, and for testing it is definitely
better than the plain Euclidean distance I proposed.
(My use of distances was motivated only by the wish to compare the
magnitude of differences, as I was trying to decide
which profiles/graphs are more/less similar to a given "typical" profile;
no testing was involved.)
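
A rough sketch of that kind of descriptive comparison, with no testing involved. The simulated gamma samples, the grid limits and the helper dens_on_grid() are assumptions for illustration only. All densities are estimated on the same grid so that the y values are comparable.

```r
set.seed(1)

# Simulated stand-ins for the real profiles (assumed data)
samples <- list(a = rgamma(300, shape = 2,   rate = 1),
                b = rgamma(300, shape = 2.2, rate = 1),
                c = rgamma(300, shape = 6,   rate = 1))
typical <- rgamma(300, shape = 2, rate = 1)

# Estimate every density on one common grid (limits chosen arbitrarily)
dens_on_grid <- function(z) density(z, from = 0, to = 20, n = 512)$y
typ_y <- dens_on_grid(typical)

# Euclidean distance of each profile to the "typical" one
d <- sapply(samples, function(z) sqrt(sum((dens_on_grid(z) - typ_y)^2)))

# Profiles ordered from most to least similar to the typical profile
ranking <- sort(d)
```

The ranking only orders the profiles; it says nothing about whether any of the differences is significant.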
OTOH, this test would probably not be sensitive to shift/location
alternatives, as I have checked experimentally.
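
One way such a check could be set up (a sketch with made-up normal curves, not the experiment I actually ran): take a density and a shifted copy of it, then compare their distance with the distances obtained after shuffling one of the two columns.

```r
set.seed(42)

# Two curves on a common grid that differ only by a location shift (assumed)
x       <- seq(-5, 6, length.out = 512)
dens1_y <- dnorm(x, mean = 0)
dens2_y <- dnorm(x, mean = 0.5)

euclid <- function(a, b) sqrt(sum((a - b)^2))

# Observed distance between the original and the shifted curve
d_obs <- euclid(dens1_y, dens2_y)

# Distances after randomly shuffling the second column
n_reps      <- 1000
d_resampled <- replicate(n_reps, euclid(dens1_y, sample(dens2_y)))

# 5% quantile of the resampled distances, and the comparison with d_obs
q05 <- quantile(d_resampled, 0.05)
d_obs < q05
```

Here the shifted curve still comes out as "more similar than random", because shuffling destroys the shape of the curve far more than a moderate shift does. So the procedure, at least in this form, does not flag the shift.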
>> Hi Sven,
>> I am just doing something similar-
>> my graphs are densities of nonnegative r.v.'s (with all probability mass
>> on a fixed bounded interval).
>> Then I compute their "distance" by dist (mva package needed), i.e.
>> my.dist<-dist(t(cbind(dens1$y, dens2$y)))
>> (provided that dens1$x==dens2$x, of course)
>> The problem of course is, how to decide about statistical and/or
>> "practical" significance of a difference.
>> I cannot recall any correct statistical test of such
>I'm not sure if this is statistically sound (comments, please!), but what
>about a resampling approach:
> repeat some 1000 times:
> shuffle one column randomly, then compute the distance
> compare your distance to the empirical distribution of
> "resampled distances"
>In terms of R code:
> Nreps <- 5000
> dists <- numeric(Nreps)
> for (i in 1:Nreps) {
>   y2 <- sample(dens2$y)                    # shuffle one column
>   dists[i] <- dist(t(cbind(dens1$y, y2)))  # distance to shuffled column
> }
> quantile(dists, 0.05)
>If the original distance is lower than the 5% quantile of the resampled
>dists, your two graphs would be "significantly more similar" than "random
>graphs". For a two-sided test, you could use
> quantile(dists, c(0.025, 0.975)).
>If this makes sense, there is still the problem of the correct distance
>measurement. By default, dist() calculates Euclidean distances. I'm not
>sure if they are appropriate for this kind of data.
>As I said, please comment. It's just an idea I had (along the lines of the