[Rd] ecdf with lots of ties is inefficient (PR#7292)

Sun Oct 17 09:06:26 CEST 2004

I would add that some action has to be taken in the presence of missing 
values, e.g.

 > x <- c(1,2,2,4,7, NA, 10,12, 15,20)
 > ecdf(x)
Error in xy.coords(x, y) : x and y lengths differ
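
As a workaround sketch, and assuming the desired behaviour is simply to
ignore missing values (that is my assumption, not something ecdf's
documentation promises), the NAs can be dropped explicitly before the call:

```r
x <- c(1, 2, 2, 4, 7, NA, 10, 12, 15, 20)

# Drop missing values explicitly so ecdf() only ever sees complete data.
x_ok <- x[!is.na(x)]

Fn <- ecdf(x_ok)

# Evaluate the empirical CDF at a few points; proportions are taken over
# the 9 non-missing observations, not the original 10.
Fn(c(2, 10))
```

Whether ecdf should do this silently, warn, or stop with a clearer message
is a separate design question.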

stefano

On Oct 17, 2004, at 8:50 AM, martin at gsc.riken.jp wrote:

> Full_Name: Martin Frith
> Version: R-2.0.0
> OS: linux-gnu
> Submission from: (NULL) (134.160.83.73)
>
>
> I have large vectors containing 100,000 to 20,000,000 numbers. However,
> they only contain a few hundred *distinct* numbers (e.g. positive
> integers < 200). When I do ecdf(v), it either runs out of memory, or it
> succeeds, but when I plot the ecdf with postscript, the output is
> unnecessarily bloated because the same lines get redrawn many times. The
> complexity of ecdf should depend on how many distinct numbers there are,
> not how many total numbers.
>
> This is my first bug report, so forgive me if I've done something stupid!
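
For what it is worth, the idea in the report (work per distinct value
rather than per observation) can be sketched as follows. This is only an
illustration, not a proposed patch: ecdf_ties is a hypothetical helper
name, and dropping NAs inside it is my assumption.

```r
# Hypothetical helper: build an empirical CDF whose step function stores
# only the distinct values of x, however many ties there are.
ecdf_ties <- function(x) {
  x <- x[!is.na(x)]                     # assumption: ignore missing values
  n <- length(x)
  if (n < 1) stop("'x' must have 1 or more non-missing values")

  vals   <- sort(unique(x))             # one knot per distinct value
  counts <- tabulate(match(x, vals), nbins = length(vals))

  # Right-continuous step function: 0 below the smallest value, then the
  # cumulative proportion at and after each distinct value.
  stepfun(vals, c(0, cumsum(counts) / n), right = FALSE)
}

# Example: 100,000 observations, but fewer than 200 distinct values.
v  <- sample(1:199, 1e5, replace = TRUE)
Fv <- ecdf_ties(v)
length(knots(Fv))                       # number of distinct values only
```

Because the returned object has one knot per distinct value, plotting it
should draw one step per distinct value instead of one per observation.
The construction still makes a pass over the full vector (unique and
match), so it reduces the size of the result rather than removing all
dependence on the total length.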