[Rd] ecdf with lots of ties is inefficient (PR#7292)

Prof Brian Ripley ripley at stats.ox.ac.uk
Sun Oct 17 09:30:38 CEST 2004


This is easy: ecdf() should do x <- sort(x) first (as sort() drops NAs by
default).  Fixed in R-patched.
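
For readers of the archive, here is a minimal sketch of the idea, not the
code actually committed to R-patched. The name ecdf_sketch is hypothetical,
and the use of unique()/tabulate() to collapse ties is an assumption about
one reasonable way to make the cost depend on the number of distinct
values. It relies only on sort() dropping NAs by default and on approxfun()
building a right-continuous step function.

```r
# Sketch only: ecdf_sketch is a hypothetical name, not the R-patched source.
ecdf_sketch <- function(x) {
  x <- sort(x)                  # sort() drops NAs by default (na.last = NA)
  n <- length(x)
  if (n < 1) stop("'x' must have 1 or more non-missing values")
  vals <- unique(x)             # distinct values, already in increasing order
  # cumulative proportion of observations <= each distinct value
  cum <- cumsum(tabulate(match(x, vals))) / n
  # one knot per distinct value, so storage and plotting scale with
  # length(vals) rather than with n
  approxfun(vals, cum, method = "constant",
            yleft = 0, yright = 1, f = 0, ties = "ordered")
}

# The vector from the report below, including the NA
x <- c(1, 2, 2, 4, 7, NA, 10, 12, 15, 20)
Fn <- ecdf_sketch(x)
Fn(c(0, 2, 7, 25))              # evaluates the step function at four points

# Many ties: 100,000 values but fewer than 200 distinct ones
v <- sample(1:199, 1e5, replace = TRUE)
Fv <- ecdf_sketch(v)
length(unique(v))               # number of knots the step function needs
```

With this arrangement the NA is discarded before the lengths of the knots
and the cumulative proportions are compared, so the "x and y lengths
differ" error cannot arise from missing values.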

On Sun, 17 Oct 2004, stefano iacus wrote:

> I would add that some action has to be taken in the presence of missing
> values, e.g.
> 
>  > x <- c(1,2,2,4,7, NA, 10,12, 15,20)
>  > ecdf(x)
> Error in xy.coords(x, y) : x and y lengths differ
> 
> stefano
> 
> On Oct 17, 2004, at 8:50 AM, martin at gsc.riken.jp wrote:
> 
> > Full_Name: Martin Frith
> > Version: R-2.0.0
> > OS: linux-gnu
> > Submission from: (NULL) (134.160.83.73)
> >
> >
> > I have large vectors containing 100,000 to 20,000,000 numbers. However,
> > they only contain a few hundred *distinct* numbers (e.g. positive
> > integers < 200). When I do ecdf(v), it either runs out of memory, or it
> > succeeds, but when I plot the ecdf with postscript, the output is
> > unnecessarily bloated because the same lines get redrawn many times. The
> > complexity of ecdf should depend on how many distinct numbers there are,
> > not how many total numbers.
> >
> > This is my first bug report, so forgive me if I've done something stupid!
> >
> > ______________________________________________
> > R-devel at stat.math.ethz.ch mailing list
> > https://stat.ethz.ch/mailman/listinfo/r-devel
> >
> 

-- 
Brian D. Ripley,                  ripley at stats.ox.ac.uk
Professor of Applied Statistics,  http://www.stats.ox.ac.uk/~ripley/
University of Oxford,             Tel:  +44 1865 272861 (self)
1 South Parks Road,                     +44 1865 272866 (PA)
Oxford OX1 3TG, UK                Fax:  +44 1865 272595


