[R] plotting ecdf; R is stalled

Bill.Venables at csiro.au
Fri Mar 5 02:05:36 CET 2010


If it does finish, it will take some time.  And what for?

If all you want is a plot to look at, why are you using all 33 million observations?  Chances are that a sample of, say, 10000 will give you about as good a plot of the ecdf as the full data would.  Have you tried

plot.ecdf(c(range(myDataVector), sample(myDataVector, 10000)))

for example?  An alternative would be to sort the data vector and take a systematic sample starting at the first observation.  In fact, 10000 is a bit of overkill.
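
The systematic-sample alternative could be sketched roughly as follows. This is only an illustration under stated assumptions: myDataVector here is a simulated stand-in for the real data, n_keep is a made-up name for the number of points to plot, and taking evenly spaced positions from the first through the last order statistic is one possible reading of "systematic sample".

```r
# Simulated stand-in for the real 33-million-element vector (assumption).
set.seed(1)
myDataVector <- rnorm(1e6)

# Number of points to keep for plotting (hypothetical name and value).
n_keep <- 1000

# Sort once, then pick evenly spaced positions from the first to the
# last order statistic, so the minimum and maximum are both retained.
x_sorted <- sort(myDataVector)
idx <- unique(round(seq(1, length(x_sorted), length.out = n_keep)))
x_sys <- x_sorted[idx]

# Plot the ecdf of the thinned data instead of the full vector.
plot(ecdf(x_sys), main = "ecdf from a systematic sample")

# Optional check: largest gap between the thinned and the full ecdf,
# evaluated at the retained points. Only feasible here because the
# stand-in vector is small enough for ecdf() to handle quickly.
F_full <- ecdf(myDataVector)
F_sys  <- ecdf(x_sys)
max_gap <- max(abs(F_full(x_sys) - F_sys(x_sys)))
print(max_gap)
```

Because the retained points are spread evenly through the sorted data, the thinned ecdf should track the full one closely, and the range of the data is preserved without adding it by hand as in the sample() call above.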

Bill Venables
CSIRO/CMIS Cleveland Laboratories


-----Original Message-----
From: r-help-bounces at r-project.org [mailto:r-help-bounces at r-project.org] On Behalf Of Jonathan
Sent: Friday, 5 March 2010 10:10 AM
To: r-help
Subject: [R] plotting ecdf; R is stalled

Dear R-help:
      I am trying to plot the cumulative distribution function of a
vector of around 33 million numeric observations.

> plot.ecdf(myDataVector)

R has been non-responsive for about an hour, and my guess is that it's
probably not going to finish.

Does anybody have a sense whether this is a reasonable experience (and if
so, is there a way to get the desired effect, or am I SOL)?  I can't
find anything in the help archives.

OS: Windows 7 64-bit; R version 2.10.1; RAM: 4 GB

Thanks,
Jonathan
