[R] scatterplot of 100000 points and pdf file format

(Ted Harding) Ted.Harding at nessie.mcc.ac.uk
Thu Nov 25 15:30:39 CET 2004


Hi Andy,

On 25-Nov-04 Liaw, Andy wrote:
>> From: Ted.Harding at nessie.mcc.ac.uk
>> [...]
>> > X<-round(rnorm(1e6),3);Y<-round(rnorm(1e6),3)
>> > system.time(unique(X))
>> [1] 0.74 0.07 0.81 0.00 0.00
>> > system.time(unique(cbind(X,Y)))
>> [1] 350.81   4.56 356.54   0.00   0.00
> 
> Do you know if the majority of that time is spent in unique() itself?
>  If so, which method?  What I see is:
> 
>> X<-round(rnorm(1e6),3);Y<-round(rnorm(1e6),3)
>> system.time(unique(X), gcFirst=TRUE)
> [1] 0.25 0.01 0.26   NA   NA
>> system.time(unique(cbind(X,Y)), gcFirst=TRUE)
> [1] 101.80   0.34 104.61     NA     NA
>> system.time(dat <- data.frame(x=X, y=Y), gcFirst=TRUE)
> [1] 10.17  0.00 10.24    NA    NA
>> system.time(unique(dat), gcFirst=TRUE)
> [1] 23.94  0.11 24.15    NA    NA
> 
> Andy

I want to look into this a bit more systematically (I have
an idea why 'unique' may be taking longer on the array from
'cbind' than on the data frame). I will be doing this on a
much faster machine than the one I have to hand at the
moment, so I will report results (if interesting) later.
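
As a starting point for that comparison, something like the
following could be run. It is only a sketch: n is cut to 1e4
(an assumption, so that it finishes quickly; the figures quoted
above used 1e6), and set.seed() and the object names m, dat,
u_vec, u_mat and u_df are illustrative rather than taken from
the runs above. Timings will differ by machine and R version.

```r
# Sketch: time unique() on a vector, on a cbind() matrix and on a data frame.
# n is deliberately smaller than the 1e6 used in the thread (assumption).
set.seed(1)
n <- 1e4
X <- round(rnorm(n), 3)
Y <- round(rnorm(n), 3)

m   <- cbind(X, Y)               # numeric matrix with n rows and 2 columns
dat <- data.frame(x = X, y = Y)  # the same values held as a data frame

# gcFirst = TRUE is passed explicitly, as in the timings quoted above.
t_vec <- system.time(u_vec <- unique(X),   gcFirst = TRUE)
t_mat <- system.time(u_mat <- unique(m),   gcFirst = TRUE)
t_df  <- system.time(u_df  <- unique(dat), gcFirst = TRUE)

# User, system and elapsed times side by side, one row per input type.
print(rbind(vector = t_vec, matrix = t_mat, data.frame = t_df)[, 1:3])

# Sanity check: both forms should find the same number of distinct (x, y) pairs.
stopifnot(nrow(u_mat) == nrow(u_df))
```

Increasing n in steps (1e4, 1e5, 1e6) should show how each
of the three timings grows with the number of rows.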

Meanwhile, I'm not sure what you mean by "which method?",
and I'm also wondering what "gcFirst" is about.

Thanks,
Ted.


--------------------------------------------------------------------
E-Mail: (Ted Harding) <Ted.Harding at nessie.mcc.ac.uk>
Fax-to-email: +44 (0)870 094 0861  [NB: New number!]
Date: 25-Nov-04                                       Time: 14:30:39
------------------------------ XFMail ------------------------------



