[R] scatterplot of 100000 points and pdf file format

(Ted Harding) Ted.Harding at nessie.mcc.ac.uk
Thu Nov 25 02:45:48 CET 2004


On 25-Nov-04 Ted Harding wrote:
> 'unique' will eat x for breakfast, indeed, but will have some
> trouble chewing (x,y).
> 
> I still can't think of a neat way of doing that.
> 
> Best wishes,
> Ted.

Sorry, I don't want to be misunderstood.
I didn't mean that 'unique' won't work for arrays.
What I meant was:

> X<-round(rnorm(1e6),3);Y<-round(rnorm(1e6),3)
> system.time(unique(X))
[1] 0.74 0.07 0.81 0.00 0.00
> system.time(unique(cbind(X,Y)))
[1] 350.81   4.56 356.54   0.00   0.00

However, still rounding to 3 d.p., we can try packing each (X,Y) pair
into a single number:

> Z<-100000000*X + 1000*Y
> system.time(W<-unique(Z))
[1] 0.83 0.05 0.88 0.00 0.00
> length(W)
[1] 961523

Though the runtime is small, we don't get much reduction (961523
distinct points out of 1e6, i.e. about 4% fewer), and W still has
to be unpacked.
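
The unpacking step might look like the following. This is only a
sketch: it assumes abs(Y) < 50, so that the Y part never spills into
the X part, and the names w, xi, yi and XY are made up for the
illustration. A smaller n is used so that it runs quickly.

```r
# Sketch: pack (X, Y) rounded to 3 d.p. into one number, then recover the pairs.
set.seed(1)                      # arbitrary seed, only for reproducibility
n <- 1e4                         # smaller than 1e6 so that it runs quickly
X <- round(rnorm(n), 3)
Y <- round(rnorm(n), 3)

Z <- 100000000 * X + 1000 * Y    # same packing constants as above
W <- unique(Z)

# Unpack, working in units of 0.001. Assumes abs(Y) < 50.
w  <- round(W)                   # strip floating-point fuzz; w is now integer-valued
xi <- round(w / 1e5)             # X in thousandths
yi <- w - 1e5 * xi               # Y in thousandths
XY <- cbind(X = xi / 1000, Y = yi / 1000)   # one row per distinct point
```

plot(XY) would then draw each distinct point once.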

With rounding to 2 d.p.

> X<-round(rnorm(1e6),2);Y<-round(rnorm(1e6),2)
> Z<-100000000*X + 1000*Y
> system.time(W<-unique(Z))
[1] 1.31 0.01 1.32 0.00 0.00
> length(W)
[1] 209882

so now we are down to about 1/5 of the points, but the
discretisation must be getting close to visible.

With 1 d.p.

> X<-round(rnorm(1e6),1);Y<-round(rnorm(1e6),1)
> Z<-100000000*X + 1000*Y
> system.time(W<-unique(Z))
[1] 0.92 0.01 0.93 0.00 0.00
> length(W)
[1] 4953

there's a good reduction (to about 1/200), but the discretisation
would now definitely be visible. However, as I suggested before,
there's an issue of choosing the constant, i.e. the resolution
of the discretisation: it has to be coarse enough to give a useful
reduction, yet fine enough that the plot is still acceptable.
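
One way to explore that trade-off is to count the distinct points
left at several resolutions before committing to one. The helper
below is hypothetical (n_distinct_pairs is not an existing R
function), and it rounds the raw values onto a grid rather than
reusing the packing constants above, so treat it as a sketch only.

```r
# Hypothetical helper: number of distinct (x, y) pairs after rounding
# both coordinates to `digits` decimal places.
n_distinct_pairs <- function(x, y, digits) {
  scale <- 10^digits
  xi <- round(x * scale)          # grid index in x
  yi <- round(y * scale)          # grid index in y
  off  <- min(yi)
  span <- max(yi) - off + 1       # number of possible y indices
  key  <- xi * span + (yi - off)  # distinct (xi, yi) give distinct keys
  length(unique(key))
}

set.seed(1)                       # arbitrary seed, only for reproducibility
x <- rnorm(1e5)
y <- rnorm(1e5)

# Fraction of points that would remain at 1, 2 and 3 d.p.
frac <- sapply(c(1, 2, 3), function(d) n_distinct_pairs(x, y, d)) / length(x)
print(frac)
```

The resolution to use is then the coarsest one whose plot still looks
acceptable, which has to be judged by eye.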

I'd still like to learn of a method which avoids the
packing used above, which strikes me as clumsy
(but maybe it's the best way after all).
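
One candidate that avoids packing, offered as an untimed sketch and
not as a known best answer: sort the pairs with order(X, Y), so that
identical pairs become adjacent, then keep each row that differs from
the one before it. The names xs, ys, keep and XY are made up for the
illustration, and at least one point is assumed.

```r
# Sketch: distinct (X, Y) pairs by sorting, with no packing constant.
set.seed(1)                      # arbitrary seed, only for reproducibility
n <- 1e4                         # smaller than 1e6 so that it runs quickly
X <- round(rnorm(n), 2)
Y <- round(rnorm(n), 2)

o  <- order(X, Y)                # sort by X, ties broken by Y
xs <- X[o]
ys <- Y[o]

# After sorting, equal pairs sit next to each other, so a row is new
# exactly when it differs from its predecessor in X or in Y.
keep <- c(TRUE, xs[-1] != xs[-n] | ys[-1] != ys[-n])
XY <- cbind(X = xs[keep], Y = ys[keep])
```

This needs no assumption about the range of Y, at the cost of a sort;
how it compares in speed with unique(Z) on 1e6 points would have to
be measured.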

Ted.


--------------------------------------------------------------------
E-Mail: (Ted Harding) <Ted.Harding at nessie.mcc.ac.uk>
Fax-to-email: +44 (0)870 094 0861  [NB: New number!]
Date: 25-Nov-04                                       Time: 01:45:48
------------------------------ XFMail ------------------------------



