[R] scatterplot of 100000 points and pdf file format

(Ted Harding) Ted.Harding at nessie.mcc.ac.uk
Wed Nov 24 17:16:28 CET 2004


On 24-Nov-04 Witold Eryk Wolski wrote:
> Hi,
> I want to draw a scatter plot with 1M  and more points
> and save it as pdf.
> This makes the pdf file large.
> So I tried to save the file first as png and then convert
> it to pdf. This looks OK if printed but if viewed e.g. with
> acrobat as document figure the quality is bad.
> 
> Does anyone know a way to reduce the size but keep the quality?

If you want the PDF file to preserve the info about all the
1M points then the problem has no solution. The png file
will already have suppressed most of this (which is one
reason for poor quality).

I think you should give thought to reducing what you need
to plot.

Think about it: suppose you plot with a resolution of
200 points per inch, i.e. a spacing of 1/200 inch (about
the limit at which the eye begins to see rough edges).
Then you have 40000 points per square inch. If your 1M
points are separate but as closely packed as possible,
this requires 25 square inches, or a 5x5 inch
(= 12.7x12.7 cm) square. And this would be solid black!
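
The arithmetic can be checked in a few lines of R. This is only
a sketch: the variable names are mine, and the figure of 200 per
inch is the rough assumption made above, not a measured value.

```r
# Rough check of the packing argument (all names are illustrative).
n_points      <- 1e6   # number of points to plot
dots_per_inch <- 200   # assumed limit of resolution, per linear inch

dots_per_sq_inch <- dots_per_inch^2              # distinct positions per square inch
area_sq_inch     <- n_points / dots_per_sq_inch  # area needed if no two points coincide
side_inch        <- sqrt(area_sq_inch)           # side of the equivalent square
side_cm          <- side_inch * 2.54             # same side in centimetres

c(dots_per_sq_inch = dots_per_sq_inch,
  area_sq_inch     = area_sq_inch,
  side_inch        = side_inch,
  side_cm          = side_cm)
```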

Presumably in your plot there is a very large number of
points which are effectively indistinguishable from other
points, so these could be eliminated without spoiling
the plot.

I don't have an obviously best strategy for reducing what
you actually plot, but perhaps one line to think along
might be the following:

1. Multiply the data by some factor and then round the
   results to an integer (to avoid problems in step 2).
   Choose the factor so that the result of (4) below is
   satisfactory.

2. Eliminate duplicates in the result of (1).

3. Divide by the factor you used in (1).

4. Plot the result; save plot to PDF.

As to how to do it in R: the critical step is (2),
which with so many points could be very heavy unless
done by a well-chosen procedure. I'm not expert enough
to advise about that, but no doubt others are.
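
For what it is worth, here is a minimal sketch of steps 1 to 4
in base R. It is one possible approach, not a tested recipe:
thin_points() is a hypothetical helper (not from any package),
the data are simulated, the factor of 100 and the output file
name are arbitrary choices, and it assumes there are no missing
values. To keep step (2) cheap it folds the two rounded
coordinates into a single numeric key, so that duplicated()
works on one vector rather than on rows.

```r
# thin_points() is a hypothetical helper, not part of any package.
# Assumes x and y are numeric vectors of equal length without NAs.
thin_points <- function(x, y, factor) {
  # Step 1: scale and round onto an integer grid.
  ix <- round(x * factor)
  iy <- round(y * factor)

  # Step 2: drop duplicates. Shift iy to start at 0 and combine it
  # with ix into one key that is unique for each grid cell.
  iy0  <- iy - min(iy)
  key  <- ix * (max(iy0) + 1) + iy0
  keep <- !duplicated(key)

  # Step 3: divide by the factor to get back to the original scale.
  data.frame(x = ix[keep] / factor, y = iy[keep] / factor)
}

# Simulated data standing in for the real 1M points.
set.seed(1)
x <- rnorm(1e6)
y <- x + rnorm(1e6)

thinned <- thin_points(x, y, factor = 100)
nrow(thinned)   # number of points that will actually be drawn

# Step 4: plot the thinned data and save it as PDF.
out_file <- file.path(tempdir(), "scatter_thinned.pdf")
pdf(out_file, width = 5, height = 5)
plot(thinned$x, thinned$y, pch = ".", xlab = "x", ylab = "y")
dev.off()
```

A larger factor keeps more points (finer grid, bigger file); a
smaller one keeps fewer. Some trial and error on the factor is
probably needed until the plot looks right at the intended size.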

Good luck!
Ted.


--------------------------------------------------------------------
E-Mail: (Ted Harding) <Ted.Harding at nessie.mcc.ac.uk>
Fax-to-email: +44 (0)870 094 0861  [NB: New number!]
Date: 24-Nov-04                                       Time: 16:16:28
------------------------------ XFMail ------------------------------



