[R] scatterplot of 100000 points and pdf file format

Marc Schwartz MSchwartz at MedAnalytics.com
Wed Nov 24 17:22:56 CET 2004


On Wed, 2004-11-24 at 16:34 +0100, Witold Eryk Wolski wrote:
> Hi,
> 
> I want to draw a scatter plot with 1M or more points and save it as pdf.
> This makes the pdf file large.
> So I tried to save the file first as png and then convert it to pdf.
> This looks OK if printed, but if viewed e.g. with acrobat as a document
> figure the quality is bad.
> 
> Does anyone know a way to reduce the size but keep the quality?

Hi Eryk!

Part of the problem is that a pdf file has to contain vector based
drawing instructions for each of your 10^6 points, so the file size
grows with the number of points.

I tried a simple example:

pdf()
plot(rnorm(1000000), rnorm(1000000))
dev.off()

The resulting pdf file is 55 MB in size.

One immediate thought was to try a ps file instead. For the same plot,
the ps file was "only" 23 MB in size, so ps can be more efficient here.
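A rough way to check this on your own system is sketched below. It is
only an illustration: plot_size() is a hypothetical helper, not part of
R, and the point counts are kept small so that it runs quickly. The
sizes will differ from the figures quoted above, since they depend on
the R version and device settings (for instance, pdf() in current R
compresses its output by default, which is switched off here).

```r
# Hypothetical helper: draw n random points on the given file-based
# device and return the size of the resulting file in bytes.
plot_size <- function(device, n, ...) {
  f <- tempfile()
  on.exit(unlink(f))        # remove the temporary file afterwards
  device(file = f, ...)
  plot(rnorm(n), rnorm(n))
  dev.off()
  file.size(f)
}

set.seed(1)
sizes <- c(
  pdf_1e3 = plot_size(pdf, 1e3, compress = FALSE),
  pdf_1e4 = plot_size(pdf, 1e4, compress = FALSE),
  ps_1e3  = plot_size(postscript, 1e3),
  ps_1e4  = plot_size(postscript, 1e4)
)
print(sizes)
```

Comparing the 1e3 and 1e4 entries shows how the size of a vector file
scales with the number of points drawn.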

Going to a bitmap might result in a much smaller file, but as you note,
the quality degrades compared to a vector based image.

I wrote the above plot to a png, then converted it to a pdf (using
'convert'). As expected, the image was "pixelated" both on screen and in
print, since the pdf instructions are presumably drawing pixels and not
vector based objects.

Depending upon what you plan to do with the image, you may have to
choose among several options, resulting in tradeoffs between image
quality and file size.

If you can create the bitmap file explicitly at the size that you
require for printing or for inclusion in a document, that is one way to
go. It will preserve image quality, to an extent, at that fixed size
while keeping the file small.
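As a sketch of the fixed-size bitmap approach, the following writes a
png at an explicit physical size and resolution. The 6 x 6 inch size,
the 300 dpi resolution and the pch = "." plotting symbol are
assumptions chosen for illustration, and the code assumes an R build
with png support and a png() that accepts the units and res arguments.

```r
set.seed(1)
n <- 1e5                       # fewer points than the 10^6 case, for speed

f <- tempfile(fileext = ".png")

# 6 x 6 inches at 300 dpi gives an 1800 x 1800 pixel image
png(f, width = 6, height = 6, units = "in", res = 300)
plot(rnorm(n), rnorm(n), pch = ".")
dev.off()

file.size(f)                   # size in bytes; independent of vector detail
```

The image should then be placed in the document at the same 6 x 6 inch
size, since scaling it up is what makes the pixels visible.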

Another option to consider for the pdf approach, if it does not
compromise the integrity of your plot, is to remove any duplicate data
points. That way the pdf file will not contain what are, in effect,
redundant drawing instructions. Depending upon the nature of your data
(i.e., doubles), this may not be possible without defining some
tolerance level for "equivalence".
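One possible way to apply such a tolerance, offered only as a sketch, is
to snap each point to a grid of the chosen width and keep one point per
occupied grid cell. The tolerance value tol = 0.01 is an arbitrary
assumption; a sensible value depends on the plot size and on the
resolution at which it will be viewed.

```r
set.seed(1)
x <- rnorm(1e5)
y <- rnorm(1e5)

tol <- 0.01                            # assumed tolerance, in data units

# Points falling in the same tol x tol grid cell are treated as equivalent
cell <- cbind(round(x / tol), round(y / tol))
keep <- !duplicated(cell)              # first point seen in each cell

sum(keep)                              # number of points actually drawn

f <- tempfile(fileext = ".pdf")
pdf(f)
plot(x[keep], y[keep])
dev.off()
```

Note that this discards information about point density, so it is only
appropriate when overplotted points would be indistinguishable anyway.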

Perhaps others will have additional ideas.

HTH,

Marc Schwartz



