[R] scatterplot of 100000 points and pdf file format

Witold Eryk Wolski wolski at molgen.mpg.de
Thu Nov 25 09:33:46 CET 2004


Prof Brian Ripley wrote:

> On Wed, 24 Nov 2004 Ted.Harding at nessie.mcc.ac.uk wrote:
>
>> On 24-Nov-04 Witold Eryk Wolski wrote:
>>
>>> Hi,
>>> I want to draw a scatter plot with 1M or more points
>>> and save it as PDF.
>>> This makes the PDF file large.
>>> So I tried to save the file first as PNG and then convert
>>> it to PDF. This looks OK when printed, but when viewed, e.g. with
>>> Acrobat, as a figure in a document, the quality is bad.
>>>
>>> Does anyone know a way to reduce the size but keep the quality?
>>
>>
>> If you want the PDF file to preserve the info about all the
>> 1M points then the problem has no solution. The png file
>> will already have suppressed most of this (which is one
>> reason for poor quality).
>>
>> I think you should give thought to reducing what you need
>> to plot.
>>
>> Think about it: suppose you plot with a resolution of
>> 200 points per inch (about the limit at which the eye
>> begins to see rough edges). Then you have 40000 points
>> per square inch. If your 1M points are separate but as
>> closely packed as possible, this requires 25 square inches,
>> or a 5x5 inch (= 12.7x12.7 cm) square. And this would be
>> solid black!
>>
>> Presumably in your plot there is a very large number of
>> points which are effectively indistinguishable from other
>> points, so these could be eliminated without spoiling
>> the plot.
>>
>> I don't have an obviously best strategy for reducing what
>> you actually plot, but perhaps one line to think along
>> might be the following:
>>
>> 1. Multiply the data by some factor and then round the
>>   results to an integer (to avoid problems in step 2).
>>   Factor chosen so that the result of (4) below is
>>   satisfactory.
>>
>> 2. Eliminate duplicates in the result of (1).
>>
>> 3. Divide by the factor you used in (1).
>>
>> 4. Plot the result; save plot to PDF.
>>
>> As to how to do it in R: the critical step is (2),
>> which with so many points could be very heavy unless
>> done by a well-chosen procedure. I'm not expert enough
>> to advise about that, but no doubt others are.
>
>
> unique will eat that for breakfast
>
>> x <- runif(1e6)
>> system.time(xx <- unique(round(x, 4)))
>
> [1] 0.55 0.09 0.64 0.00 0.00
>
>> length(xx)
>
> [1] 10001
>
>

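For two-dimensional data, steps 1 to 4 quoted above could be sketched
roughly as follows. This is only an illustration on simulated data: the
variables x and y, the factor f = 50 and the temporary output file are
my assumptions, not anything from the posts above, and the factor has to
be tuned to the data and the plot size.

```r
# Sketch of steps 1-4 for (x, y) pairs; simulated data, assumed factor.
set.seed(1)
n <- 1e5
x <- rnorm(n)
y <- x + rnorm(n)

f <- 50                                   # assumed factor: grid cells of 1/50 data units

# Steps 1 and 2: scale, round to integers, drop duplicated (x, y) rows.
grid <- unique(cbind(round(x * f), round(y * f)))

# Step 3: back to the original scale.
thinned <- grid / f

# Step 4: plot the thinned points and save the plot as PDF.
pdf_file <- tempfile(fileext = ".pdf")    # hypothetical output location
pdf(pdf_file)
plot(thinned, pch = ".", xlab = "x", ylab = "y")
dev.off()
```

unique() on a matrix removes duplicated rows, so each occupied grid cell
is drawn once, however many points fall into it.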

?table reduces the data
and
?image displays the result.
This does exactly what I need. (Not my idea, but that of Thomas
Unternäher.) Thanks, Thomas.
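
A minimal sketch of the table/image approach, for the archive. The
simulated data, the 200 x 200 grid (nbins), the log scaling of the
counts and the grey colour ramp are my own assumptions and should be
adapted to the data at hand:

```r
# Bin the points on a regular grid with cut() and table(), then draw the
# counts with image(). Data, grid size and colours are assumed here.
set.seed(1)
n <- 1e5
x <- rnorm(n)
y <- x + rnorm(n)

nbins <- 200                                            # assumed grid size
xbreaks <- seq(min(x), max(x), length.out = nbins + 1)  # cell boundaries
ybreaks <- seq(min(y), max(y), length.out = nbins + 1)

# Count the points falling into each grid cell (empty cells are kept as 0).
counts <- table(cut(x, xbreaks, include.lowest = TRUE),
                cut(y, ybreaks, include.lowest = TRUE))

# log1p() keeps sparsely filled cells visible next to dense ones.
z <- log1p(unclass(counts))

pdf_file <- tempfile(fileext = ".pdf")                  # hypothetical output location
pdf(pdf_file)
image(xbreaks, ybreaks, z, col = gray(seq(1, 0, length.out = 64)),
      xlab = "x", ylab = "y")
dev.off()
```

The size of the resulting file depends on the number of grid cells
rather than on the number of points.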


/E

-- 
Dipl. bio-chem. Witold Eryk Wolski
MPI-Moleculare Genetic
Ihnestrasse 63-73 14195 Berlin
tel: 0049-30-83875219                 __("<    _
http://www.molgen.mpg.de/~wolski      \__/    'v'
http://r4proteomics.sourceforge.net    ||    /   \
mail: witek96 at users.sourceforge.net    ^^     m m
      wolski at molgen.mpg.de



