[R] ridiculous behaviour printing to eps: labels all messed

Mon Jun 8 15:38:09 CEST 2009

On 08-Jun-09 06:23:04, Peter Dalgaard wrote:
> maiya wrote:
>> OK, this is really weird!
>> 
>> here's an example code:
>> 
>> t1<-c(1,2,3,4)
>> t2<-c(4,2,4,2)
>> plot(t1~t2, xlab="exp1", ylab="exp2")
>> dev.copy2eps(file="test.eps")
>> 
>> that all seems fine...
>> 
>> until you look at the eps file created, where for some weird reason,
>> if you scroll down to the end, the code reads:
>> 
>> /Font1 findfont 12 s
>> 0 setgray
>> 214.02 18.72 (e) 0 ta
>> -0.360 (xp1) tb gr
>> 12.96 206.44 (e) 90 ta
>> -0.360 (xp2) tb gr
>> 
>> Which means, that the labels "exp1" and "exp2" get split up!?!?
>> Now visually that doesn't matter, but I use the labels to refer
>> to them in LaTeX using psfrag, so I have to know exactly what they
>> are called in the .eps file in order to reference them correctly. 
>> 
>> I've tried other labels and the splitting up seems completely
>> random i.e. doesn't have anything to do with the length of the
>> label etc. 
>> 
>> I am completely lost here, can someone help me figure out what is
>> going on here?
>> Maja
> 
> Look at the useKerning argument to postscript().
> 
> (AFAIR the rationale for the behaviour is that you cannot rely on
> having the output device do kerning while getting reliable string
> width calculations. I can't offhand recall the examples where it
> mattered, though.)

1. Maja has asked for help to figure out what is going on. Peter has
indicated that it is to do with kerning (which is correct, of course,
but perhaps needs spelling out).

The basic PostScript operation to plant the word "Working" on the
page would be to move the current point to the desired initial
point of the word (where the bottom lefthand corner of the glyph
bounding-box of "W" should go), and then execute the PS command
"(Working) show". This plants "W" at that position, moves right
by the width of "W", plants "o", moves right by the width of "o",
and so on. The final current point is at the bottom right-hand
corner of the bounding-box of "g". All of this using the font which
has been selected and sized prior to "(Working) show". The widths
are obtained by looking up a stored table of widths of glyphs in
the current font.

However, if you were to look at the printed/displayed output of
this with any of the standard variable-width fonts (Times-Roman,
Helvetica) as produced by practically any PS-capable text
formatter, you would see that the "o" in "Working" would be tucked
under the right-hand arm of the "W", so that the "W" in fact overhangs
the "o" somewhat. This is done to improve the visual appearance of
the text (and indeed to make it slightly easier to read), and it is
an example of the process called "kerning", in which a character is
displaced slightly from the position it would occupy based purely
on widths. The amount of displacement is computed by the application
which generates the PostScript in the first place, and is often done
by looking up the displacement in a stored table of "kern-pairs"
(in this case the pair [W,o]).

The effect of this on the resulting PS output in the above example
would be (at least) that "Working" would be split into "W"+"orking",
the "W" would be planted first, then the kerning displacement (which
amounts to displacing the current point from where it was after
planting "W") is done, then the rest of "orking" (which might include
further kerning splits).

This would appear in the PS output as something like

  (W) show -1.234 0 rmoveto (orking) show

(though most applications will wrap the PS primitives "show", "rmoveto"
etc. inside private definitions, so what you see wouldn't be exactly
the above).

The effect of the above would be:
  Plant "W". Move right by the width of "W".
  Move right by (-1.234) points (i.e. left) and up by 0.
  Plant "orking".
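
The arithmetic behind those three steps can be sketched in R. The glyph
widths below are made-up numbers for illustration only (real ones come
from the font's width table), and ps_show() and ps_rmoveto() are
hypothetical helpers that merely track the x-coordinate of the current
point:

```r
# Hypothetical glyph widths in points (NOT taken from any real font).
widths <- c(W = 11.3, o = 6.7, r = 4.0, k = 6.0, i = 2.7, n = 6.7, g = 6.7)

# "(string) show": advance the current point by the summed glyph widths.
ps_show <- function(x, string) {
  glyphs <- strsplit(string, "")[[1]]
  x + sum(widths[glyphs])
}

# "dx 0 rmoveto": shift the current point horizontally by dx.
ps_rmoveto <- function(x, dx) x + dx

# (Working) show
x_plain <- ps_show(0, "Working")

# (W) show -1.234 0 rmoveto (orking) show
x_kerned <- ps_show(ps_rmoveto(ps_show(0, "W"), -1.234), "orking")

# The kerned word ends further left by exactly the kerning displacement.
shift <- x_plain - x_kerned
print(shift)
```

Whatever widths are used, the two end points differ only by the kerning
displacement, since the same glyphs are planted in both cases.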

All very straightforward, as far as rendering the printed/displayed
output is concerned. However, if some other application were to
search the PS file for "Working", it would not find it because the
word has been broken up by the kerning -- unless, of course, the
application were clever enough to recognise that

  (W) show -1.234 0 rmoveto (orking) show

amounts to a single word. To some extent this is possible, but it
would have to be very smart indeed to get it right reliably. 

In Maja's case, the result for "exp1" was:

  /Font1 findfont 12 s
  0 setgray
  214.02 18.72 (e) 0 ta
  -0.360 (xp1) tb gr

where "tb" has been defined within the file by

  /tb  { 2 -1 roll 0 rmoveto show } def

which converts "-0.360 (xp1) tb" into

  (xp1) -0.360 0 rmoveto show

which is equivalent to

  -0.360 0 rmoveto (xp1) show

So this explains (I hope clearly ... ) what Maja found puzzling.
The fact (which she observed) that the splitting seems to be
"completely random" is a result of whatever the table of kern-pairs
happens to hold for the sequence of characters in question. In this
case, the pair [e,x] will be in the table, and the kerning means that
"x" should be shifted 0.360 points to the left, to be closer to the "e".
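
To give an idea of what "clever enough" would involve, here is a naive
sketch in R that glues such fragments back together. It assumes exactly
the layout seen in Maja's file (a line ending in "ta" starts a string,
a following "tb" line continues it), it ignores escaped parentheses,
and reassemble_labels() is a hypothetical helper, not part of R or of
psfrag:

```r
# The lines quoted from the end of Maja's EPS file.
eps_lines <- c(
  "/Font1 findfont 12 s",
  "0 setgray",
  "214.02 18.72 (e) 0 ta",
  "-0.360 (xp1) tb gr",
  "12.96 206.44 (e) 90 ta",
  "-0.360 (xp2) tb gr"
)

reassemble_labels <- function(lines) {
  labels <- character(0)
  for (ln in lines) {
    # First parenthesised PostScript string on the line, if any.
    m <- regmatches(ln, regexpr("\\(([^()]*)\\)", ln))
    if (length(m) == 0) next
    frag <- substr(m, 2, nchar(m) - 1)
    if (grepl(" ta( |$)", ln)) {
      # "ta" starts a new piece of text.
      labels <- c(labels, frag)
    } else if (grepl(" tb( |$)", ln) && length(labels) > 0) {
      # "tb" continues the previous piece after a kerning shift.
      labels[length(labels)] <- paste0(labels[length(labels)], frag)
    }
  }
  labels
}

labels <- reassemble_labels(eps_lines)
print(labels)
```

This only recovers the strings for inspection. It does not make psfrag
find them, since psfrag still looks for the tag in the file itself.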

2. Question: Where does R's postscript() get its kern-pairs from?
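
A possible lead on this question, offered as an assumption rather than
a confirmed answer: the postscript() device takes its font metrics from
Adobe Font Metric (AFM) files, and such files can list kern-pairs on
"KPX" lines. The sketch below assumes that the Helvetica metrics sit in
the "afm" directory of the grDevices package (possibly gzip-compressed)
and simply looks for an [e,x] entry there:

```r
# Assumption: Helvetica metrics live in grDevices' "afm" directory.
afm_dir  <- system.file("afm", package = "grDevices")
afm_path <- list.files(afm_dir, pattern = "^Helvetica\\.afm",
                       full.names = TRUE)[1]

con <- gzfile(afm_path)  # gzfile() also reads uncompressed files
afm_lines <- readLines(con)
close(con)

# Kern-pair lines have the form "KPX <first> <second> <amount>",
# with the amount given in 1/1000 of the font size.
kpx <- grep("^KPX e x ", afm_lines, value = TRUE)
print(kpx)

# Scale to a 12-point font, for comparison with the shift in the EPS file.
amount <- as.numeric(sub("^KPX e x +", "", kpx))
print(amount / 1000 * 12)
```

If an [e,x] entry turns up, the scaled value can be compared with the
-0.360 seen in Maja's file.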

3. As Peter indicates, one solution is to set useKerning=FALSE
in the call to postscript() or to dev.copy2eps() (which accepts the
same arguments as postscript()).
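
A minimal sketch of how one might check this, using Maja's example. It
calls postscript() directly so that it also runs without a screen
device; the temporary file location and the 6 x 6 inch size are
arbitrary choices for illustration:

```r
t1 <- c(1, 2, 3, 4)
t2 <- c(4, 2, 4, 2)

eps_file <- file.path(tempdir(), "test.eps")

# EPS-style settings, with kerning switched off.
postscript(file = eps_file, onefile = FALSE, horizontal = FALSE,
           paper = "special", width = 6, height = 6,
           useKerning = FALSE)
plot(t1 ~ t2, xlab = "exp1", ylab = "exp2")
invisible(dev.off())

# Look for the unbroken labels as PostScript strings in the file.
eps_lines <- readLines(eps_file)
found <- c(exp1 = any(grepl("(exp1)", eps_lines, fixed = TRUE)),
           exp2 = any(grepl("(exp2)", eps_lines, fixed = TRUE)))
print(found)
```

After plotting on screen, the corresponding call would be
dev.copy2eps(file="test.eps", useKerning=FALSE).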

4. Zeljko's first suggestion (to use postscript() in the first place,
rather than dev.copy2eps() after plotting) is probably not going to
be useful, since you would have to set useKerning=FALSE in postscript()
anyway, and you can equally well set it in dev.copy2eps().

5. Zeljko's second suggestion (of using one-letter labels) is also
unlikely to be useful, and could be dangerous: if you used "a" as a
label, and the words "in a text string" also occurred in the diagram
and happened to be set as (in) show ... (a) show ... or the like, then
the stray (a) would match the tag as well.
Two-letter names might be safer, e.g. "AA", "BB", ... , since it
would be unusual for such pairs to be kerned (though not impossible,
since PostScript allows quite arbitrary things to be done with text,
if the user wants it to happen, or the application thinks it would
be a Good Thing). But, in any case, the desirable feedback from seeing
"exp1" and "exp2" (or whatever) on the plot would then be missing.

I would be strongly inclined to the "useKerning=FALSE" solution
in this case.

6. I'm not a LaTeX user, so can't comment much on psfrag. However,
I have just read its documentation "The PSFrag System" at
http://www.tug.org/teTeX/tetex-texmfdist/doc/latex/psfrag/pfgguide.ps

wherein Section 8 ("Common mistakes, known problems, and bugs")
discusses some of the issues that can arise from the fact that
psfrag looks in the EPS file for an exact match for the tags.
Executive Summary: Beware!

7. This is exercise for my hobby-horse. If the EPS graphic produced
by R is, as it stands, adequate for your purposes then leave it
alone and simply import it into your document.

If it is not, then my approach (almost invariably) is to extract
from R the numerical data which define the graphic, and use these
data in an independent document-creation package (which could be
LaTeX, perhaps, though not in my case) to re-create the graphic
from scratch, and then include any embellishments (such as
equations, etc.) which the independent software can do properly,
when R can not do it properly or at all.
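
As a rough illustration of the "extract the numerical data" step, using
Maja's toy data: the file name and the plain whitespace-separated
layout are arbitrary choices here, and whatever external package is
used will dictate the format it really wants.

```r
t1 <- c(1, 2, 3, 4)
t2 <- c(4, 2, 4, 2)

# The points as plotted by plot(t1 ~ t2): t2 on the x-axis, t1 on the y-axis.
plot_data <- data.frame(x = t2, y = t1)

# Write them as a plain text table that other software can read.
data_file <- file.path(tempdir(), "plot_data.txt")
write.table(plot_data, file = data_file, row.names = FALSE, quote = FALSE)

# Read the file back to confirm that nothing was lost on the way.
read_back <- read.table(data_file, header = TRUE)
print(read_back)
```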

An example of this can be seen at

  http://www.zen89632.zen.co.uk/R/EM_Aitken/em_aitken.pdf

which is a little expository document about Aitken Acceleration
of iterative processes, as applied to the EM algorithm.

The computations were done in R, and the resulting data were used
to draw the figures externally (in fact within the document itself).
All three figures therein have some mathematical symbolism, the
second one on Page 2 particularly. The full R code is also given.

LaTeX-ers: Can you make such figures with the same details in them,
without fiddling with an R-generated EPS (i.e. from scratch, using
data from R)? I hope so!

For what it's worth -- the software used was GNU groff (with the
'pic' preprocessor to create the figures). But that's just me.
Dinosaurs do not easily digest organisms more recently evolved ...

Ted.

--------------------------------------------------------------------
E-Mail: (Ted Harding) <Ted.Harding at manchester.ac.uk>
Fax-to-email: +44 (0)870 094 0861
Date: 08-Jun-09                                       Time: 14:38:03
------------------------------ XFMail ------------------------------