[Rd] Implicit vs explicit printing and the call stack

Sun May 13 14:21:26 CEST 2007

On 5/13/07, Prof Brian Ripley <ripley at stats.ox.ac.uk> wrote:
> On Sun, 13 May 2007, hadley wickham wrote:
>
> >> First, it was not clear that you are talking about the output of
> >> traceback(), which is _a representation of_ the call stack and depends on
> >> the details of deparsing.
> >
> > Given that there is a substantial delay in one case, and not in the
> > other, I had assumed (perhaps falsely) that there was something more
> > fundamental going on that simply the representation of the call stack
> > by traceback. Perhaps I should have been more clear that the output of
> > traceback was a symptom of the problem, not what I was really
> > interested in.
>
> traceback() does very little apart from show you .Traceback, which is
> constructed (as a representation) at the time of the error.  My guess is
> that the concern is the time it takes to create .Traceback, but see
> below.
>
> >> Second, the difference is I believe not implicit vs explicit printing, but
> >> of printing an evaluated object vs printing a call.  What your subject
> >> line would correspond to is
> >>
> >> x <- ggplot(mtcars, aes(x = cyl, y = -mpg)) + scale_y_log10() +
> >> geom_point()
> >> x vs print(x)
> >>
> >> I can't demonstrate (what is aes?), but I expect that would show minimal
> >> differences.
> >
> > I don't think that is the case.  Using Gabor's reproducible example:
> >
> > library(lattice)
> >
> > x <- xyplot(conc ~ uptake, CO2, xlim = Inf)
> > x
> > traceback()
> > print(x)
> > traceback()
> >
> > shows the same differences that I reported.
>
> It is not the same as regards my comments: your example showed the details
> of mtcars, and that appeared to be your concern.  That example does not
> show the details of CO2.

Oh, I see - it is much more verbose, but CO2 has not been exploded out.

> Part of my point is that if your returned object does not contain the
> evaluated dataset (but, say, a promise to it), this will not happen.

What is the suggested way to create such a promise? A closure? eg.

ggplot <- function(data) {
list(data = function() data)
}

A promise would have the desirable property of not storing a copy of
the data in the object, and you wouldn't need to update the plot
object if you changed the data in between creation and first plotting
(but not between first and second plotting).

Or delayed assign?

ggplot <- function(data) {
out <- new.env()
delayedAssign("data", data, assign.env=out)
out
}

Or something else?

> >> The issue seems to be whether arguments have been evaluated, and if so
> >> whether they were promises.  Promises can be deparsed back to the symbolic
> >> representation rather than to their value.  When you call print explicitly
> >> _on the ggplot call_, you have promises to arguments (to support lazy
> >> evaluation).  As far as I know (this is a detail of your code), the object
> >> ggplot returns has evaluated 'mtcars' and contains the value and not the
> >> promise, hence the difference in the deparsing of the call stack.
> >
> > I'm not sure I completely follow, and I don't think your reasoning
> > holds given the example above. Please correct me if I am wrong.
>
> It does.  In one traceback() you have print.trellis(x), in the other
> print.trellis(<the result of get("x")>).  This is because in print(x),
> print.trellis is passed a promise, and in 'x', 'x' has been evaluated and
> print and print.trellis are passed the value (a list).

Ok, I think I get it now.  Thanks for the detailed explanation.

> [ ... ]
>
> Let me try to spell out the mechanisms in so far as I understand them (R
> lacks technical documentation):
>
> A) print(x).
>
> The R evaluator sets up an argument list containing a promise to evaluate
> 'x' in the calling environemnt. It then calls UseMethod("print"), which
> needs to evaluate the first argument to find the class, finds the method,
> and manipulates the call to be to print.ggplot.  At that point the details
> get complicated, but when evaluation gets to the body of print.ggplot, the
> intention is that it is just as if print.ggplot(x) had been called and so
> the first argument is a promise to evaluate 'x' in the calling environment
> of print().  (Because the details vary across the internals, I am not sure
> without tracing through the code if the promise is a new one or the
> evaluated one: in either case it will deparse to 'x'.)
>
> B) 'x' at top-level.
>
> This creates an anonymous object whose value is the result of eval(x) (at
> C level).  The R evaluator then notices that the R_Visible flag is set and
> calls PrintValueEnv on the anonymous object.  The latter notices that the
> object has the OBJECT bit set and so calls print() on the anonymous
> object. At that point the name 'x' is not available and the anonymous
> object deparses to its value.
>
>
> Suppose that printing integers had a bug.  Then my understanding is that
> print(2L+3L) would have '2L+3L' in the deparsed call, and '2L+3L' would
> have '5' (or '5L').  That is intended to illustrate that deparsing to the
> call may or may not be more compact than to the value.

Ok, that makes sense to me now.

> >> You chose to call traceback() without 'max.lines', and the answer would
> >> seem to be
> >>
> >> 1) don't use a buggy print method,
> >> 2) if you have to (to fix it), use 'max.lines' or do this in two steps.
> >>
> >> If you want your end users to see a nicer representation, reconsider what
> >> you actually return in your object.
> >
> > I don't really care about the representation in the call stack
> > (although it is a pain when debugging, even though max.lines helps),
> > the main problem is the 4-5 second delay after an error before control
> > returns to the user.  This does not occur with an explicit print.
>
> I can't reproduce that of course.  It might well be that deparsing the
> calls to dump the call stack to .Traceback is the problem. in which case
> my '1)' applies. (See the comments about try in ?traceback that will help
> you distinguish.)

Deparsing is probably the problem, as it needs to deparse a 55000 x 12
data.frame:

> system.time(deparse(diamonds))
   user  system elapsed
  2.912   0.015   2.961

which matches closely the delay I experience. In which case, I need to
fix my buggy print function, but what exactly is buggy about it?

Thanks again for your help,

Hadley