[Rd] summary.default rounding on numeric seems inconsistent with other R behaviors

Martin Maechler maechler at stat.math.ethz.ch
Wed Aug 24 11:36:38 CEST 2016


>>>>> Martin Maechler <maechler at stat.math.ethz.ch>
>>>>>     on Tue, 23 Aug 2016 14:33:58 +0200 writes:

>>>>> Dirk Eddelbuettel <edd at debian.org>
>>>>>     on Fri, 19 Aug 2016 11:40:05 -0500 writes:

    >> It is the old story of defined behaviour and expected outcomes. Hard to
    >> change now.

    > yes...  not impossible though... see below

    >> So I would suggest you do something like this in your ~/.Rprofile:

    R> smry <- function(...) summary(..., digits=6)
    R> smry(155555L)
    >> Min. 1st Qu.  Median    Mean 3rd Qu.    Max.
    >> 155555  155555  155555  155555  155555  155555
    R> 

    >> Maybe call it Summary() instead.

    > yes, do use a different name.   There are other such functions, e.g. 'summarize()'.

    > Simone wrote

    >> I had raised the matter ten years ago, and I was told that the topic was
    >> already very^3 old
    >> 
    >> https://stat.ethz.ch/pipermail/r-devel/2006-September/042684.html
    >> 
    >> there is some discussion on its origin and also a declaration of intents to
    >> change the default behaviour, which, unfortunately, remained a declaration.
    >> I agree that R could do better here, let's hope in less than ten years
    >> though. ;-)

    > and the 2006 thread he mentions is basically a similar question
    > and a reply by me that I agreed to some extent that a change was
    > desirable ... originally we had adhered to the S "standard"
    > which became the S+ one and at that time I did still have access
    > to a running instance of S-PLUS 6.2 where I had seen that
    > Insightful (the company curating and selling S-PLUS)
    > also had decided to change the ~15 year old S "standard"... and
    > indeed I was implicitly *asking* for proposals of such a change,
    > but I think I never saw a (careful) proposal.

    > In the spirit of probably 99% of other "base R" code, a change
    > should really *not* round __at all__ in the summary() methods,
    > but *only* in the print() methods of such summary() results.

    > OTOH, for back compatibility, if a user does use  summary(.., digits=.)
    > explicitly, these digits should be 'obeyed' of course.

    > I think summary(<1-variable>)  could easily, and relatively "back-compatibly"
    > be changed in the above vein.

    > One "real problem" is the wrong decision (also from S and S-PLUS
    > times IIRC) to return a "character" matrix for
    > summary(<data.frame>, ..)
    > or summary(<matrix>, ..)
    > (For a data frame, I think it should return a list() of
    > single-variable summary()es, or else a numeric matrix .. in
    > both cases with a good print() method)

    > because when you return a character matrix, all the numbers are
    > already rounded, ... and if we follow the above approach they 
    > would have to be rounded further... ``the horror''

    > I wonder how much code out there is relying on the internal
    > structure of  summary(<data.frame>).. because that is the one
    > part I'd definitely want to change, too.
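
To make the character-matrix point concrete, here is a small sketch. The data frame 'df' and the variable names are made up for illustration; the lapply() form is only one possible reading of the "list of single-variable summaries" idea, not a proposed implementation.

```r
# Made-up example data, for illustration only
df <- data.frame(x = c(1.23456789, 2.3456789, 3.456789),
                 y = c(10, 20, 30))

# summary(<data.frame>) returns a character table: every cell is an
# already formatted (and hence already rounded) string
s_df <- summary(df)
is.character(s_df)
class(s_df)

# One possible alternative: a list of single-variable summaries,
# each of which still holds numbers rather than strings
s_list <- lapply(df, summary)
is.numeric(unclass(s_list$x))
```

Code that indexes into the cells of summary(<data.frame>) as strings is the kind of code that would be affected by changing that structure.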

[Talking to myself .. ;-)]
Yes, but that's the tough part to change.

This thread's topic is really only about changing summary.default(),
and I have started testing such a change now, and that does seem
very sensible:

- No rounding in summary.default(),  but
- (almost) back-compatible rounding in its print() method.
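
The two points above can be sketched roughly as follows. This is a minimal illustration of the idea, not the code going into R-devel: the names summary_noround() and the class "summaryNoRound" are hypothetical, and the default 'digits' in the print method is an assumption chosen to mimic the traditional display.

```r
# Hypothetical names, for illustration only
summary_noround <- function(x, digits = NULL) {
  qq <- stats::quantile(x, names = FALSE)
  val <- c(qq[1:3], mean(x), qq[4:5])
  names(val) <- c("Min.", "1st Qu.", "Median", "Mean", "3rd Qu.", "Max.")
  # Round the stored values only if the caller asked for it explicitly
  # (back compatibility with summary(.., digits = .))
  if (!is.null(digits)) val <- signif(val, digits)
  structure(val, class = "summaryNoRound")
}

print.summaryNoRound <- function(x,
                                 digits = max(3L, getOption("digits") - 3L),
                                 ...) {
  # Rounding happens here, for display only; 'x' itself is not modified
  print(format(unclass(x), digits = digits), quote = FALSE, ...)
  invisible(x)
}

s <- summary_noround(c(1.23456789, 2.3456789, 3.456789))
unclass(s)[["Mean"]]   # stored value keeps full precision
print(s)               # displayed values are rounded
```

With this split, anything computed from the summary object uses unrounded numbers, while an explicit digits= argument still rounds the stored values as before.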

My current plan is to commit this to R-devel in a day or so,
unless unforeseen issues emerge.

Martin


