[R] Summary shows wrong maximum

Mike Prager mike.prager at noaa.gov
Thu Dec 7 15:39:19 CET 2006


Bert--

Well, in an attempt to be pithy, I think I lost my message.

The comment was directed not at you specifically, but at the
idea that, given four print positions, one would ever want to
print zeroes instead of data without an explicit warning.

I quite agree with your comments on precision.  However, if more
than those two or three digits are *printed*, I think they
should be as accurate as possible, or accompanied in each place
by a written disclaimer.

Let's say that the mean of the data is not zero, but that the
precision is well within the range of floating point.  Then,
information is being thrown away for no clear reason.  What
makes it "nasty" in my opinion is that the information *appears*
to be there.  (Maybe this is a problem in semiotics.)  So while
I don't think "1.01e3" is more correct than "1010", it does not
appear to be conveying information that has been stripped from
the result.

Is the following really how we want R to work?

> a <- c(19001., 19002., 19003., 19006.)
> summary(a)
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
  19000   19000   19000   19000   19000   19010 
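
The row above can be reproduced by hand. What follows is only a sketch: it assumes that summary() rounds each statistic to four significant digits with signif(), which is what the printed output suggests. The names q, full and rounded are mine, chosen for illustration.

```r
# Data from the example above
a <- c(19001, 19002, 19003, 19006)

# The six statistics at full precision, in the order summary() prints them
# (quantile() with its default settings gives min, quartiles, and max)
q <- quantile(a)
full <- c(q[1:3], Mean = mean(a), q[4:5])
print(full)

# Assumption: summary() applies signif(., 4) before printing.
# Rounding this way turns every value except the maximum into 19000.
rounded <- signif(full, 4)
print(rounded)

# Requesting more digits explicitly keeps the information;
# the exact layout of this output depends on the R version in use.
print(summary(a, digits = 7))
```

The point is that 19001.75, 19002.5, 19003 and 19003.75 are all available at full precision, and the trailing zeroes in the printed row are produced by rounding, not by the data.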

Respectfully,
--Mike



Bert Gunter <gunter.berton at gene.com> wrote:

> Mike:
> 
> I offered no opinion -- and really didn't have any -- about the worthiness
> of any of the comments that were made. I just liked Brian's little quotable
> aside.
> 
> But since you bait me a bit ...
> 
> In general, I believe that showing the 2-3 most "important" -- **not
> significant** -- digits **and no more** is desirable. By "most important" I
> mean the leftmost digits which are changing in the data (there are some
> caveats in the presence of extreme outliers). Printing more digits merely
> obfuscates the ability of the eye/brain to perceive the patterns of change
> in the data, the presumed intent of displaying it (not of storing it, of
> course). Displaying excessive digits to demonstrate (usually falsely) one's
> precision is evil. Clarity of communications is the standard we should
> aspire to.
> 
> These views have been more eloquently expressed by A.S.C. Ehrenberg and
> Howard Wainer among others...
> 
> -- Bert
> 
> 
> Bert Gunter
> Nonclinical Statistics
> 7-7374
> 
> -----Original Message-----
> From: r-help-bounces at stat.math.ethz.ch
> [mailto:r-help-bounces at stat.math.ethz.ch] On Behalf Of Mike Prager
> Sent: Wednesday, December 06, 2006 11:46 AM
> To: r-help at stat.math.ethz.ch
> Subject: Re: [R] Summary shows wrong maximum
> 
> I don't know about candidacy, and I'm not going to argue about
> "correctness," but it seems to me that the only valid reasons to
> limit precision of printing in a statistics program are (1) to
> save space and (2) to allow for machine limitations. This is
> neither. To chop off information and replace it with zeroes is
> just plain nasty.
> 
> 
> Bert Gunter <gunter.berton at gene.com> wrote:
> 
> >  
> > Folks:
> > 
> > Is 
> > 
> > "So this is at best a matter of opinion, 
> > and credentials do matter for opinions."
> > 
> > -- Brian Ripley
> > 
> > an R fortunes candidate?
> > 
> > -- Bert Gunter
> > 
> > 
> > On Tue, 5 Dec 2006, Oliver Czoske wrote:
> > 
> > > On Mon, 4 Dec 2006, Uwe Ligges wrote:
> > >> Sebastian Spaeth wrote:
> > >>> Hi all,
> > > >>> I have a list with a numerical column "cum_hardreuses". By coincidence I
> > > >>> discovered this:
> > >>>
> > >>>> max(libs[,"cum_hardreuses"])
> > >>> [1] 1793
> > >>>
> > >>>> summary(libs[,"cum_hardreuses"])
> > >>>     Min. 1st Qu.  Median    Mean 3rd Qu.    Max.
> > >>>        1       2       4      36      14    1790
> > >>>
> > > >>> (note the max value of 1790) Ouch this is bad! Anything I can do to remedy
> > > >>> this? Known bug?
> > >>
> > >> No, it's a feature! See ?summary: printing is done up to 3 significant
> > >> digits by default.
> > >
> > > > Unfortunately, '1790' is printed with *four* significant digits, not
> > > > three. The correct representation with three significant digits would have
> > > > to employ scientific notation, 1.79e3.
> > >
> > >

-- 
Mike Prager, NOAA, Beaufort, NC
* Opinions expressed are personal and not represented otherwise.
* Any use of tradenames does not constitute a NOAA endorsement.



