[Rd] boxplot, notches, etc.

Ben Bolker bolker at zoo.ufl.edu
Tue Oct 10 00:17:57 CEST 2006


  Sorry to repost this, but it looks like it's getting
buried in r-help (originally posted October 5; in my experience,
a post that hasn't been answered by now won't be).
I wouldn't bother, but I suspect r-devel is the
better venue, *and* a previous e-mail of mine on the subject in
January also seemed to get buried.

  Synopsis: boxplot notches look weird when they extend
beyond the hinges (i.e. when 1.58*IQR/sqrt(n) exceeds the
median-to-hinge distance, approx IQR/2).
When log="y" this causes an error.  Below are several
reproducible examples, some discussion, and a patch against
calc.R.

  Please feel free to say "this is just cosmetic/isn't an issue, go
away" ...

  cheers
    Ben Bolker

bogdan romocea <br44114 <at> gmail.com> writes:

>
> A function I've been using for a while returned a surprising [to me,
> given the data] error recently:
>    Error in plot.window(xlim, ylim, log, asp, ...) :
>        Logarithmic axis must have positive limits
>
> After some digging I realized what was going on:
> x <- c(10460.97, 10808.67, 29499.98, 1, 35818.62, 48535.59, 1, 1,
>    42512.1, 1627.39, 1, 7571.06, 21479.69, 25, 1, 16143.85, 12736.96,
>    1, 7603.63, 1, 33155.24, 1, 1, 50, 3361.78, 1, 37781.84, 1, 1,
>    1, 46492.05, 22334.88, 1, 1)
> summary(x)
> boxplot(x,notch=TRUE,log="y")  #unexpected
> boxplot(x)  #ok
> boxplot(x,log="y")  #ok
> boxplot(x,notch=TRUE)  #aha
>
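
  One way to see why log="y" fails for this data is to look at the
notch limits that boxplot.stats() returns for x. The following is only
a diagnostic sketch (s is an illustrative name; x is the data quoted
above): the lower notch limit comes out negative, which a logarithmic
axis cannot represent.

```r
x <- c(10460.97, 10808.67, 29499.98, 1, 35818.62, 48535.59, 1, 1,
   42512.1, 1627.39, 1, 7571.06, 21479.69, 25, 1, 16143.85, 12736.96,
   1, 7603.63, 1, 33155.24, 1, 1, 50, 3361.78, 1, 37781.84, 1, 1,
   1, 46492.05, 22334.88, 1, 1)

s <- boxplot.stats(x)
s$stats   # lower whisker, lower hinge, median, upper hinge, upper whisker
s$conf    # notch limits: median -/+ 1.58 * IQR / sqrt(n)

# The lower notch limit falls below the lower hinge, and below zero
s$conf[1] < s$stats[2]
s$conf[1] < 0
```

Note that n is 34 here, so this is not a tiny sample; the lower hinge
simply sits very close to the median because of the many 1s.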

  Mick Crawley (author of several books on ecological
data analysis in R) submitted a related issue as
bug #7690, which I was mildly surprised to see
filed as "not reproducible". I had no trouble reproducing
it at the time, and I posted my then-patch to R-devel:
https://stat.ethz.ch/pipermail/r-devel/2006-January/036257.html
The problem typically occurs
for very small data sets, when the notches can
be bigger than the hinges.
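
  As a rough back-of-the-envelope check (assuming the median sits about
halfway between the hinges, which real data need not satisfy), each
hinge is about IQR/2 from the median, while the notch half-width is
1.58*IQR/sqrt(n). Under that assumption the notch passes the hinge
whenever 1.58/sqrt(n) > 0.5. A small sketch of the arithmetic, with
illustrative variable names:

```r
n <- 2:20
half_width <- 1.58 / sqrt(n)   # notch half-width in units of IQR
exceeds <- half_width > 0.5    # hinge assumed ~0.5 IQR from the median
data.frame(n, half_width = round(half_width, 3), exceeds)
max(n[exceeds])                # largest n for which the notch overshoots
```

Skewed samples can overshoot at larger n as well, since one hinge may
lie much closer to the median than IQR/2.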

  As I said then,

>  I can imagine debate about what should be done in this case --
> you could just say "don't do that", since the notches are based
> on an asymptotic argument ... the diff below just truncates
> the notches to the hinges, but produces a warning saying that the
> notches have been truncated.

The interaction with
log="y" is new to me, though, and my old patch
didn't catch it.

   Here's my reproducible version:

set.seed(1001)
npts <- 7
X <- rnorm(2*npts,rep(c(3,4.5),each=npts),sd=1)
f <- factor(rep(1:2,each=npts))
par(mfrow=c(1,2))
boxplot(X~f,notch=TRUE)
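
  To confirm numerically that the notches overshoot in this example,
one can ask boxplot() for its statistics without drawing anything. This
sketch repeats the data setup from above so that it runs on its own;
b and overshoot are illustrative names.

```r
set.seed(1001)
npts <- 7
X <- rnorm(2*npts, rep(c(3, 4.5), each = npts), sd = 1)
f <- factor(rep(1:2, each = npts))

b <- boxplot(X ~ f, plot = FALSE)
# rows of b$stats: lower whisker, lower hinge, median, upper hinge, upper whisker
# rows of b$conf:  lower and upper notch limits
overshoot <- b$conf[1, ] < b$stats[2, ] | b$conf[2, ] > b$stats[4, ]
overshoot   # one logical value per group
```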

  A possible fix is to truncate the notches
(and issue a warning) when this happens,
in src/library/grDevices/R/calc.R:

[WATCH OUT FOR LINE WRAPPING BELOW!]

*** calc.R      2006-10-07 17:44:49.000000000 -0400
--- newcalc.R   2006-10-07 19:25:38.000000000 -0400
***************
*** 16,21 ****
--- 16,26 ----
        if(any(out[nna])) stats[c(1, 5)] <- range(x[!out], na.rm = TRUE)
      }
      conf <- if(do.conf) stats[3] + c(-1.58, 1.58) * iqr / sqrt(n)
+     if (do.conf) {
+       if (conf[1]<stats[2] || conf[2]>stats[4]) warning("confidence limits > hinges: notches truncated")
+       conf[1] <- max(conf[1],stats[2])
+       conf[2] <- min(conf[2],stats[4])
+     }
      list(stats = stats, n = n, conf = conf,
         out = if(do.out) x[out & nna] else numeric(0))
  }
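
  For anyone who wants to try the clamping logic without rebuilding R,
here is a minimal sketch that applies the same truncation to the result
of the unpatched boxplot.stats(). The wrapper name truncate_notches and
the test vector x_small are hypothetical, and the sketch assumes x
contains non-missing values.

```r
# Hypothetical helper: applies the clamping from the patch above to the
# result of the unpatched boxplot.stats(). Assumes x has non-missing values.
truncate_notches <- function(x, ...) {
  s <- boxplot.stats(x, ...)
  if (!is.null(s$conf)) {
    if (s$conf[1] < s$stats[2] || s$conf[2] > s$stats[4])
      warning("confidence limits > hinges: notches truncated")
    s$conf[1] <- max(s$conf[1], s$stats[2])
    s$conf[2] <- min(s$conf[2], s$stats[4])
  }
  s
}

x_small <- c(1, 2, 3, 4, 100)
boxplot.stats(x_small)$conf                       # raw limits, outside the hinges 2 and 4
suppressWarnings(truncate_notches(x_small))$conf  # clamped to the hinges
```

This does not change what boxplot() draws, since boxplot() calls
boxplot.stats() itself; it only illustrates what the patched function
would return.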
