[Rd] Small inconsistency with boxplot

Martin Maechler maechler at stat.math.ethz.ch
Fri Nov 18 09:44:36 CET 2011


> Dear R-core team,
> I think I found a small inconsistency in the boxplot function. I am not posting it as a bug because I'm not sure it qualifies as one according to the FAQ, and it is not a major problem. Don't hesitate to tell me if I'm wrong.

> If you call boxplot on a matrix and set the "at" argument to some vector different from 1:n, where n is the number of columns of your matrix, then some boxplots will be hidden, since the default "xlim" value is set to c(0.5, n + 0.5) during the call of the bxp function.

> Currently you can easily bypass this problem by setting "xlim" appropriately when calling the boxplot function.

Yes.  And the help page for  bxp  even has the following note:

 \note{
   if \code{add = FALSE}, the default is \code{xlim = c(0.5, n +0.5)}.
   It will usually be a good idea to specify the latter if the "x" axis
   has a log scale or \code{at} is specified or \code{width} is far from
   uniform.
 }

which clearly documents (and, one could say, also ``excuses'')
the current behavior.
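
As a minimal sketch of what following that note looks like when calling
bxp() directly: the data and the positions below are made up for
illustration (they mirror the example further down), and pdf(NULL) is
used only so that nothing is written to disk.

```r
# Illustrative data only: 10 columns of 50 standard normal draws.
set.seed(1)
data  <- matrix(rnorm(10 * 50), 50)
x.pos <- seq(-10, 10, length.out = 10)

pdf(NULL)                              # null device: no file is created
stats <- boxplot(data, plot = FALSE)   # compute the box statistics only
# Pass xlim explicitly because 'at' is specified, as the note recommends.
bxp(stats, at = x.pos, xlim = range(x.pos) + c(-0.5, 0.5))
usr <- par("usr")                      # user coordinates of the plot region
dev.off()

# All box centers lie inside the horizontal extent of the plot region.
all(x.pos >= usr[1] & x.pos <= usr[2])
```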

In this sense, there's really no bug ... ;-) and you were
very wise (or at least cautious :-) *not* to post it as a bug.

> I think it would be better if all boxplots were always shown unless the "xlim" argument is specified. (I noticed this behavior when I tried to do boxplots on conditional simulations of a stochastic process; in that case the suggested behavior would be useful.)

I do agree that such a change would be more ``logical'' i.e.,
according to  "The Rule of Least Surprise"
(a good software design principle of providing a default behavior
 of "least surprise" to the user).

> Here's an example

> par(mfrow = c(1, 3))
> data <- matrix(rnorm(10 * 50), 50)
> colnames(data) <- letters[1:10]
> x.pos <- seq(-10, 10, length = 10)
> boxplot(data, at = x.pos) ## only the last 5 boxplots will appear
> boxplot(data, at = 1:10) ## all boxplots will appear
> boxplot(data, at = x.pos, xlim = range(x.pos) + c(-0.5, 0.5)) ## all boxplots will be shown


> I have attempted a patch in case you want to change the current behavior. Note that this is my first patch ever, so maybe I'm doing it wrong.

It looks good.
In the end, I would use

	xlim <- range(at, finite=TRUE) + c(-0.5, 0.5)
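
To make the difference concrete, here is a small sketch that compares
the current default with the proposed one without plotting anything.
The helper inside() and the variable names are only for illustration;
they are not part of bxp().

```r
n  <- 10
at <- seq(-10, 10, length.out = n)     # positions from the example

xlim_old <- c(0.5, n + 0.5)                          # current default
xlim_new <- range(at, finite = TRUE) + c(-0.5, 0.5)  # proposed default

# Hypothetical helper: which box centers fall inside a given xlim?
inside <- function(at, xlim) at >= xlim[1] & at <= xlim[2]

n_old <- sum(inside(at, xlim_old))     # 5: only the last five centers
n_new <- sum(inside(at, xlim_new))     # 10: every center is inside

# With the default positions at = 1:n, both rules give the same limits.
same_for_default <- isTRUE(all.equal(range(1:n) + c(-0.5, 0.5), xlim_old))
```

So the proposed default would only change plots in which "at" is
specified and differs from 1:n.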

There's one ***BIG*** question though:  

How probable is it that this breaks someone else's code?
Note that boxplot() and bxp() are  *REALLY*  old traditional S
functions
(and for all the young guys:  Boxplots were invented/proposed
 by the famous  John W Tukey, co-inventor of the FFT, who also coined
 the word "bit" and the term "exploratory data analysis", etc etc.
 He was then (partly) at Bell Labs, which, via John Chambers and
 co-workers, also "donated" the S language and hence R to the world !)

and therefore you can expect many many uses of boxplot() in
other code...
and hence, it could well be that some code has (probably
implicitly) *relied* on the current "more surprising" behavior.

I'd still advocate changing the default here,
but we really have to discuss this, as a change also may have
adverse consequences.

Martin Maechler, ETH Zurich (and R Core)

> *** Downloads/R-2.14.0/src/library/graphics/R/boxplot.R	Mon Oct  3 00:02:21 2011
> --- boxplot.R	Thu Nov 17 23:02:45 2011
> ***************
> *** 203,209 ****
>       }
  
>       if(is.null(pars$xlim))
> !         xlim <- c(0.5, n + 0.5)
>       else {
>   	xlim <- pars$xlim
>   	pars$xlim <- NULL
> --- 203,209 ----
>       }
  
>       if(is.null(pars$xlim))
> !         xlim <- c(min(at) - 0.5, max(at) + 0.5)
>       else {
>   	xlim <- pars$xlim
>   	pars$xlim <- NULL


