[Rd] box and whisker (PR#13821)

Peter Dalgaard p.dalgaard at biostat.ku.dk
Sun Jul 12 11:11:37 CEST 2009

m.crawley at imperial.ac.uk wrote:
> In a Box and Whisker plot, I thought that when there are outliers both above
> and below the whiskers, then the whiskers should both be the same length
> (plus or minus 1.5 times the inter-quartile range).

Not according to the docs:

    range: this determines how far the plot whiskers extend out from the
           box.  If 'range' is positive, the whiskers extend to the most
           extreme data point which is no more than 'range' times the
           interquartile range from the box. A value of zero causes the
           whiskers to extend to the data extremes.

And the code itself has

             stats[c(1, 5)] <- range(x[!out], na.rm = TRUE)

So a whisker won't be exactly 1.5 IQR long unless there happens to be an
observation exactly that far from the box.

Now, this might be wrong, but people have tried very hard to make the
implementation follow the original definition due to Tukey. I.e., if you
can point out that Tukey specified it otherwise, then we'd change it;
otherwise it is just not a bug.

> If you look at the plot for SilwoodWeather on p.155 of The R Book you will
> see that for November (month = 11) the upper whisker is shorter than the
> lower, while for other months with outliers both above and below, the lines
> are the same lengths.

For easier reproduction (reproducible examples should not refer to files 
on your C: drive...):

 > diff(boxplot({set.seed(9);x<-rnorm(50)})$stats)
[1,] 1.2525857
[2,] 0.5412128
[3,] 0.6083348
[4,] 1.4625057
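
The first and last rows above are the lower and upper whisker lengths. As a
rough sketch of where they come from, the whisker ends can be rebuilt by hand
from the rule in the docs and the range(x[!out]) line quoted earlier. The
variable names (coef, hinges, fences, inside, whiskers) are only illustrative,
and the sketch assumes the default range of 1.5 and that boxplot.stats()
still behaves as quoted:

```r
# Simulated data, same recipe as the one-liner above
set.seed(9)
x <- rnorm(50)

coef   <- 1.5                    # the default 'range' argument of boxplot()
hinges <- fivenum(x)[c(2, 4)]    # lower and upper hinge, i.e. the box edges
iqr    <- diff(hinges)           # length of the box

# Fences: the furthest a whisker is allowed to reach
fences <- c(hinges[1] - coef * iqr, hinges[2] + coef * iqr)

# Whiskers stop at the most extreme observations inside the fences
inside   <- x >= fences[1] & x <= fences[2]
whiskers <- range(x[inside])

# Compare with boxplot.stats(): elements 1 and 5 of $stats are the whisker ends
bs <- boxplot.stats(x, coef = coef)
print(all.equal(whiskers, bs$stats[c(1, 5)]))

# Whisker lengths next to the upper bound coef * iqr
print(c(lower = hinges[1] - whiskers[1],
        upper = whiskers[2] - hinges[2],
        bound = coef * iqr))
```

Each length can reach the bound only if an observation sits exactly on a
fence, so in general the two whiskers differ from each other and from
1.5 IQR.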

    O__  ---- Peter Dalgaard             Øster Farimagsgade 5, Entr.B
   c/ /'_ --- Dept. of Biostatistics     PO Box 2099, 1014 Cph. K
  (*) \(*) -- University of Copenhagen   Denmark      Ph:  (+45) 35327918
~~~~~~~~~~ - (p.dalgaard at biostat.ku.dk)              FAX: (+45) 35327907
