[Rd] box and whisker (PR#13821)

Martin Maechler maechler at stat.math.ethz.ch
Sat Jul 18 10:51:17 CEST 2009


>>>>> "PD" == Peter Dalgaard <p.dalgaard at biostat.ku.dk>
>>>>>     on Sun, 12 Jul 2009 11:11:37 +0200 writes:

    PD> m.crawley at imperial.ac.uk wrote:
    >> In a Box and Whisker plot, I thought that when there are outliers both above
    >> and below the whiskers, then the whiskers should both be the same length
    >> (plus or minus 1.5 times the inter-quartile range).

    PD> Not according to the docs:

    PD> range: this determines how far the plot whiskers extend out from the
    PD> box.  If 'range' is positive, the whiskers extend to the most
    PD> extreme data point which is no more than 'range' times the
    PD> interquartile range from the box. A value of zero causes the
    PD> whiskers to extend to the data extremes.

    PD> And the code itself has

    PD> stats[c(1, 5)] <- range(x[!out], na.rm = TRUE)

    PD> So the whisker won't be equal to 1.5 IQR unless there happens to be an 
    PD> observation there.

    PD> Now, this might be wrong, but people have tried very hard to make the 
    PD> implementation follow the original definition due to Tukey. I.e., if you 
    PD> can point out that Tukey specified it otherwise, then we'd change it, 
    PD> otherwise it is just not a bug.
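
A small sketch of the rule PD quotes, using boxplot.stats() (which
boxplot() calls to get its statistics) on made-up data with one outlier
on each side. The vector x and the names hinge_spread, fences and
whisker_len are purely illustrative and not from the original report.
The whisker ends are observations, so the two whiskers need not be
equally long; each is at most 1.5 hinge spreads.

```r
# Made-up data: one low and one high outlier around a compact middle.
x <- c(-30, 0, 2, 3, 4, 5, 6, 7, 8, 9, 30)

bs <- boxplot.stats(x, coef = 1.5)   # coef plays the role of 'range' in boxplot()

# Box width as boxplot.stats() measures it: distance between the hinges.
hinge_spread <- bs$stats[4] - bs$stats[2]

# The fences at 1.5 hinge spreads from the box; whiskers may not pass them.
fences <- c(lower = bs$stats[2] - 1.5 * hinge_spread,
            upper = bs$stats[4] + 1.5 * hinge_spread)

# Whisker lengths, measured from the hinges to the whisker ends.
whisker_len <- c(lower = bs$stats[2] - bs$stats[1],
                 upper = bs$stats[5] - bs$stats[4])

print(bs$stats)      # whisker end, hinge, median, hinge, whisker end
print(bs$out)        # the points flagged as outliers
print(fences)
print(whisker_len)   # unequal, and both shorter than 1.5 * hinge_spread
```

Both whisker ends are members of x, which is the behaviour the line
stats[c(1, 5)] <- range(x[!out], na.rm = TRUE) produces.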

I'd bet pretty large amounts that we (and S and S-plus, and probably
quite a few other packages) have implemented the whiskers the way
JWT defined them, very purposefully.

One of JWT's points *was* exactly that most of the values "drawn"
represent *observations* (and those that do not, use
exact midpoints of observations):
It's not by coincidence or even queerness that the box is *not*
delineated by the usual quartiles, but rather by the *hinges*.

[ Digression about hinges vs quartiles : 

   ?boxplot.stats

  has a section 'Details' to which I added such information about
  a decade ago.
  Whereas our R help pages ( ?boxplot.stats,  ?fivenum ) 
  do use the correct definitions,
  unfortunately many other places do *not*, e.g., even the
  Wikipedia page  http://en.wikipedia.org/wiki/Five-number_summary
  wrongly talks about 1st and 3rd quartile,
  but then at least uses a numerical example using the hinges
]
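
To make the digression concrete, here is a minimal sketch comparing the
hinges from fivenum() with the quartiles from quantile() at its default
settings (type = 7). The data 1:10 are an arbitrary choice for
illustration; for other sample sizes the two can coincide.

```r
x <- 1:10

# Tukey's hinges: elements 2 and 4 of the five-number summary.
hinges <- fivenum(x)[c(2, 4)]

# "Usual" quartiles, here with quantile()'s default type = 7.
quartiles <- unname(quantile(x, c(0.25, 0.75)))

print(hinges)      # 3 and 8
print(quartiles)   # 3.25 and 7.75

# The box drawn by boxplot() uses the hinges, not these quartiles.
box_edges <- boxplot.stats(x)$stats[c(2, 4)]
print(box_edges)
```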

Martin Maechler, ETH Zurich

    >> If you look at the plot for SilwoodWeather on p.155 of The R Book you will
    >> see that for November (month = 11) the upper whisker is shorter than the
    >> lower, while for other months with outliers both above and below, the lines
    >> are the same lengths.

    PD> For easier reproduction (reproducible examples should not refer to files 
    PD> on your C: drive...):

    PD> > diff(boxplot({set.seed(9);x<-rnorm(50)})$stats)
    PD>           [,1]
    PD> [1,] 1.2525857
    PD> [2,] 0.5412128
    PD> [3,] 0.6083348
    PD> [4,] 1.4625057



    PD> -- 
    PD> O__  ---- Peter Dalgaard             Øster Farimagsgade 5, Entr.B
    PD> c/ /'_ --- Dept. of Biostatistics     PO Box 2099, 1014 Cph. K
    PD> (*) \(*) -- University of Copenhagen   Denmark      Ph:  (+45) 35327918
    PD> ~~~~~~~~~~ - (p.dalgaard at biostat.ku.dk)              FAX: (+45) 35327907



