[R] boxplot notches

Christoph Scherber Christoph.Scherber at uni-jena.de
Tue Mar 2 11:18:19 CET 2004


Dear colleagues,

I think it would be a good idea to include a short note in the R
boxplot() help file, stating exactly how the confidence levels are
calculated ("the notches are +/- 1.58 IQR/sqrt(n)") - at least as
guidance for users not advanced enough to directly interpret the code.

Would this be possible?

Regards,
Christoph.
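
[A minimal sketch of the formula quoted above, for readers who want to
check it themselves. It assumes that boxplot.stats() takes the "IQR" in
the notch formula to be the spread between the hinges returned by
fivenum(), and that n is the number of non-missing observations; the
variable names and the simulated data are illustrative only.]

```r
# Reproduce the notch limits by hand and compare with boxplot.stats().
set.seed(1)                      # arbitrary seed, for reproducibility only
x <- rnorm(50)                   # illustrative data

five <- fivenum(x)               # min, lower hinge, median, upper hinge, max
hinge_spread <- five[4] - five[2]
n <- length(x)

# Notch limits: median +/- 1.58 * (hinge spread) / sqrt(n)
notch <- five[3] + c(-1.58, 1.58) * hinge_spread / sqrt(n)

# boxplot.stats() reports its notch limits in the 'conf' component
all.equal(notch, boxplot.stats(x)$conf)
```

[Note that IQR(x) uses a different quantile definition by default, so it
can differ slightly from the hinge spread in small samples.]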

David James wrote:

> Prof Brian Ripley wrote:
>
>> On Mon, 1 Mar 2004, Martin Maechler wrote:
>>
>>>>>>>> "TL" == Thomas Lumley <tlumley at u.washington.edu>
>>>>>>>> on Mon, 1 Mar 2004 09:54:48 -0800 (PST) writes:
>>>>>>>
>>> TL> On Mon, 1 Mar 2004, Christoph Scherber wrote:
>>> >> Dear list members,
>>> >>
>>> >> Can anyone tell me how the notches in boxplot(Y~X,notch=T) are
>>> >> calculated? What do these notches represent exactly? I'd suppose they
>>> >> are Confidence Intervals for the median, but I've also been told they
>>> >> might show Least Significant Difference (LSD) equivalents.
>>>
>>> TL> The help page says that
>>> TL> " If the notches of two plots do not overlap then
>>> TL> the medians are significantly different at the 5 percent level."
>>>
>>> TL> The only thing wrong with this is that it isn't true.
>>> TL> The code says that the notches are +/- 1.58 IQR/sqrt(n),
>>> TL> so I think the claimed confidence level holds only for
>>> TL> normal distributions with small amounts of contamination.
>>>
>>> I think John Tukey's idea was that this formula (or just the fact of
>>> using median and quartiles) is still often approximately correct
>>> for quite a few kinds of moderate contaminations...
>>
>> It may be approximately correct for the width of a CI (and when I checked
>> it was only approximately correct for a normal), but I would seriously
>> doubt if it were approximately correct for a significance level of 5%.
>> Remember how fast the tails of the asymptotic normal distribution decay: a
>> 20% error turns 5% into 2%.
>>
>> BTW, if there is a precise reference for this it would be good to add it
>> to boxplot.stats.Rd, as the confidence limits are unexplained there.
>
>
> @article{McGi:Tuke:Lars:1978,
> author = {McGill, Robert and Tukey, John W. and Larsen, Wayne A.},
> title = {Variations of {B}ox plots},
> year = {1978},
> journal = {The American Statistician},
> volume = {32},
> pages = {12--16},
> keywords = {Exploratory data analysis; Graphics}
> }
>
> @book{Cham:Clev:Klei:Tuke:1983,
> author = {Chambers, John M. and Cleveland, William S. and Kleiner, Beat
> and Tukey, Paul A.},
> title = {Graphical methods for data analysis},
> year = {1983},
> pages = {395},
> publisher = {Wadsworth Publishing Co Inc}
> }
>
>> -- 
>> Brian D. Ripley, ripley at stats.ox.ac.uk
>> Professor of Applied Statistics, http://www.stats.ox.ac.uk/~ripley/
>> University of Oxford, Tel: +44 1865 272861 (self)
>> 1 South Parks Road, +44 1865 272866 (PA)
>> Oxford OX1 3TG, UK Fax: +44 1865 272595
>>
>
>
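
[The remark in the quoted thread that "a 20% error turns 5% into 2%" can
be checked with a short calculation. This is a sketch under the
assumption that "20% error" means the interval half-width is 20% larger
than the nominal two-sided 5% normal critical value; the variable names
are illustrative.]

```r
# Two-sided tail probability of a standard normal at the nominal
# 5% critical value, and at a critical value that is 20% too wide.
z <- qnorm(0.975)                  # nominal two-sided 5% critical value

p_nominal <- 2 * pnorm(-z)         # 0.05 by construction
p_wider   <- 2 * pnorm(-1.2 * z)   # roughly 0.02

round(c(nominal = p_nominal, wider = p_wider), 3)
```

[So a modest error in the width of the interval translates into a much
larger relative error in the implied significance level.]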



