[R] Impaired boxplot functionality - mean instead of median

Martin Maechler maechler at stat.math.ethz.ch
Fri Dec 2 08:36:02 CET 2005


  {diverted back to R-help}

There are several R packages that provide plots of
"mean +/- SD" (or "mean +/- 2*SD", which for normally distributed
data covers approximately 95% of the observations)
or so-called "error bars".

E.g. function  plotCI() in package 'gplots' and errbar() in
package 'Hmisc' or 'sfsmisc'.

I'm firmly convinced that boxplots shouldn't be (mis!)used for
drawing those (and the above functions do not use them).
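
As a minimal sketch of such a "mean +/- SD" plot using only base
graphics (no extra package needed): the data below are simulated, and
the names 'variant' and 'trait' are made up for illustration, standing
in for a three-level gene-variant factor and its quantitative variable.

```r
## Hypothetical example data: a quantitative trait measured for three
## gene variants (names and values are invented for illustration).
set.seed(1)
variant <- factor(rep(1:3, each = 20))
trait   <- rnorm(60, mean = c(10, 12, 11)[variant], sd = 2)

## Per-group mean and standard deviation
m <- tapply(trait, variant, mean)
s <- tapply(trait, variant, sd)
x <- seq_along(m)

## Draw the group means as points, leaving room for the bars ...
plot(x, m, pch = 19, xaxt = "n",
     xlim = c(0.5, length(m) + 0.5),
     ylim = range(m - s, m + s),
     xlab = "gene variant", ylab = "trait (mean +/- SD)")
axis(1, at = x, labels = names(m))

## ... and add the mean +/- SD bars as double-headed flat "arrows"
arrows(x, m - s, x, m + s, angle = 90, code = 3, length = 0.05)
```

With 'gplots' installed, plotCI(x, m, uiw = s) should give a similar
picture, as should errbar(x, m, m + s, m - s) from 'Hmisc'.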

Regards,
Martin 

>>>>> "Evgeniy" == Evgeniy Kachalin <ka4alin at yandex.ru>
>>>>>     on Thu, 01 Dec 2005 19:39:18 +0300 writes:

    Evgeniy> Martin Maechler writes:
    >> Boxplots were invented by John W. Tukey and I think should be
    >> counted among the top "small but smart" achievements from the
    >> 20th century.  Very wisely he did *not* use mean and standard deviations.
    >> 
    >> Even though it's possible to draw boxplots that are not boxplots
    >> (and people only recently explained how to do this with R on this
    >> mailing list), I'm arguing very strongly against this.
    >> 
    >> If I see a boxplot - I'd want it to be a boxplot and not have
    >> the silly (please excuse)  10%--------90% whiskers  which
    >> declare 20% of the points as outliers {in the boxplot sense}.
    >> 
    >> If you want the mean +/- sd plot, do *not* misuse boxplots
    >> for them, please! 
    >> 

    Evgeniy> So I analyze genetics data. I have a factor
    Evgeniy> (gene variant, c(1,2,3)) and a quantitative
    Evgeniy> variable corresponding to that factor. How do I
    Evgeniy> visualize this situation? Compare the means of the
    Evgeniy> samples corresponding to the factor values?

    Evgeniy> If boxplot supported 'mean-in-the-middle', it
    Evgeniy> would fit my needs ideally. How do I draw a
    Evgeniy> mean +/- SD plot?

    Evgeniy> There is also a way to rewrite boxplot.stats and
    Evgeniy> replace "fivenum" there with a self-made
    Evgeniy> function. Then I would need to write a self-made
    Evgeniy> boxplot.formula (or boxplot.default?) function, and
    Evgeniy> none of this would be configurable. I'm still a
    Evgeniy> novice in R, so I need a simple way to pre-visualize
    Evgeniy> my data and estimate an approximate result.

Yes, there are ways, but no, I pretty strongly oppose the idea of
misusing the boxplot graphic for depicting very different quantities.



