[R] Impaired boxplot functionality - mean instead of median

Petr Pikal petr.pikal at precheza.cz
Fri Dec 2 09:37:35 CET 2005


Hi

I totally agree with Martin: when I see a boxplot, I immediately 
expect the median in the middle and all other parts defined accordingly.

It is possible to use

bp <- boxplot(..., plot = FALSE)

and then to change the median values in bp to means, the IQRs to SDs, 
and anything else to whatever you like, but this immediately raises 
the issue of

"Lies, damned lies and statistics"
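
A minimal sketch of the manipulation described above, on simulated data 
(the groups "a" and "b" and all values are made up for illustration). 
It computes the statistics without plotting, overwrites the median row 
with the group means and the hinge rows with mean -/+ SD, and draws the 
result with bxp(). The figure still looks like a boxplot, which is 
exactly the problem.

```r
set.seed(1)
# Simulated data; the group names are hypothetical
x <- list(a = rnorm(50, mean = 0), b = rnorm(50, mean = 1))

# Compute the boxplot statistics without drawing anything
bp <- boxplot(x, plot = FALSE)

# Rows of bp$stats: lower whisker, lower hinge, median, upper hinge, upper whisker
m <- sapply(x, mean)
s <- sapply(x, sd)
bp$stats[3, ] <- m        # median replaced by the mean
bp$stats[2, ] <- m - s    # lower hinge replaced by mean - SD
bp$stats[4, ] <- m + s    # upper hinge replaced by mean + SD

# Draw from the modified statistics
bxp(bp, main = "Not a real boxplot: mean +/- SD")
```

Nothing in the resulting figure tells the reader that the middle line 
is no longer the median.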

Just my 2 cents.

Petr


On 2 Dec 2005 at 8:36, Martin Maechler wrote:

From:           	Martin Maechler <maechler at stat.math.ethz.ch>
Date sent:      	Fri, 2 Dec 2005 08:36:02 +0100
To:             	Evgeniy Kachalin <ka4alin at yandex.ru>
Copies to:      	R-help at stat.math.ethz.ch
Subject:        	Re: [R] Impaired boxplot functionality - mean instead of median
Send reply to:  	Martin Maechler <maechler at stat.math.ethz.ch>

>   {diverted back to R-help}
> 
> There are several R packages that provide plots of
> "mean +/- SD" (or "mean +/- 2*SD" which is an approximate 95%
> confidence interval for the case of normally distributed data)
> or so-called "error bars".
> 
> E.g. function  plotCI() in package 'gplots' and errbar() in
> package 'Hmisc' or 'sfsmisc'.
> 
> I'm very convinced that boxplots shouldn't be (mis!)used for
> drawing those (and they are not by the above functions).
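
A minimal sketch of a "mean +/- SD" plot for a three-level factor, 
using only base graphics so that no package function signatures have 
to be assumed. The data are simulated and the variable names are 
hypothetical; plotCI() or errbar() from the packages named above would 
be the ready-made alternatives.

```r
set.seed(2)
# Simulated data: a three-level factor and a quantitative response (hypothetical)
g <- factor(rep(1:3, each = 20))
y <- rnorm(60, mean = c(10, 12, 11)[g], sd = 2)

m <- sapply(split(y, g), mean)   # group means
s <- sapply(split(y, g), sd)     # group standard deviations
pos <- seq_along(m)

# Points at the means, with room on the y axis for the bars
plot(pos, m, ylim = range(m - s, m + s), xlim = c(0.5, 3.5),
     xaxt = "n", pch = 19, xlab = "gene variant", ylab = "mean +/- SD")
axis(1, at = pos, labels = names(m))

# Error bars: arrows with flat heads at both ends
arrows(pos, m - s, pos, m + s, angle = 90, code = 3, length = 0.05)
```

Because this is drawn as points with bars, nobody will mistake it for 
a boxplot.
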
> 
> Regards,
> Martin 
> 
> >>>>> "Evgeniy" == Evgeniy Kachalin <ka4alin at yandex.ru>
> >>>>>     on Thu, 01 Dec 2005 19:39:18 +0300 writes:
> 
>     Evgeniy> Martin Maechler writes:
>     >> Boxplots were invented by John W. Tukey and I think should be
>     >> counted among the top "small but smart" achievements from the
>     >> 20th century.  Very wisely he did *not* use mean and standard
>     >> deviations.
>     >>
>     >> Even though it's possible to draw boxplots that are not boxplots
>     >> (and people only recently explained how to do this with R on this
>     >> mailing list), I'm arguing very strongly against this.
>     >>
>     >> If I see a boxplot - I'd want it to be a boxplot and not have
>     >> the silly (please excuse)  10%--------90% whiskers  which
>     >> declare 20% of the points as outliers {in the boxplot sense}.
>     >>
>     >> If you want the mean +/- sd plot, do *not* misuse boxplots
>     >> for them, please!
> 
>     Evgeniy> So I analyze genetics data. I have some factor
>     Evgeniy> (gene variant, c(1,2,3)) and the quantitative
>     Evgeniy> variable corresponding to that factor. How do I
>     Evgeniy> visualize this situation? Compare the means of the
>     Evgeniy> samples corresponding to the factor values?
> 
>     Evgeniy> Should boxplot support 'mean-in-the-middle', it
>     Evgeniy> would fit my needs ideally. How do I draw a
>     Evgeniy> mean +/- SD plot?
> 
>     Evgeniy> Also there is a way to rewrite boxplot.stats and
>     Evgeniy> replace "fivenum" there with a self-made
>     Evgeniy> function. Then I would need to write a self-made
>     Evgeniy> boxplot.formula (or boxplot.default?) function. And
>     Evgeniy> all this stuff would not be configurable. I'm still
>     Evgeniy> a novice in R, so I need a simple way to
>     Evgeniy> pre-visualize my data and estimate the approximate
>     Evgeniy> result.
> 
> yes, there are ways, but no, I pretty strongly oppose the idea
> to misuse the boxplot graphics for depicting very different
> identities.
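
For reference, a small sketch of what fivenum() and boxplot.stats() 
return, on simulated right-skewed data (the values are made up). It 
shows the five numbers a boxplot is built from, and lets one compare 
the mean with the median that a reader expects in the middle of the box.

```r
set.seed(3)
y <- rexp(40)   # simulated, right-skewed data

fivenum(y)               # minimum, lower hinge, median, upper hinge, maximum
bs <- boxplot.stats(y)   # the statistics boxplot() draws
bs$stats                 # whisker ends, hinges and median
bs$out                   # points beyond the whiskers

# Mean and median side by side
c(mean = mean(y), median = median(y))
```
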
> 

Petr Pikal
petr.pikal at precheza.cz



