[R] Impaired boxplot functionality - mean instead of median

Frank E Harrell Jr f.harrell at vanderbilt.edu
Thu Dec 1 23:05:10 CET 2005

Evgeniy Kachalin wrote:
> Marc Schwartz (via MN) wrote:
>>>Marc Schwartz (via MN) wrote:
>>>So plotmeans is incapable of: boxplot(numerical~fact1+fact2). Is there 
>>>any way further?
>>I think that somehow we are talking past each other here.
>>plotmeans() does what it is designed to do, which is to simplify the
>>process of plotting group-wise point estimates and user defined error
>>bars/intervals around the point estimates.
>>In your case, these intervals would be standard deviations around each
>>of the group means as you have indicated.
>>Review the examples in ?plotmeans.
>>As Martin and others have pointed out, you need to remove boxplots from
>>the equation here, as they were not designed to plot means and standard
> Again, what I'm talking about: plotmeans is incapable of analyzing the
> formula. For example, I have two factors: A - a, b, c, and B - d, e, f.
> If I plot: boxplot(num~A+B) what do I get? Nine boxes: ad, ae, af, bd,
> be, bf, cd, ce, cf. If I plot: plotmeans(num~A+B) - what do I get?
> Nothing. Because plotmeans cannot combine two factors in various
> combinations. Is there a simple way to do it?
> Anyway... That's the wrong way; all that is necessary is to have a boxplot
> with the mean instead of the median. Is there a simple way to do it?
> Statistical software like Statistica 7.0 offers any possible combination
> of what "Boxplot" could mean. Is it possible to have only one
> modification to R's boxplot?
> Thank you for your kind answers.
> Also please tell me where I should send replies: to the conference address
> or to those who answer me directly.


bwplot(...., panel=panel.bpplot)

By default, panel.bpplot shows the mean (dot) and median (line) plus 
several quantiles.  To bother Martin in a friendly way, I think that 
means can be useful additions - not that they are so useful by 
themselves, but that when they differ a lot from the median, 
non-statisticians gain further information about asymmetry.  Also, even 
though the simple box plot is elegant, I sometimes think it has a high 
ink-to-information ratio.  I have gained a lot from seeing outer 
quantiles on the plot, and I don't like to show outer points for fear of 
someone labeling them outliers.  For describing raw data distributions, 
however, I never find standard deviations useful.
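
A minimal sketch of the bwplot() call above, assuming the lattice and Hmisc
packages are installed (bwplot() comes from lattice, panel.bpplot() from
Hmisc).  The data frame d, its columns num, A and B, and the combined factor
cell are made up for illustration; they only stand in for the two-factor
layout discussed in the quoted message.

```r
library(lattice)  # provides bwplot()
library(Hmisc)    # provides panel.bpplot()

set.seed(1)

# Simulated (hypothetical) right-skewed data: a 3 x 3 layout of two factors,
# 20 observations per combination.
d <- expand.grid(rep = 1:20, A = c("a", "b", "c"), B = c("d", "e", "f"))
d$num  <- rexp(nrow(d))
d$cell <- interaction(d$A, d$B)  # nine levels: a.d, b.d, c.d, a.e, ...

# One box-percentile display per A x B combination.
p <- bwplot(cell ~ num, data = d, panel = panel.bpplot)
print(p)

# Per-combination means and medians, for comparison with the plot.
cell_means   <- tapply(d$num, d$cell, mean)
cell_medians <- tapply(d$num, d$cell, median)
```

With skewed data such as this, the mean dot will usually sit to the right of
the median line.  Conditioning, as in A ~ num | B, is another way to lay out
two factors in lattice.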

Frank E Harrell Jr   Professor and Chair           School of Medicine
                      Department of Biostatistics   Vanderbilt University
