[R] help understanding box plots

Peter Dalgaard BSA p.dalgaard at biostat.ku.dk
Fri Feb 22 11:52:54 CET 2002


Jay Pfaffman <pfaffman at relaxpc.com> writes:

> Another naive stats question.  I'm trying to better understand what
> boxplots are telling me.  
> 
> I think what I see is the median and the boundaries of the 1st and 3rd
> quartiles.  The whiskers represent the range of the data unless there
> are points which are outside "range" (default: 1.5) times the distance
> from the median to that quartile.  Is that right? 

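Not quite. The limit is 1.5 times the length of the entire box, measured
from the edge of the box, not 1.5 times the distance from the median to
the quartile.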

> I've read the
> documentation for boxplot numerous times, but don't quite understand
> it well enough to communicate it to my professor who's helping me with
> this project.  (You'll be relieved to know that neither of us fancies
> ourself a statistician!)

boxplot.stats.Rd had a typo and got updated recently in the
development and patch versions to read

  \item{coef}{this determines how far the plot ``whiskers'' extend out
    from the box.  If \code{coef} is positive, the whiskers extend to
    the most extreme data point which is no more than \code{coef} times
    the length of the box away from the box. A value of zero causes
    the whiskers to extend to the data extremes (and no outliers be
    returned).}

(for some reason this hasn't yet found its way to the online snapshot
manuals in http://stat.ethz.ch/R-alpha/R-devel/doc/html/ and friends.
Martin?)
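
To make the whisker rule concrete, here is a minimal sketch that
recomputes the whisker ends by hand and compares them with what
boxplot.stats() returns. The data vector is made up for illustration,
and the names box_len, lower_fence and upper_fence are just labels for
this sketch, not anything boxplot.stats() itself uses.

```r
# Made-up data with one obviously large value
x <- c(2, 3, 4, 5, 6, 7, 8, 9, 30)

s <- boxplot.stats(x, coef = 1.5)
hinges  <- s$stats[c(2, 4)]   # lower and upper hinge, i.e. the box edges
box_len <- diff(hinges)       # length of the entire box

# Limits: coef times the box length away from each edge of the box
lower_fence <- hinges[1] - 1.5 * box_len
upper_fence <- hinges[2] + 1.5 * box_len

# Whiskers end at the most extreme data points inside those limits
whiskers <- range(x[x >= lower_fence & x <= upper_fence])

# The hand computation should agree with boxplot.stats()
stopifnot(isTRUE(all.equal(whiskers, s$stats[c(1, 5)])))

s$out   # points beyond the whiskers, which boxplot() draws individually
```

With coef = 0 the same call should return the data extremes as whisker
ends and an empty "out" component.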


> V&R (p. 122) claims that the hinges are "roughly quartiles," so
> perhaps my naive understanding is close enough.

Yes. The exact definition is slightly peculiar, but in compliance with
the original definition by Tukey. So I'm told, anyway.
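
A small sketch of the difference, assuming a made-up sample of six
values: fivenum() gives the hinges that the boxplot uses, while
quantile() with its default settings gives the usual quartiles. For
this sample the two differ slightly; for other sample sizes they can
coincide.

```r
x <- c(1, 2, 3, 4, 5, 6)   # made-up data

# Hinges: elements 2 and 4 of Tukey's five-number summary
hinges <- fivenum(x)[c(2, 4)]

# Quartiles from quantile() with its default type
quartiles <- unname(quantile(x, c(0.25, 0.75)))

hinges      # 2 and 5
quartiles   # 2.25 and 4.75
```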


> I've got a relatively small data set (n~=12).  I think it would help
> to see the data points plotted on top of the boxplots.  Here's what
> I'm doing now:
> 
>     par(las=2,ps=14,mar=c(15, 4, 4, 2))
>     boxplot(split(ranks,c(1:25)), names=items, notch=T, horizontal=F, add=F)
> 
> If I could get the points of each of the 25 variables plotted on top
> of the box, that'd be great.

Not sure what you're doing there, but maybe some code like this could
help:

 x1 <- rnorm(20)
 x2 <- rnorm(20)
 boxplot(list(x1 = x1, x2 = x2))
 # boxes are drawn at x = 1, 2, ...; overlay each sample at its box position
 points(cbind(1, x1))
 points(cbind(2, x2))
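
For many groups, the same idea can be sketched without one points()
call per box by using stripchart() with add = TRUE. This is only an
illustration under assumptions: the data are simulated (5 groups of 12
values stand in for the 25 items), and the name "groups" is made up
here, so the split() call would need adapting to the real data.

```r
set.seed(1)   # only so the simulated example is reproducible

# Simulated stand-in data: 5 groups of 12 values each
groups <- split(rnorm(60), rep(1:5, each = 12))

# Draw the boxes; box i is centred at x = i
bp <- boxplot(groups)

# Overlay the raw points at the same positions, jittered sideways
# so that coincident values remain visible
stripchart(groups, vertical = TRUE, method = "jitter",
           add = TRUE, pch = 1)
```

With n around 12 per group the jitter is mostly cosmetic; method =
"stack" is another option if ties are common.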


-- 
   O__  ---- Peter Dalgaard             Blegdamsvej 3  
  c/ /'_ --- Dept. of Biostatistics     2200 Cph. N   
 (*) \(*) -- University of Copenhagen   Denmark      Ph: (+45) 35327918
~~~~~~~~~~ - (p.dalgaard at biostat.ku.dk)             FAX: (+45) 35327907