[R] Ordering categories on a boxplot - a serious trap??

William Dunlap wdunlap at tibco.com
Fri Feb 26 01:13:17 CET 2010


> -----Original Message-----
> From: r-help-bounces at r-project.org 
> [mailto:r-help-bounces at r-project.org] On Behalf Of Schwab,Wilhelm K
> Sent: Thursday, February 25, 2010 3:51 PM
> To: r-help at r-project.org
> Subject: [R] Ordering categories on a boxplot - a serious trap??
> 
> Hello all,
> 
> I think I probably did something stupid, and R's part was to 
> allow me to do it.  My goal was to control the order of 
> factor levels appearing horizontally on a boxplot.  Enter 
> search engines and perhaps some creative stupidity on my 
> part, and I came up with the following:
> 
> 	v=read.table("factor-order.txt",header=TRUE);
> 	levels(v$doseGroup) = c("L", "M", "H");
> 	boxplot(v$dose~v$doseGroup);

levels<- only renamed the existing level labels, as if
translating them into another language; it did not change
the integer codes of the factor.  If you want to reorder
the levels, call factor(..., levels=).  E.g.,

  > z <- factor(c("Small","Large","Medium","Small"))
  > str(z)
   Factor w/ 3 levels "Large","Medium",..: 3 1 2 3
  > str(factor(z, levels=c("Small","Medium","Large")))
   Factor w/ 3 levels "Small","Medium",..: 1 3 2 1

You can also relabel them in the same call by using the
labels= argument to factor():
  > str(factor(z, levels=c("Small","Medium","Large"),
  +            labels=c("S","M","L")))
   Factor w/ 3 levels "S","M","L": 1 3 2 1

Calling levels<- changes nothing but the level labels:
  > zcopy <- z
  > levels(zcopy) <- c("Small","Medium","Large")
  > str(zcopy)
    Factor w/ 3 levels "Small","Medium",..: 3 1 2 3
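
To see what that does to a plot, here is a small sketch with
made-up data.  The data frame v and its dose values are invented
for illustration; they are not the contents of factor-order.txt.
Group means stand in for the boxes of the boxplot:

```r
# Hypothetical data: dose rises from L to M to H.
v <- data.frame(dose = c(1, 2, 10, 11, 100, 101),
                doseGroup = factor(c("L", "L", "M", "M", "H", "H")))
levels(v$doseGroup)   # sorted alphabetically: "H" "L" "M"

# Relabelling with levels<- : the integer codes stay put, so the old "H"
# group is now called "L", the old "L" is "M", and the old "M" is "H".
bad <- v$doseGroup
levels(bad) <- c("L", "M", "H")
bad_means <- tapply(v$dose, bad, mean)

# Reordering with factor(, levels=): each label stays with its own data.
good <- factor(v$doseGroup, levels = c("L", "M", "H"))
good_means <- tapply(v$dose, good, mean)

print(bad_means)    # the group labelled "L" holds the largest doses
print(good_means)   # doses increase from L to M to H
```

Both results list the groups as L, M, H, but only the second
one has the right data under each label.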

levels<- is handy for low-level manipulations but not
for general use.  Even factor(x, levels=) can be a bit
dangerous: if one of the new levels is misspelled, every
value with the original spelling silently becomes NA:
  > str(factor(z, levels=c("Smal", "Medium", "Large")))
   Factor w/ 3 levels "Smal","Medium",..: NA 3 2 NA
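
One possible guard is to compare the new levels with the existing
ones before calling factor().  This is only a minimal sketch;
reorder_levels is a made-up name, not a base R function:

```r
# Hypothetical helper (not part of base R): reorder the levels of a factor,
# but stop if the requested levels do not match the existing ones exactly.
reorder_levels <- function(x, new_levels) {
  stopifnot(is.factor(x))
  missing_levels <- setdiff(levels(x), new_levels)  # these would turn into NA
  unknown_levels <- setdiff(new_levels, levels(x))  # these are probably typos
  if (length(missing_levels) > 0 || length(unknown_levels) > 0) {
    stop("levels do not match; missing: ",
         paste(missing_levels, collapse = ", "),
         "; unknown: ", paste(unknown_levels, collapse = ", "))
  }
  factor(x, levels = new_levels)
}

z <- factor(c("Small", "Large", "Medium", "Small"))
ok <- reorder_levels(z, c("Small", "Medium", "Large"))
str(ok)

# The misspelled version now fails with an error instead of producing NAs.
res <- try(reorder_levels(z, c("Smal", "Medium", "Large")), silent = TRUE)
inherits(res, "try-error")
```

The check is deliberately strict: it also refuses to drop a level,
which may or may not be what you want.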

Bill Dunlap
Spotfire, TIBCO Software
wdunlap tibco.com 

> 
> 
> A good way to see the trap is to evaluate:
> 
> 	v=read.table("factor-order.txt",header=TRUE);
> 	par(mfrow=c(2,1));
> 	boxplot(v$dose~v$doseGroup);
> 	levels(v$doseGroup) = c("L", "M", "H");
> 	boxplot(v$dose~v$doseGroup);
> 	par(mfrow=c(1,1));
> 
> The above creates two plots, one correct with the factors in 
> an inconvenient order, and one that is WRONG.  In the latter, 
> the labels appear in the desired order, but the data does not 
> "move with them."  I did not discover the problem until I 
> repeated the same type of plot with something that had a 
> known relationship with the levels, and the result was 
> clearly not correct.
> 
> I *think* the problem is to assign to the return value of 
> levels().  How did I think to do that?  I'm not really sure, 
> but please look at
> 
>   https://stat.ethz.ch/pipermail/r-help/2008-August/171884.html
> 
> 
> Perhaps it does not say to do exactly what I did, but it sure 
> was easy to follow to the mistake, it appeared to do what I 
> wanted, and the consequences of the mistake are ugly.  
> Perhaps levels() should return something that is immutable??  
> If I am looking at this correctly, levels() is an accident 
> waiting to happen.
> 
> What should I have done?  It seems:
> 
> 	# read data and order factor levels
> 	v=read.table("factor-order.txt",header=TRUE);
> 	group = factor(v$doseGroup,levels = c("L", "M", "H") );
> 	boxplot(v$dose~group);
> 
> 
> One disappointment is that the above factor() call apparently 
> needs to be repeated for any subset of v - I'm still trying 
> to get my mind around that one.
> 
> Can anyone confirm this?  It strikes me as a trap that should 
> be addressed so that an error results rather than a garbage graph.
> 
> Bill
> 
> 
> ---
> Wilhelm K. Schwab, Ph.D.
> 


