[R] Thought I understood factors but??

David Winsemius dwinsemius at comcast.net
Mon Mar 1 20:04:48 CET 2010


On Mar 1, 2010, at 12:07 PM, Nicholas Lewin-Koh wrote:

> Hi,
> consider the following
>> a<-gl(3,3,9)
>> a
> [1] 1 1 1 2 2 2 3 3 3
> Levels: 1 2 3
>> levels(a)<-3:1

That may look like a re-ordering of the factor, but it merely re-labels
each level: the internal integer codes that represent the factor values
stay the same, so every observation now carries a different label.
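
A quick way to see this, as a minimal sketch: as.integer() is used here
only to expose the internal codes, and the names codes_before and
codes_after are made up for the illustration.

```r
a <- gl(3, 3, 9)
codes_before <- as.integer(a)      # internal codes: 1 1 1 2 2 2 3 3 3

levels(a) <- 3:1                   # relabel: first level becomes "3", third becomes "1"
codes_after <- as.integer(a)       # codes are not touched by the relabeling

identical(codes_before, codes_after)   # TRUE
as.character(a)                        # "3" "3" "3" "2" "2" "2" "1" "1" "1"
```

The codes are identical before and after, but the labels attached to
them have been swapped.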

>> a
> [1] 3 3 3 2 2 2 1 1 1
> Levels: 3 2 1
>> a<-gl(3,3,9)
>> factor(a,levels=3:1)

That is the right way IMO to safely change the ordering of the levels  
without changing the "semantics" or the "meaning" of the factor level  
assignments.
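
For comparison, a small sketch of what factor(a, levels=3:1) does (a2 is
just an illustrative name): the labels seen by the user stay put and
only the internal codes are remapped to the new level order.

```r
a  <- gl(3, 3, 9)
a2 <- factor(a, levels = 3:1)

as.character(a2)   # "1" "1" "1" "2" "2" "2" "3" "3" "3"  (values unchanged)
levels(a2)         # "3" "2" "1"                          (new ordering)
as.integer(a2)     # 3 3 3 2 2 2 1 1 1                    (codes remapped)
```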

Try:

 > levels(a) <- letters[4:6]
 > a
[1] d d d e e e f f f
Levels: d e f
 > a <- factor(a, levels=letters[1:3])
 > a
[1] <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA>
Levels: a b c

Using the second form sets any factor values that do not appear in the
new level vector to NA, in this case all of them. To my mind it is
better to get NA than to get a silent assignment to an incorrect level.
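
If you would rather get an error than NA's, one possible guard is
sketched below. relevel_strict() is a hypothetical helper written for
this message, not a base R function; it assumes the new levels are
meant to cover every value already present in the factor.

```r
# Hypothetical helper (not part of base R): re-level a factor, but stop
# if any existing non-NA value is absent from the new levels.
relevel_strict <- function(f, new_levels) {
  present <- unique(as.character(f[!is.na(f)]))
  missing_vals <- setdiff(present, as.character(new_levels))
  if (length(missing_vals) > 0) {
    stop("values not in new levels: ", paste(missing_vals, collapse = ", "))
  }
  factor(f, levels = new_levels)
}

f <- factor(c("d", "d", "e", "f"))
ok <- relevel_strict(f, c("f", "e", "d"))    # fine: only the order changes
levels(ok)                                   # "f" "e" "d"

# With levels a, b, c every value would turn into NA, so this stops.
res <- try(relevel_strict(f, letters[1:3]), silent = TRUE)
inherits(res, "try-error")                   # TRUE
```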

 > b <-factor(c(0,0,0,0, 1, 1))
 > b
[1] 0 0 0 0 1 1
Levels: 0 1
 > levels(b) <-c(1,0)
 > b
[1] 1 1 1 1 0 0   # No longer the same "meaning"
Levels: 1 0
 > b <-factor(c(0,0,0,0, 1, 1))
 > b<- factor(b, levels=c(1,0))
 > b
[1] 0 0 0 0 1 1
Levels: 1 0      # Only the ordering has changed; the meaning is the same


This caution applies especially when working with factors as components
of data.frames, where each factor value belongs to a row of other data.
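
A sketch with made-up data (the data frame df and its columns group and
y are invented for illustration) shows how relabeling a factor column
silently attaches the other columns to the wrong group, while
re-leveling does not:

```r
# Made-up data for illustration only.
df <- data.frame(group = factor(c(0, 0, 0, 0, 1, 1)),
                 y     = c(1, 2, 3, 4, 10, 20))

relabeled <- df
levels(relabeled$group) <- c(1, 0)     # swaps the labels: rows change meaning

releveled <- df
releveled$group <- factor(releveled$group, levels = c(1, 0))  # order only

tapply(df$y, df$group, mean)                 # group 0: 2.5, group 1: 15
tapply(relabeled$y, relabeled$group, mean)   # group 1: 2.5, group 0: 15 (swapped)
tapply(releveled$y, releveled$group, mean)   # group 1: 15,  group 0: 2.5 (same meaning)
```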


-- 
David.



> [1] 1 1 1 2 2 2 3 3 3
> Levels: 3 2 1
> It is probably something obvious I missed, but reading the  
> documentation
> of factor, and levels I would have thought
> that both should produce the same output as
> factor(a,levels=3:1)
> [1] 1 1 1 2 2 2 3 3 3
> Levels: 3 2 1
> The closest I could find in a quick search was this
> http://tolstoy.newcastle.edu.au/R/e5/help/08/09/2503.html
>
> Thanks
> Nicholas
>
> sessionInfo()
> R version 2.10.1 Patched (2009-12-20 r50794)
> x86_64-unknown-linux-gnu
>
> locale:
> [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C
> [3] LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8
> [5] LC_MONETARY=C              LC_MESSAGES=en_US.UTF-8
> [7] LC_PAPER=en_US.UTF-8       LC_NAME=C
> [9] LC_ADDRESS=C               LC_TELEPHONE=C
> [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
> attached base packages:
> [1] splines   tcltk     stats     graphics  grDevices utils      
> datasets
> [8] methods   base
>
> other attached packages:
> [1] mvtnorm_0.9-9      latticeExtra_0.6-9 RColorBrewer_1.0-2
> lattice_0.18-3
> [5] nlme_3.1-96        XML_2.6-0          gsubfn_0.5-0        
> proto_0.3-8
>
> loaded via a namespace (and not attached):
> [1] grid_2.10.1  tools_2.10.1
>
> ______________________________________________
> R-help at r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.

David Winsemius, MD
Heritage Laboratories
West Hartford, CT


