[R] persistence of factor levels in a data frame

Marc Schwartz MSchwartz at MedAnalytics.com
Mon Feb 28 14:40:34 CET 2005


On Mon, 2005-02-28 at 14:07 +0100, Lefebure Tristan wrote:
> Hi,
> Just something I don't understand:
> 
> data <- data.frame(V1=c(1:12),F1=c(rep("a",4),rep("b",4),rep("c",4)))
> data_ac <- data[which(data$F1 !="b"), ]  
> levels(data_ac$F1)    
> 
> Why the level "b" is always present ?
> 
> thanks
> 
> Tristan, R 2.0.1 for Linux Fedora 3

See ?"[.factor" for details. You will note that the argument 'drop' is
FALSE by default, which means that unused levels of a factor are not
dropped when subsetting.
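
As a small sketch of what the 'drop' argument does (the vector 'f' and the
names 'kept' and 'dropped' are made up here for illustration, they are not
from the original question):

```r
# A factor with three levels
f <- factor(c("a", "a", "b", "c"))

# Default subsetting: drop = FALSE, so the unused level "b" is kept
kept <- f[f != "b"]

# Same subset with drop = TRUE: unused levels are removed
dropped <- f[f != "b", drop = TRUE]

print(levels(kept))     # "a" "b" "c"
print(levels(dropped))  # "a" "c"
```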

This can be important if you want to join or compare factors from more
than one source, where you need to ensure that the factor levels are the
same. If you dropped the unused levels from one factor while they
remained in the other, the comparison would be problematic, since the
two factors would no longer share the same set of levels.
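
To make the mismatch concrete, here is a minimal sketch with two
hypothetical factors 'x' and 'y' that hold the same values but have
different level sets. The underlying integer codes differ, and giving 'y'
the levels of 'x' brings them back in line:

```r
# Same values, but x keeps the unused level "b" and y does not
x <- factor(c("a", "c"), levels = c("a", "b", "c"))
y <- factor(c("a", "c"))

print(as.integer(x))  # codes 1 3
print(as.integer(y))  # codes 1 2

# The level sets are not the same
same_levels <- identical(levels(x), levels(y))

# Re-create y with the levels of x so the two are comparable again
y2 <- factor(y, levels = levels(x))
print(all(x == y2))
```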

If you want to force the unused levels to be dropped before using a
factor, just use:

> data_ac$F1 <- factor(data_ac$F1)

> data_ac$F1
[1] a a a a c c c c
Levels: a c

See ?factor for more information.
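
For readers on a more recent R than the 2.0.1 used in the question, a
possible alternative is droplevels(), which was added to base R in a later
release and drops unused levels from every factor column of a data frame
in one call. This is only a sketch: the data frame is rebuilt here with an
explicit factor() call, since newer versions of R no longer convert
character columns to factors by default.

```r
# Rebuild the example data with F1 explicitly stored as a factor
data <- data.frame(V1 = 1:12,
                   F1 = factor(rep(c("a", "b", "c"), each = 4)))

# Subset the rows, then drop unused levels from all factor columns
data_ac <- droplevels(data[data$F1 != "b", ])

print(levels(data_ac$F1))  # "a" "c"
```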

HTH,

Marc Schwartz



