[R] Finicky factor comparison operators

David Winsemius dwinsemius at comcast.net
Mon Feb 20 14:57:07 CET 2012


On Feb 20, 2012, at 1:45 AM, johnmark wrote:

> Michael -
>
> Thanks for your insight.  I think I see where you're going with this.
>
> To make '==' comparisons for subsetting against an ordered factor,  
> I've had
> to create a lookup table for all possible values I'd ever want to  
> compare
> against (all dates covered by the quarters in question, in this  
> case) that
> maps into the ordered factors values.  This is wrapped by a function  
> that
> returns an ordered factor, which allows me to write:
>
> /(opps$close_quarter == which.quarter.end("2010-10-20")/
>
> Otherwise if I try to create an ordered factor from the constant  
> just for
> the purposes of comparison, the error tells me that ordered factors  
> from
> different sources cannot be compared:
>
> /(opps$close_quarter == factor("2007-10-20", ordered=T)
> Error in Ops.factor(factor("2007-10-30", ordered = T),  
> quarter.factors[1,
> 2]) :
>  level sets of factors are different/

Actually it is telling you that you cannot compare ordered factors  
which have different levels. That makes perfect sense for the same  
reasons that you are not allowed to compare Dates to ordered factors.  
If the factors from different sources had the same levels you should  
have succeeded.

 > z <- factor(LETTERS[3:1], ordered = TRUE)
 > z3 <- factor(LETTERS[1:3] , ordered=TRUE)
 > z[2] == z3[2]
[1] TRUE
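
If you do want to compare against a factor constant, one option is to build that constant with the level set of the factor it will be compared against. A minimal sketch follows. The names `quarters` and `target` and the dates are made up for illustration, since no example data was posted:

```r
# Hypothetical stand-in for an ordered factor of quarter-end dates
quarters <- factor(c("2007-10-31", "2008-01-31", "2008-04-30"),
                   ordered = TRUE)

# Build the constant with the SAME level set as the factor it is compared to
target <- factor("2008-01-31", levels = levels(quarters), ordered = TRUE)

same  <- quarters == target   # element-wise equality
later <- quarters >  target   # ordering comparisons work as well

print(same)
print(later)
```

Passing `levels = levels(quarters)` is what keeps the two level sets identical. Without it the constant gets a single level of its own, and the comparison fails with the error you saw.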


>
> That makes sense, since internally factors are integers -- "enums"  
> in other
> terms.
>
> But what I want to avoid -- and what I don't see as necessary is  
> explicitly
> coercing the terms to a common representation that mimics their  
> print form:
>
> /as.character("2007-10-20")== as.character(factor("2007-10-20",  
> ordered=T))
> /
> I don't think there should be confusion since the conversion to  
> print form
> is "obvious" -- but it does conflict with the conversion rules for  
> creating
> vectors by c():
>
> /c("2011-10-20", factor("2007-10-20", ordered=T))
> [1] "2011-10-20" "1" /
>
> where the factor is converted to its internal "enum" representation,  
> then to
> a character.

That is just an example of the need to use as.character when converting  
data out of factor class.
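
A small sketch of the difference, using the same constant as in your message (the variable names are only for illustration):

```r
f <- factor("2007-10-20", ordered = TRUE)

codes  <- as.integer(f)     # the internal integer code of the factor
labels <- as.character(f)   # the level label, i.e. what gets printed

# Convert explicitly before combining with character data
combined <- c("2011-10-20", as.character(f))

print(codes)
print(labels)
print(combined)
```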

>
> Having given this some more thought to what motivated the original  
> question,
> one could use "which()" to invert the factor's levels vector:
>
> /which("2008-04-30" == levels(quarter.factors[,2]))
> [1] 3 /
>
> It's still not clear to me what exactly are the implicit conversion  
> rules for
> factors.

In your last case you are comparing a character value to a character  
vector and getting the expected result, since  
levels(quarter.factors[,2]) is a character vector, NOT a factor. You  
should also succeed when testing equality between an ordered factor and  
a character value. You have still not provided an example for testing,  
so this may suffice:

 > z <- factor(LETTERS[3:1], ordered = TRUE)
 > z == "A"
[1] FALSE FALSE  TRUE

You should be able to assemble a list of valid candidate (character)  
values with levels(fac). Or if you want them in factor representation  
then use unique(fac).
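
For instance, with the same toy factor as above (the variable names here are only for illustration):

```r
z <- factor(LETTERS[3:1], ordered = TRUE)

candidates <- levels(z)   # character vector of the valid labels
as_factor  <- unique(z)   # ordered factor, in order of first appearance

# A character constant can be compared directly; no factor is needed
hits <- z == "B"

# %in% also matches on the labels, which is handy for several candidates
multi <- z %in% c("A", "C")

print(candidates)
print(as_factor)
print(hits)
print(multi)
```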


-- 
David Winsemius, MD
West Hartford, CT


