[R] Repeated factor levels - inconsistency of factor and levels<- functions?

Honza Hucin honza at ifolk.cz
Thu Sep 25 20:17:44 CEST 2008


>> I have a vector x containing letters ("a", "b" etc.). Now I want to
>> convert it to a factor and group some letters into one common level. If I
>> do it with the factor function, giving the same label to all the values I
>> want to group, it doesn't work:
>>
>> > x <- letters[1:5]
>> > x
>> [1] "a" "b" "c" "d" "e"
>> > f <- factor(x, levels=letters[1:5],
>>          labels=c("vowel","consonant","consonant","consonant","vowel"))
>> > levels(f)
>> [1] "vowel"     "consonant" "consonant" "consonant" "vowel"
>>
>> But if I then update the level names with a single assignment, levels with
>> the same name are merged, even though I don't change all of them:
>>
>> > levels(f)[1] <- "vowel" # changing only one vector item makes ALL levels group
>> > levels(f)
>> [1] "vowel"     "consonant"
>>
>> I'm rather confused! I think this behavior is doubly inconsistent. First,
>> labeling in the factor function should work the same way as in levels<- ,
>> i.e. they should either BOTH group levels with the same name or NEITHER.
>> Second, changing only one item of the levels vector should not change
>> anything else, and in particular should not do any "invisible" grouping.
>>
>> Or am I wrong? Or is it a bug?
>>
> I asked Brian Ripley the same thing half a year ago and his answer was:
> "Back compatibility ...."
>
> I'm at a loss trying to figure out what kind of code would depend on
> current behaviour, but the workaround is rather obvious, so the
> motivation for fixing (changing!) it is not too great.
>
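For readers of the archive, a minimal sketch of what the workaround mentioned above presumably looks like: build the factor with one level per letter, then assign the grouped names through levels<- , which merges repeated names. The second form, passing a named list, is the one described in ?levels. What factor() itself does with repeated labels may differ between R versions, so the sketch does not rely on it.

```r
x <- letters[1:5]

# Workaround 1: create the factor first, then rename the levels.
# levels<- merges levels that receive the same name.
f <- factor(x, levels = letters[1:5])
levels(f) <- c("vowel", "consonant", "consonant", "consonant", "vowel")
levels(f)
# [1] "vowel"     "consonant"

# Workaround 2: the named-list form of levels<- states the grouping explicitly.
g <- factor(x, levels = letters[1:5])
levels(g) <- list(vowel = c("a", "e"), consonant = c("b", "c", "d"))
levels(g)
# [1] "vowel"     "consonant"
```

The list form is a little longer but makes it obvious which old levels end up in which group.
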
Thank you for the explanation, I understand. Maybe the fix would be
simple: give the factor function a new parameter, say "unique.levels"
or "group.levels", with a default value of FALSE. That way backward
compatibility would be preserved. I should probably send this to the
R-devel mailing list, shouldn't I? :)
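
To make the suggestion concrete, here is a rough sketch of how such a switch could behave, written as a user-level wrapper rather than a change to factor() itself. The function name factor_grouped is made up for illustration, and group.levels is only the name proposed above; neither exists in base R.

```r
# Hypothetical wrapper illustrating the proposed 'group.levels' switch.
# Not part of base R; names and defaults are assumptions for this sketch.
factor_grouped <- function(x, levels = sort(unique(x)), labels = levels,
                           group.levels = FALSE) {
  if (!group.levels) {
    # Default: defer to factor() unchanged. What it does with repeated
    # labels depends on the R version in use.
    return(factor(x, levels = levels, labels = labels))
  }
  # Grouping requested: build one level per value, then let levels<-
  # merge the levels that share a label.
  f <- factor(x, levels = levels)
  levels(f) <- labels
  f
}

x <- letters[1:5]
h <- factor_grouped(x, levels = letters[1:5],
                    labels = c("vowel", "consonant", "consonant",
                               "consonant", "vowel"),
                    group.levels = TRUE)
levels(h)
# [1] "vowel"     "consonant"
```

Because the default is FALSE, existing calls would keep whatever behaviour factor() already has, which is the point of the proposal.
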
Jan Hucin


