[Rd] Dropping unused levels of a factor that has "NA" as a level

Peter Dalgaard p.dalgaard at biostat.ku.dk
Tue Jul 11 23:58:51 CEST 2006


"J. Hosking" <jh910 at juno.com> writes:

> Is this a bug?
> 
>    > f1 <- factor(c("a", NA), levels = c("a", "NA") )
>    > f2 <- f1[, drop = TRUE]
>    > f2
>    [1] a    <NA>
>    Levels: a <NA>
> 
> I would have expected f2 to have only one level, "a".  It seems
> to me that the code in [.factor does not follow the advice in
> help("factor") on how to set factor codes to be missing when
> "NA" is a level of the factor.


Something odd is going on, that's for sure...

The problem also shows up with factor(f1), and the logic in
as.character.factor seems to be at the root of it:

> as.character.factor
function (x, ...)
{
    cx <- levels(x)[x]
    if ("NA" %in% levels(x))
        cx[is.na(x)] <- "<NA>"
    cx
}
 
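Whenever "NA" is a level, the missing entries are replaced by the
literal string "<NA>", so re-factoring the result creates a real
"<NA>" level. This looks like something from before we had character
NA values. I wonder whether it is a mistake or whether there could
actually be a reason to keep it.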
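
To see the effect in isolation, here is a minimal sketch. It uses a
local copy of the function quoted above under the hypothetical name
old_as_character_factor, so it does not depend on what
as.character.factor does in whichever version of R you run. The names
cx and plain are likewise only for illustration.

```r
# Hypothetical local copy of the as.character.factor definition quoted
# above; the name old_as_character_factor is made up for this sketch.
old_as_character_factor <- function(x, ...)
{
    cx <- levels(x)[x]               # index the levels by the integer codes
    if ("NA" %in% levels(x))
        cx[is.na(x)] <- "<NA>"       # missing codes become a literal string
    cx
}

f1 <- factor(c("a", NA), levels = c("a", "NA"))

# With the special case: no element is missing any more, so
# re-factoring picks up "<NA>" as a second, genuine level.
cx <- old_as_character_factor(f1)
print(cx)                     # the strings "a" and "<NA>"
print(is.na(cx))              # FALSE FALSE
print(nlevels(factor(cx)))    # 2

# Without the special case: the second element stays a character NA,
# and re-factoring drops both it and the unused "NA" level.
plain <- levels(f1)[f1]
print(is.na(plain))           # FALSE TRUE
print(levels(factor(plain)))  # "a"
```

Under these assumptions, plain levels(x)[x] already gives the single
level "a" that was expected. The sketch does not check whether anything
else relies on the "<NA>" string.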

-- 
   O__  ---- Peter Dalgaard             Øster Farimagsgade 5, Entr.B
  c/ /'_ --- Dept. of Biostatistics     PO Box 2099, 1014 Cph. K
 (*) \(*) -- University of Copenhagen   Denmark          Ph:  (+45) 35327918
~~~~~~~~~~ - (p.dalgaard at biostat.ku.dk)                  FAX: (+45) 35327907


