[Rd] Dropping unused levels of a factor that has "NA" as a level

Brahm, David David.Brahm at geodecapital.com
Wed Jul 12 00:19:40 CEST 2006


I mentioned this in R-help on April 28:
<https://stat.ethz.ch/pipermail/r-help/2006-April/104595.html>

| as.character.factor contains this line (where cx=levels(x)[x]):
|   if ("NA" %in% levels(x)) cx[is.na(x)] <- "<NA>"
|
| Is it possible that this is no longer the desired behavior?  These
| two results don't seem very consistent:
|
| > as.character(as.factor(c("AB", "CD", NA)))
| [1] "AB" "CD" NA  
| > is.na(.Last.value)[3]
| [1] TRUE
|
| > as.character(as.factor(c("NA", "CD", NA)))
| [1] "NA"   "CD"   "<NA>"
| > is.na(.Last.value)[3]
| [1] FALSE
|
| I'm using R-2.3.0 on Redhat Linux, but I don't think the behavior
| is new (maybe since character NA's were introduced?).
|
| -- David Brahm (brahm at alum.mit.edu)
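
A minimal sketch of the distinction the quoted message relies on: the level string "NA" is an ordinary value, while a missing entry has a missing integer code. It goes through the integer codes directly rather than as.character(), so it does not depend on which version of as.character.factor is installed. The variable names are illustrative only.

```r
# A character vector holding the string "NA" and a genuinely missing value.
x <- c("NA", "CD", NA)
f <- as.factor(x)

# The string "NA" becomes a level; the missing value gets a missing code.
levels(f)
codes <- as.integer(f)

# Rebuild the character vector from the codes: the missing entry stays
# missing, and the string "NA" stays a string.
cx <- levels(f)[codes]
is.na(cx)
```

Indexing the levels by code keeps is.na() consistent between the two cases shown in the quoted message.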


-----Original Message-----
From: r-devel-bounces at r-project.org [mailto:r-devel-bounces at r-project.org] On Behalf Of Peter Dalgaard
Sent: Tuesday, July 11, 2006 5:59 PM
To: J. Hosking
Cc: r-devel at stat.math.ethz.ch
Subject: Re: [Rd] Dropping unused levels of a factor that has "NA" as a level

"J. Hosking" <jh910 at juno.com> writes:

> Is this a bug?
> 
>    > f1 <- factor(c("a", NA), levels = c("a", "NA") )
>    > f2 <- f1[, drop = TRUE]
>    > f2
>    [1] a    <NA>
>    Levels: a <NA>
> 
> I would have expected f2 to have only one level, "a".  It seems
> to me that the code in [.factor does not follow the advice in
> help("factor") on how to set factor codes to be missing when
> "NA" is a level of the factor.


Something odd is going on, that's for sure...

The problem is also there with factor(f1). And the logic in
as.character.factor seems to be at the root of it:

> as.character.factor
function (x, ...)
{
    cx <- levels(x)[x]
    if ("NA" %in% levels(x))
        cx[is.na(x)] <- "<NA>"
    cx
}
 
This looks like something from before we had character NA values. I
wonder whether it is a mistake or whether there is actually a reason
to keep it.
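
As a rough illustration of the outcome J. Hosking expected, unused levels can be dropped by working on the integer codes alone, so the "<NA>" substitution in as.character.factor never comes into play. This is only a sketch: drop_unused_by_code is a hypothetical helper, not part of base R and not the fix adopted in R itself.

```r
# Hypothetical helper (not in base R): drop unused levels using only the
# integer codes, never converting the factor to character.
drop_unused_by_code <- function(x) {
  stopifnot(is.factor(x))
  codes <- as.integer(x)
  # Codes that actually occur, ignoring missing entries.
  used <- sort(unique(codes[!is.na(codes)]))
  # Renumber the surviving codes 1..k; missing codes stay missing.
  new_codes <- match(codes, used)
  structure(new_codes, levels = levels(x)[used], class = class(x))
}

# The factor from the report: "NA" is a level, but no element uses it.
f1 <- factor(c("a", NA), levels = c("a", "NA"))
f2 <- drop_unused_by_code(f1)
levels(f2)
is.na(f2)
```

With this approach the unused level "NA" is dropped and the second element remains missing.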

-- 
   O__  ---- Peter Dalgaard             Øster Farimagsgade 5, Entr.B
  c/ /'_ --- Dept. of Biostatistics     PO Box 2099, 1014 Cph. K
 (*) \(*) -- University of Copenhagen   Denmark          Ph:  (+45) 35327918
~~~~~~~~~~ - (p.dalgaard at biostat.ku.dk)                  FAX: (+45) 35327907
