[Rd] Bug in tapply with factors containing NAs (PR#6672)
    Peter Dalgaard 
    p.dalgaard at biostat.ku.dk
       
    Mon Mar 15 12:20:22 MET 2004
    
    
  
george.leigh at dpi.qld.gov.au writes:
> Full_Name: George Leigh
> Version: 1.8.1
> OS: Windows 2000
> Submission from: (NULL) (203.25.1.208)
> 
> 
> The following example gives the correct answer when the first argument of tapply
> is a numeric vector, but an incorrect answer when it is a factor.  If the
> function used by tapply is "length", the type and contents of the first argument
> should make no difference, provided it has the same length as the second
> argument.
> 
> > x = c(NA, 1)
> > y = factor(x)
> > tapply(x, y, length)
> 1 
> 1 
> > tapply(y, y, length)
> 1 
> 2 
> >
The core of this is that split() itself already gives different results:
> split(y,y)
$"1"
[1] <NA> 1
Levels: 1
> split(x,y)
$"1"
[1] 1
which in turn comes from the innards of split.default:
...
    if (is.null(attr(x, "class")) && is.null(names(x)))
        return(.Internal(split(x, f)))
    lf <- levels(f)
    y <- vector("list", length(lf))
    names(y) <- lf
    for (k in lf) y[[k]] <- x[f == k]
    y
Factors have a class attribute, so the internal code is not used in
that case, and since f == k is NA wherever f is NA, the subscript
keeps an NA element:
> y[y=="1"]
[1] <NA> 1
Levels: 1 
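A small illustration of what the logical subscript looks like when the factor contains an NA. This is a sketch for this example only, not code from split.default, and the variable name idx is introduced here purely for illustration:

```r
# Illustration only: why x[f == k] keeps the NA entry.
y <- factor(c(NA, 1))

idx <- y == "1"               # comparing an NA level gives NA, not FALSE
print(idx)                    # NA TRUE

# An NA in a logical subscript selects an NA element rather than dropping it.
print(length(y[idx]))         # 2

# Dropping the NA positions first leaves only the real match.
print(length(y[which(idx)]))  # 1
```

The second length is the one that tapply(x, y, length) reports for the numeric vector.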
I think the loop line in split.default needs to read
    for (k in lf) y[[k]] <- x[!is.na(f) & f == k]
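The effect of that change can be checked outside of R's sources with a stand-alone copy of the loop. The function name split_sketch is hypothetical and only mirrors the non-internal branch quoted above, so treat this as a sketch rather than the actual patch:

```r
# Hypothetical helper (not part of R): mirrors the non-internal branch of
# split.default quoted above, with the proposed NA guard added.
split_sketch <- function(x, f) {
    lf <- levels(f)
    y <- vector("list", length(lf))
    names(y) <- lf
    # !is.na(f) turns the NA comparisons into FALSE, so NA entries are dropped
    for (k in lf) y[[k]] <- x[!is.na(f) & f == k]
    y
}

x <- c(NA, 1)
y <- factor(x)

res <- split_sketch(y, y)
print(res)
print(sapply(res, length))   # one element in group "1"
```

With the guard in place the factor case has one element in group "1", the same count as split(x, y) gives for the numeric vector.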
-- 
   O__  ---- Peter Dalgaard             Blegdamsvej 3  
  c/ /'_ --- Dept. of Biostatistics     2200 Cph. N   
 (*) \(*) -- University of Copenhagen   Denmark      Ph: (+45) 35327918
~~~~~~~~~~ - (p.dalgaard at biostat.ku.dk)             FAX: (+45) 35327907