[Rd] Bug in tapply with factors containing NAs (PR#6672)

Prof Brian D Ripley ripley at stats.ox.ac.uk
Mon Mar 15 12:18:07 MET 2004


On Mon, 15 Mar 2004 george.leigh at dpi.qld.gov.au wrote:

> Full_Name: George Leigh
> Version: 1.8.1
> OS: Windows 2000
> Submission from: (NULL) (203.25.1.208)
>
>
> The following example gives the correct answer when the first argument of tapply
> is a numeric vector, but an incorrect answer when it is a factor.  If the
> function used by tapply is "length", the type and contents of the first argument
> should make no difference, provided it has the same length as the second
> argument.

Not so:

> split(x, y)
$"1"
[1] 1

> split(y, y)
$"1"
[1] <NA> 1
Levels: 1

Note that as there is only one level, NA must be 1 in y, whereas it does
not have to be in x.  So the answer for a factor in your problem is
definitely correct, if fortuitous.

R does the same as S in this example.

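If there were more than one level in y, the issue would be less clear-cut.
Probably y[[k]] <- x[f == k] in split.default should be y[[k]] <- x[f %in% k]:
f == k is NA wherever f is NA, and indexing by NA returns NA, whereas
f %in% k is FALSE there, so those elements would be dropped.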
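A minimal R sketch of the difference, using the x and y from the report (y is called f here, as in split.default). split_in is a hypothetical name used only for illustration; it is not a base R function, just one loop over the levels that applies the %in% indexing suggested above.

```r
# Data from the report: one NA and a single level "1"
x <- c(NA, 1)
f <- factor(x)
k <- "1"

f == k          # NA TRUE: comparing an NA code gives NA
x[f == k]       # NA 1: an NA index selects an NA element
f %in% k        # FALSE TRUE: %in% never returns NA
x[f %in% k]     # 1: the element with an NA group is dropped

# Hypothetical helper (not base R): split x by the levels of f,
# selecting group members with %in% so that NA codes are dropped
split_in <- function(x, f) {
  f <- factor(f)
  lf <- levels(f)
  y <- vector("list", length(lf))
  names(y) <- lf
  for (k in lf) y[[k]] <- x[f %in% k]
  y
}

sapply(split_in(x, f), length)  # 1 for the numeric vector
sapply(split_in(f, f), length)  # 1 for the factor as well
```

With %in% the group sizes agree whether the first argument is numeric or a factor.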

Note too:

> z <- x; class(z) <- "foo"
> split(z, y)
$"1"
[1] NA  1


> x = c(NA, 1)
> > y = factor(x)
> > tapply(x, y, length)
> 1
> 1
> > tapply(y, y, length)
> 1
> 2
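For the reporter's specific goal (group sizes that cannot depend on the type of the first argument), one possible workaround is sketched below. It assumes, as split(x, y) above suggests, that an unclassed vector takes the path that drops NA groups; the index vector seq_along(y) has no class attribute, so only the grouping factor matters. The name counts is illustrative.

```r
x <- c(NA, 1)
y <- factor(x)

# seq_along(y) is a plain integer vector with no class attribute, so the
# counts depend only on the grouping factor y, not on the type of x or y
counts <- tapply(seq_along(y), y, length)
counts  # one entry per level of y; the NA position in y is not counted
```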

-- 
Brian D. Ripley,                  ripley at stats.ox.ac.uk
Professor of Applied Statistics,  http://www.stats.ox.ac.uk/~ripley/
University of Oxford,             Tel:  +44 1865 272861 (self)
1 South Parks Road,                     +44 1865 272860 (secr)
Oxford OX1 3TG, UK                Fax:  +44 1865 272595



More information about the R-devel mailing list