[R] NA, where no NA should (could!) be!

Peter Dalgaard p.dalgaard at biostat.ku.dk
Sat Dec 20 23:28:38 CET 2008


Oliver Bandel wrote:
> Sarah Goslee <sarah.goslee <at> gmail.com> writes:
> 
>> I think we need the reproducible example requested in
>> the posting guide.
> 
> ====================
> for ( datum in names(weblog_by_date) )
> { 
>   print(datum)
>   selected <- weblog_by_date[[datum]]
> 
>   res_size_by_host <- tapply( selected$size, selected$host, sum) 
>   mycat <- function(a,b) cat(paste(a, "==>", b, "\n"))
>   mapply( mycat, selected$size, selected$host )
>   print( res_size_by_host )
> 
>   print( "is there any NA?!")
>   print( any( is.na(selected$size)) )
> 
> }
> ====================

Why do so many people have such trouble with the word "reproducible"? We 
can't reproduce that without access to weblog_by_date!

Anyway, I think it is tapply that is behaving in a way you did not expect:

 > x <- factor(1,levels=1:2)
 > tapply(1,x,sum)
  1  2
  1 NA

which is kind of surprising since the sum over an empty set is usually 
zero. However, that _is_ what the documentation for tapply says:

      When 'FUN' is present, 'tapply' calls 'FUN' for each cell that has
      any data in it.  If 'FUN' returns a single atomic value for each
      such cell (e.g., functions 'mean' or 'var') and when 'simplify' is
      'TRUE', 'tapply' returns a multi-way array containing the values,
      and 'NA' for the empty cells.
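
To see that FUN really is skipped for empty cells, here is a minimal sketch. The names `calls` and `counting_sum` are made up for illustration; the wrapper just counts how often tapply invokes it.

```r
# A factor with two levels, of which only "a" actually occurs
x <- factor(c("a", "a"), levels = c("a", "b"))

# Hypothetical wrapper around sum() that counts its invocations
calls <- 0
counting_sum <- function(v) {
  calls <<- calls + 1
  sum(v)
}

res <- tapply(c(1, 2), x, counting_sum)

print(res)    # cell "a" holds the sum, cell "b" is NA
print(calls)  # FUN was invoked for the non-empty cell only
```

So the NA is not the result of summing anything; it is simply the filler for a cell that FUN never saw.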

A passable workaround is

 > sapply(split(1,x),sum)
1 2
1 0
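
For the weblog case, something like the following sketch may be what is going on. The data frame `weblog`, its hosts, sizes and dates are invented here (I do not have the real data), and I am assuming that `host` is a factor whose levels cover all days, so that a one-day subset carries unused levels. The `default` argument is available in R 3.4.0 and later only.

```r
# Made-up stand-in for the poster's data; hosts and sizes are hypothetical
weblog <- data.frame(
  host = factor(c("10.0.0.1", "10.0.0.1", "10.0.0.2", "10.0.0.3")),
  size = c(100, 250, 40, 75),
  date = c("d1", "d1", "d1", "d2")
)

# Subsetting keeps all factor levels, including the unused "10.0.0.3"
selected <- weblog[weblog$date == "d1", ]

# The data themselves contain no NA
no_na <- !any(is.na(selected$size))

# tapply: the unused level is an empty cell and is reported as NA
by_tapply <- tapply(selected$size, selected$host, sum)

# split + sapply: sum() is called on the empty group and gives 0
by_split <- sapply(split(selected$size, selected$host), sum)

# Re-creating the factor drops unused levels, so no empty cell remains
by_dropped <- tapply(selected$size, factor(selected$host), sum)

# Assumes R >= 3.4.0: fill empty cells with 0 instead of NA
by_default <- tapply(selected$size, selected$host, sum, default = 0)

print(by_tapply)
print(by_split)
print(by_dropped)
print(by_default)
```

Which variant to use depends on whether the hosts with no traffic on that day should appear in the result with a zero or not appear at all.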

> At the end of the printouts, it gives me:
> 
> =======================
>  94.101.145.110     94.23.3.220 
>              NA              NA 
> [1] "is there any NA?!"
> [1] FALSE
> =======================
> 

-- 
    O__  ---- Peter Dalgaard             Øster Farimagsgade 5, Entr.B
   c/ /'_ --- Dept. of Biostatistics     PO Box 2099, 1014 Cph. K
  (*) \(*) -- University of Copenhagen   Denmark      Ph:  (+45) 35327918
~~~~~~~~~~ - (p.dalgaard at biostat.ku.dk)              FAX: (+45) 35327907
