[R] NA, where no NA should (could!) be!

Oliver Bandel oliver at first.in-berlin.de
Sat Dec 20 23:48:28 CET 2008

Quoting Peter Dalgaard <p.dalgaard at biostat.ku.dk>:

> Oliver Bandel wrote:
> > Sarah Goslee <sarah.goslee <at> gmail.com> writes:
> >
> >> I think we need the reproducible example requested in
> >> the posting guide.
> >
> > ====================
> > for ( datum in names(weblog_by_date) )
> > {
> >   print(datum)
> >   selected <- weblog_by_date[[datum]]
> >
> >   res_size_by_host <- tapply( selected$size, selected$host, sum)
> >   mycat <- function(a,b) cat(paste(a, "==>", b, "\n"))
> >   mapply( mycat, selected$size, selected$host )
> >   print( res_size_by_host )
> >
> >   print( "is there any NA?!")
> >   print( any( is.na(selected$size)) )
> >
> > }
> > ====================
> Why do so many people have such trouble with the word "reproducible"?

Creating test data takes me more time: I have to change the original
IP addresses to fake addresses before posting it here.
I also doubt that a *.zip file would be accepted, but that would
have been the next thing I wanted to try.

If binary attachments are not accepted, then it will not be
possible to send test data here, because the lines in the
logfile are longer than my current webmailer allows me to
send without breaking them.
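One common way around both obstacles is to replace each IP address by a neutral label and print a small sample with `dput()`. Its output is ordinary R source text, so it can be pasted into the body of a plain-text message instead of being sent as a binary attachment. This is only a sketch: the column names `host` and `size` are taken from the loop quoted above, while the data frame `weblog`, the helper `anonymize_hosts()` and all values are hypothetical.

```r
# Hypothetical stand-in for one day's log; all values are invented for illustration.
weblog <- data.frame(
  host = c("192.0.2.10", "198.51.100.7", "192.0.2.10", "203.0.113.5"),
  size = c(512, 0, 2048, 128),
  stringsAsFactors = FALSE
)

# Hypothetical helper: map each distinct address to "host1", "host2", ...
# match() returns the position of each address among the unique addresses,
# so repeated addresses get the same label.
anonymize_hosts <- function(ip) {
  paste("host", match(ip, unique(ip)), sep = "")
}
weblog$host <- anonymize_hosts(weblog$host)

# dput() prints an R expression that recreates the object;
# readers can paste it straight into their own session.
dput(head(weblog, 20))
```

Using the first 20 rows is an arbitrary choice; the goal is the smallest sample that still shows the problem.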

I also hoped that people who know the traps could help just by looking
at the code and knowing where to look for the problem.

As you have now shown, this is possible: you knew where to look
for the problem, which shows me that you are very experienced in R.

> We can't reproduce that without access to weblog_by_date!

See above: providing such data is a problem, and creating it takes time.

> Anyways I think it is tapply that is behaving unexpectedly to you:
>  > x <- factor(1,levels=1:2)
>  > tapply(1,x,sum)
>   1  2
>   1 NA
> which is kind of surprising since the sum over an empty set is usually
> zero. However, that _is_ what the documentation for tapply says:
>       When 'FUN' is present, 'tapply' calls 'FUN' for each cell that has
>       any data in it.  If 'FUN' returns a single atomic value for each
>       such cell (e.g., functions 'mean' or 'var') and when 'simplify' is
>       'TRUE', 'tapply' returns a multi-way array containing the values,
>       and 'NA' for the empty cells.
> a passable workaround is
>  > sapply(split(1,x),sum)
> 1 2
> 1 0
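Applied to columns like the ones in the loop above, the difference looks as follows. The data here are made up: the `host` factor declares three levels but only two of them occur. `tapply()` never calls `sum()` for the empty level and reports `NA`, whereas `split()` returns an empty numeric vector for that level and `sum(numeric(0))` is 0.

```r
# Made-up example: the factor declares hosts "a", "b", "c", but "c" has no rows.
selected <- data.frame(
  size = c(100, 250, 50),
  host = factor(c("a", "b", "a"), levels = c("a", "b", "c"))
)

# tapply() only calls sum() for cells that contain data; the empty cell "c" becomes NA.
by_tapply <- tapply(selected$size, selected$host, sum)

# split() yields one (possibly empty) vector per level, so sum() gives 0 for "c".
by_split <- sapply(split(selected$size, selected$host), sum)

print(by_tapply)  # a = 150, b = 250, c = NA
print(by_split)   # a = 150, b = 250, c = 0
```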

Thank you.

This looks like the solution for that simple case.

I hope I can adapt it to my data structure.

The problem here is that there are no empty cells
in my data. There is always a numeric value of
0 or greater, because I threw out every "NA" and
replaced it with 0.

The data is inside a data frame.
How can there be an empty cell in a data frame?
There are no NAs and no NaNs...
...and the factors must be new each time,
because the data is created anew each time,
and I also used rm(selected) to make sure that no
factors are left over from the previous iteration...

Did I overlook something?
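A plausible explanation, offered as a guess because the real `weblog_by_date` is not shown: if `host` is a factor in the full log, each per-date subset of the data frame (for example one produced by `split()`) keeps the complete set of levels, including hosts that never occur on that date. Those unused levels are exactly the empty cells that `tapply()` fills with `NA`, even though no value in the data frame is `NA`. The sketch below uses made-up data and reuses the names `weblog_by_date` and `selected` only for illustration. It shows the effect and one remedy: re-creating the factor with `factor()`, which keeps only the levels that actually occur. Later R releases also offer `droplevels()` for the same purpose.

```r
# Made-up log: host "c" appears only on the second date.
weblog <- data.frame(
  date = c("2008-12-19", "2008-12-19", "2008-12-20"),
  host = factor(c("a", "b", "c")),
  size = c(10, 20, 30),
  stringsAsFactors = FALSE
)
weblog_by_date <- split(weblog, weblog$date)

selected <- weblog_by_date[["2008-12-19"]]
print(levels(selected$host))                     # still "a" "b" "c"
print(any(is.na(selected$size)))                 # FALSE: no NA in the data itself
print(tapply(selected$size, selected$host, sum)) # a = 10, b = 20, c = NA

# Re-creating the factor drops the levels that do not occur in this subset.
selected$host <- factor(selected$host)
print(tapply(selected$size, selected$host, sum)) # a = 10, b = 20
```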


P.S.: I will try to attach my zip file now. It contains
      the complete code and a modified weblog (with changed IP addresses).
      I hope the list accepts it.
