[R] sum() with na.rm=TRUE, again
ripley at stats.ox.ac.uk
Thu Apr 25 18:04:08 CEST 2002
On Thu, 25 Apr 2002, Richards, Tom wrote:
> Hi:
>
> I remember a post several days ago by Jon Baron, concerning the
> behavior of sum() when one sets na.rm=TRUE:
> the result will be a zero sum for a vector of all NA's, as here, for the
> second row:
>
> > ss<- data.frame(x=c(1,NA,3,4),y=c(2,NA,4,NA))
> > ss
>    x  y
> 1  1  2
> 2 NA NA
> 3  3  4
> 4  4 NA
>
> > apply(ss,1,sum,na.rm=TRUE)
> 1 2 3 4
> 3 0 7 4
>
> I am rather alarmed by that zero, because I was just about to place the sum
> function into an apply() on a rather large data management project, where
> about 5% of my matrix rows have two missing values. Is there a "safe" way
> to use sum(), so that such zeroes are not created? A safe.sum() that takes
> arguments just as general as sum()? I mean, I think I could get around this
> little problem like this,
>
> apply(ss,1,function(x){ifelse(all(is.na(x)),NA,sum(!is.na(x))*mean(x,na.rm=TRUE))})
>  1  2  3  4
>  3 NA  7  4
>
> but is there a safer way to write a sum() function? Or, do these zeroes
> serve some purpose that I am missing?
They are the correct answer! The sum of an empty set is zero, by
definition. If that is not what you want, then you don't want the sum and
should define a function to do what you do want. That might be
> apply(ss, 1, function(x) {z <- x[!is.na(x)]; ifelse(length(z), sum(z), NA)})
 1  2  3  4
 3 NA  7  4
Yours accounts for the all-missing case twice.
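
For anyone who wants the above as a reusable function, here is a minimal sketch. The name safe_sum is hypothetical (it is not part of base R), and returning NA_real_ rather than a plain NA is my own assumption, made so that the result is always numeric. The sketch also shows the empty-sum convention itself, and that mean() on an all-NA vector with na.rm=TRUE gives NaN.

```r
# safe_sum is a hypothetical helper name, not a base R function.
# It returns NA when every element is missing, and the ordinary sum otherwise.
safe_sum <- function(x) {
  z <- x[!is.na(x)]                        # drop the missing values
  if (length(z) > 0) sum(z) else NA_real_  # nothing left: report NA, not 0
}

ss <- data.frame(x = c(1, NA, 3, 4), y = c(2, NA, 4, NA))

# The sum over no elements is the additive identity, zero.
empty_total <- sum(numeric(0))

# Plain sum() with na.rm = TRUE gives 0 for the all-NA row ...
row_sums_plain <- apply(ss, 1, sum, na.rm = TRUE)

# ... while the helper gives NA for that row and the usual sum elsewhere.
row_sums_safe <- apply(ss, 1, safe_sum)

# mean() of an all-NA vector with na.rm = TRUE is NaN, so a
# count-times-mean formula yields no number for such a row either.
all_na_mean <- mean(c(NA_real_, NA_real_), na.rm = TRUE)
```

Using if/else rather than ifelse() is a style choice here: the condition is a single value, so no vectorised test is needed.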
--
Brian D. Ripley, ripley at stats.ox.ac.uk
Professor of Applied Statistics, http://www.stats.ox.ac.uk/~ripley/
University of Oxford, Tel: +44 1865 272861 (self)
1 South Parks Road, +44 1865 272860 (secr)
Oxford OX1 3TG, UK Fax: +44 1865 272595