[R] once more: methods on missing data

Thomas Lumley tlumley at u.washington.edu
Thu Jun 7 17:16:14 CEST 2001

On Thu, 7 Jun 2001 Maciej.Hoffman-Wecker at evotecoai.com wrote in part:

> The result of the call
>      x <- as.numeric(c(NA,NA,NA)); STATISTIC(x[!is.na(x)])
> depends on the STATISTIC.
>      STATISTIC           RESULT
>      min                 Inf and warning message
>      max                 -Inf and warning message
>      mean                NaN and no warning message
>      quantile            named vector containing NAs and no warning message
>      sd                  abortion of the evaluation with an error message
> Should not the statistics generally return NA and a warning message?

Ideally, they shouldn't.  NA is missing data -- that is, we don't know the
value of the statistic because some data were not measured. That's why,
for example, NA & FALSE is FALSE, not NA: the value of the expression is
known, no matter what the first operand is.
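
A quick sketch of that rule at the R prompt; the variable names are only
for illustration, and the comments give the values these expressions
return:

```r
known   <- NA & FALSE   # FALSE: the answer is FALSE whatever the NA stands for
unknown <- NA & TRUE    # NA: the answer depends on the missing value
also    <- NA | TRUE    # TRUE: known for the same reason
```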

The results for min() and max() have the rationale that, e.g., max(a,max(b))
should return the same as max(a,b) even when b is empty. There are even
some examples where this is genuinely helpful.
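
A small sketch of that identity, with made-up values for a and an empty b:

```r
a <- c(3, 1, 4)
b <- numeric(0)                       # empty, e.g. after dropping all NAs

inner  <- suppressWarnings(max(b))    # -Inf; warns when not suppressed
nested <- max(a, inner)               # 4
flat   <- max(a, b)                   # 4
same   <- identical(nested, flat)     # TRUE
```

Because -Inf is the identity element for max(), an empty piece drops out
of a maximum taken piece by piece without any special casing.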

If the others were to return a value, I think NaN (undefined numerical
result) would be better than NA (missing data), as is the case with
mean(). This would argue for changing the return value of quantile() as
well.
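
To make the NaN/NA distinction concrete, a short sketch (the variable
names are illustrative only):

```r
empty <- numeric(0)

m <- mean(empty)               # NaN: 0/0, an undefined numerical result
m_is_nan  <- is.nan(m)         # TRUE
na_is_nan <- is.nan(NA_real_)  # FALSE: NA means "missing", not "undefined"
m_is_na   <- is.na(m)          # TRUE: is.na() does not separate the two

q <- quantile(empty)           # per the table quoted above, a named vector of NAs
```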

However, I think it's reasonable for a function to refuse to calculate the
variance of no data. We do have try() to handle errors if needed.
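
A minimal sketch of the try() approach. strict_var() is a hypothetical
helper written for this example, not part of R; it stands in for any
function that stops on too little data, whatever a given version of R
does for var() or sd() on an empty vector:

```r
# Hypothetical helper (not part of R): refuses to work on fewer than two values.
strict_var <- function(x) {
  if (length(x) < 2) stop("need at least two observations")
  var(x)
}

x <- as.numeric(c(NA, NA, NA))

# try() catches the error instead of aborting the evaluation.
res <- try(strict_var(x[!is.na(x)]), silent = TRUE)

# Fall back to NaN when the computation failed.
v <- if (inherits(res, "try-error")) NaN else res
```

The caller, not the statistic, then decides what an empty sample should
mean in context.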


Thomas Lumley			Asst. Professor, Biostatistics
tlumley at u.washington.edu	University of Washington, Seattle

