[R] sd(): why error with NA's?

Raubertas, Richard richard_raubertas at merck.com
Fri Mar 19 00:51:09 CET 2004


R 1.8.1 with Windows XP.

I have a question about how sd() behaves with NA's:

> mean(c(1,2,3,NA))
[1] NA
> median(c(1,2,3,NA))
[1] NA
> mad(c(1,2,3,NA))
[1] NA
> sd(c(1,2,3,NA))
Error in var(x, na.rm = na.rm) : missing observations in cov/cor
>

What is so special about the standard deviation, relative to
other descriptive statistics, that the presence of NA's 
deserves an error instead of simply returning NA?
(I know about na.rm=TRUE, but that is not the point here.)
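
The comparison can also be run in one pass, which makes it easy to repeat on another R version. This is only a sketch: the helper name `probe` is made up for illustration, and what gets recorded for sd() depends on the R version in use, so no output is shown for it.

```r
x <- c(1, 2, 3, NA)

# 'probe' is a hypothetical helper: it applies a summary function to x and
# returns either the function's value or the text of the error it signalled.
probe <- function(f, x) {
    tryCatch(f(x), error = function(e) paste("Error:", conditionMessage(e)))
}

# Run the four descriptive statistics from the session above on the same input.
results <- lapply(list(mean = mean, median = median, mad = mad, sd = sd),
                  probe, x = x)
str(results)
```

Capturing the error text rather than letting the call stop means all four results can be inspected side by side.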

A few small changes to sd() would seem to resolve the anomaly:

sd <- function(x, na.rm=FALSE)
# Like the built-in 'sd', but returns NA instead of signalling an error
# when 'na.rm' is FALSE and 'x' contains NA's.
{
    if (is.matrix(x)) {
        apply(x, 2, sd, na.rm = na.rm)
    } else if (is.vector(x)) {
        if (!na.rm && any(is.na(x)))  NA
        else sqrt(var(x, na.rm = na.rm))
    } else if (is.data.frame(x)) {
        sapply(x, sd, na.rm = na.rm)
    } else {
        x <- as.vector(x)
        if (!na.rm && any(is.na(x)))  NA
        else sqrt(var(x, na.rm = na.rm))
    }
}
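
A quick check of the intended behaviour, as a minimal sketch. The function is restated in condensed form under the hypothetical name `sd_na` (not part of the proposal above) so that the built-in sd() is not masked. The separate is.vector() branch is folded into the final branch, on the assumption that both do the same thing for ordinary numeric vectors.

```r
# Hypothetical condensed variant of the proposed function.
sd_na <- function(x, na.rm = FALSE) {
    if (is.matrix(x)) {
        # Column-wise standard deviations.
        apply(x, 2, sd_na, na.rm = na.rm)
    } else if (is.data.frame(x)) {
        # One standard deviation per column.
        sapply(x, sd_na, na.rm = na.rm)
    } else {
        x <- as.vector(x)
        # Return NA, rather than calling var(), when NA's are present
        # and have not been asked to be removed.
        if (!na.rm && any(is.na(x)))  NA
        else sqrt(var(x, na.rm = na.rm))
    }
}

v <- c(1, 2, 3, NA)
sd_na(v)                  # NA
sd_na(v, na.rm = TRUE)    # 1

m <- cbind(a = c(1, 2, 3), b = c(1, 2, NA))
sd_na(m)                  # a is 1, b is NA
```

With this arrangement the NA check happens before var() is ever called, so the result is consistent with mean(), median() and mad() on the same input.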

Rich Raubertas
Merck & Co.

