[Rd] Inconsistency in median()

Gustavo Zapata Wainberg gz@p@t@w@|nberg @end|ng |rom gm@||@com
Wed May 5 16:28:17 CEST 2021


Hi, thanks Dr. Mächler for your prompt response!

I agree with your explanations about this issue. But I was thinking of
something like adding an argument to median() and mean() that could keep
the attributes of the variables if set to TRUE.
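
To make the idea concrete, here is a minimal sketch of what I have in mind, written as a wrapper rather than as a change to base R. The function name median_keep() and the argument keep.attr are hypothetical. I skip names, dim and dimnames on the assumption that they describe the whole vector and cannot sensibly be attached to a single summary value.

```r
# Hypothetical wrapper, not part of base R: median() with an opt-in
# 'keep.attr' argument that copies attributes from x onto the result.
median_keep <- function(x, na.rm = FALSE, keep.attr = FALSE) {
    m <- stats::median(x, na.rm = na.rm)
    if (keep.attr) {
        # Assumption: names/dim/dimnames belong to the full vector only.
        keep <- setdiff(names(attributes(x)), c("names", "dim", "dimnames"))
        for (nm in keep)
            attr(m, nm) <- attr(x, nm, exact = TRUE)
    }
    m
}

x <- c(4, 1, 3, 2)                 # even length, so median() goes via mean()
attr(x, "label") <- "A label"
attributes(median_keep(x))                     # NULL: label is lost
attributes(median_keep(x, keep.attr = TRUE))   # label is carried over
```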

Thanks again.

Best regards

On Tue, 4 May 2021 at 17:57, Martin Maechler (<maechler using stat.math.ethz.ch>)
wrote:

> >>>>> Gustavo Zapata Wainberg
> >>>>>     on Mon, 3 May 2021 20:48:49 +0200 writes:
>
>     > Hi!
>
>     > I'm writing this post because there is an inconsistency
>     > when median() is calculated for vectors of even or odd
>     > length. For odd-length vectors, attributes (such as labels
>     > added with Hmisc) are kept after running median(), but
>     > this is not the case if the length is even; in that case
>     > the attributes are lost.
>
>     > I know that this is due to median() using mean() to obtain
>     > the result when the vector is even, and mean() always
>     > takes attributes off vectors.
>
> Yes, and this has been the design of  median()  forever:
>
> If n := length(x)  is odd,  the median is "the middle" observation,
>                    and should be equal to x[j] for j = (n+1)/2
>                    and hence, e.g., is well defined for an ordered factor.
>
> When  n  is even
>      however, median() must be the mean of "the two middle" observations,
>        which is e.g., not even *defined* for an ordered factor.
>
> We *could* talk of the so called lo-median  or hi-median
> (terms probably coined by John W. Tukey) because (IIRC), these
> are equal to each other and to the median for odd n, but
> are   equal to  x[j]  and  x[j+1]   j=n/2  for even n *and* are
> still "of the same kind" as x[]  itself.
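
The lo-median and hi-median described above could be sketched roughly as follows. This is only an illustration under stated assumptions: the names middle_element(), lo_median() and hi_median() are hypothetical and do not exist in base R or stats, and the sketch uses a full order() rather than partial sorting for simplicity.

```r
# Hypothetical helpers (not in base R): always return an element x[j],
# so the result is "of the same kind" as x.
middle_element <- function(x, high, na.rm = FALSE) {
    if (na.rm) x <- x[!is.na(x)]
    n <- length(x)
    if (n == 0L || anyNA(x)) return(x[NA_integer_])
    # odd n: both give (n+1)/2; even n: n/2 (low) or n/2 + 1 (high)
    j <- if (high) n %/% 2L + 1L else (n + 1L) %/% 2L
    x[order(x)[j]]
}
lo_median <- function(x, na.rm = FALSE) middle_element(x, FALSE, na.rm)
hi_median <- function(x, na.rm = FALSE) middle_element(x, TRUE,  na.rm)

x <- c(b = 4, a = 1, d = 3, c = 2)
lo_median(x)    # the element named "c" (value 2)
hi_median(x)    # the element named "d" (value 3)

# Works for an ordered factor, where a mean is not defined
sizes <- factor(c("S", "L", "M", "S"), levels = c("S", "M", "L"),
                ordered = TRUE)
lo_median(sizes)   # S
hi_median(sizes)   # M
```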
>
> Interestingly, for the mad() { = the median absolute deviation from the
> median}
> we *do* allow specifying logical 'low' and 'high',
> but that is for the "outer" median in MAD's definition, not the
> inner one.
>
> ## From <Rsrc>/src/library/stats/R/mad.R :
>
> mad <- function(x, center = median(x), constant = 1.4826,
>                 na.rm = FALSE, low = FALSE, high = FALSE)
> {
>     if(na.rm)
>         x <- x[!is.na(x)]
>     n <- length(x)
>     constant *
>         if((low || high) && n%%2 == 0) {
>             if(low && high) stop("'low' and 'high' cannot be both TRUE")
>             n2 <- n %/% 2 + as.integer(high)
>             sort(abs(x - center), partial = n2)[n2]
>         }
>         else median(abs(x - center))
> }
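
A small usage example of the 'low' and 'high' arguments shown above, using an even-length vector chosen so the arithmetic is easy to follow. The data are made up for illustration, and constant = 1 is set only to make the raw order statistics visible.

```r
x <- c(1, 2, 4, 7, 11, 16)        # n = 6, median(x) = 5.5
sort(abs(x - median(x)))          # 1.5 1.5 3.5 4.5 5.5 10.5

mad(x, constant = 1)              # mean of the two middle deviations: 4
mad(x, constant = 1, low = TRUE)  # lower middle deviation: 3.5
mad(x, constant = 1, high = TRUE) # upper middle deviation: 4.5
```

Note that in all three calls the "inner" centre is still the ordinary median(x) = 5.5; only the "outer" median changes.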
>
>
>
>
>     > Don't you think that attributes should be kept in both
>     > cases?
>
> well, not all attributes can be kept.
> Note that for *named* vectors x,  x[j] can (and does) keep the name,
> but there's definitely no sensible name to give to (x[j] + x[j+1])/2
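
The point about names can be checked directly on a small named vector. The vector is made up for illustration; the behaviour shown is standard base R subsetting and arithmetic.

```r
x <- c(a = 1, b = 2, c = 3, d = 4)

names(x[2])                 # "b": a single element keeps its name
names(mean(x[2:3]))         # NULL: mean() returns an unnamed value
names((x[2] + x[3]) / 2)    # "b": arithmetic just reuses the first
                            # operand's name, which is hardly "sensible"
```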
>
> I'm willing to collaborate with someone to consider
> extending  median.default()  so that  hi-median and lo-median
> are available to the user.
> Both of these will always return x[j] for some j and hence keep
> all (sensible!) attributes (well, if the `[`-method for the
> corresponding class has been defined correctly; I've encountered
> quite a few cases where people created vector-like classes but
> did not provide a "correct"  subsetting method; typically you
> should make sure both a `[[` and a `[` method work!).
>
> Best regards,
> Martin
>
> Martin Maechler
> ETH Zurich  and  R Core team
>
>     > And, going further, shouldn't mean() keep
>     > attributes as well? I have looked in R's Bugzilla and I
>     > didn't find an entry related to this issue.
>
>     > Please, let me know if you consider that this issue should
>     > be posted in R's bugzilla.
>
>     > Here is an example with code.
>
>     > rndvar <- rnorm(n = 100)
>
>     > Hmisc::label(rndvar) <- "A label for RNDVAR"
>
>     > str(median(rndvar[-c(1,2)]))
>
>     > Returns: "num 0.0368"
>
>     > str(median(rndvar[-1]))
>
>     > Returns: 'labelled' num 0.0322 - attr(*, "label")= chr "A
>     > label for RNDVAR"
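
The same behaviour can be reproduced without Hmisc, under the assumption that what matters is a class whose `[` method re-attaches the attribute. The class name "lbl" and its method below are hypothetical stand-ins for illustration, not Hmisc's actual implementation.

```r
# Hypothetical minimal class: a `[` method that keeps the "label" attribute.
`[.lbl` <- function(x, ...) {
    out <- NextMethod("[")
    attr(out, "label") <- attr(x, "label")
    class(out) <- class(x)
    out
}

x <- structure(c(5, 3, 1, 4, 2), label = "A label", class = "lbl")

attr(median(x), "label")       # odd length: result is x[j], label kept
attr(median(x[-1]), "label")   # even length: goes through mean(), NULL
```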
>
>     > Thanks in advance!
>
>     > Gustavo Zapata-Wainberg
>
>
