[R] median() for ordered factor {was "what does this mean .."}

Martin Maechler maechler at stat.math.ethz.ch
Mon Nov 24 11:58:54 CET 2003


>>>>> "PD" == Peter Dalgaard <p.dalgaard at biostat.ku.dk>
>>>>>     on 21 Nov 2003 15:08:09 +0100 writes:

    PD> "Liaw, Andy" <andy_liaw at merck.com> writes:
    >> > From: Peter Dalgaard [mailto:p.dalgaard at biostat.ku.dk]
    >> > 
    >> > John Christie <jc at or.psychology.dal.ca> writes:
    >> > 
    >> > > what does this mean in R-1.8.1 release notes?
    >> > > 
    >> > >     o median() no longer `works' for odd-length factor variables.
    >> > 
    >> > The median has always been undefined for factors, but
    >> > nevertheless median() gave an answer. If the length was
    >> > even, it would fail since it needed to average
    >> > non-numeric values. This confused some and the answer
    >> > you got in the odd-length case was meaningless
    >> > anyway (what's the median of three pears, four apples,
    >> > and two bananas?). So now we check.
    >> 
    >> Why not just give an error if median is given an
    >> unordered factor?

    PD> That's what we do now, and didn't before:

    PD>     if (is.factor(x) || mode(x) != "numeric") 
    PD>            stop("need numeric data")

    PD> (also for ordered factors; it is not clear what to do if
    PD> the median sits between two levels in that case either.)

Actually, our mad() function has arguments 'low' and 'high'
(for partial S-plus compatibility) to ask for the
lo-median or hi-median respectively.  These differ from the
median only in the case of even n := length(x), and
for ox := sort(x) give ox[n/2] or ox[n/2 + 1] respectively.
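
As a minimal sketch of the definition above: lo_median() and hi_median()
are hypothetical helper names used only for illustration (they are not
functions in R), and the index arithmetic is written so that odd n falls
back to the ordinary middle element.  The last two lines use the existing
'low' and 'high' arguments of mad(), with constant = 1 so that the result
is the plain lo- or hi-median of the absolute deviations.

```r
# Hypothetical helpers, not part of R: lo- and hi-median of a vector.
lo_median <- function(x) {
  ox <- sort(x)               # sort() drops NA values by default
  n  <- length(ox)
  ox[floor((n + 1) / 2)]      # ox[n/2] for even n, middle element for odd n
}
hi_median <- function(x) {
  ox <- sort(x)
  n  <- length(ox)
  ox[floor(n / 2) + 1]        # ox[n/2 + 1] for even n, middle element for odd n
}

x <- c(7, 1, 5, 3)            # sorted: 1 3 5 7
lo_median(x)                  # 3
hi_median(x)                  # 5
median(x)                     # 4, the average of the two middle values

# Absolute deviations from median(x) = 4 are 3 1 1 3, sorted: 1 1 3 3
mad(x, constant = 1, low  = TRUE)   # 1
mad(x, constant = 1, high = TRUE)   # 3
```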

Hence, for ordered factors, the lo- and hi-median would be well
defined, and I have in the past considered propagating the 'low'
and 'high' arguments from mad() to median().
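
To illustrate why this is well defined, here is a small sketch for an
ordered factor.  The data and the variable names (sizes, lo, hi) are made
up for the example; it only relies on sort() and indexing, neither of
which needs to average two levels.

```r
# Made-up example data: an ordered factor of even length.
sizes <- factor(c("small", "large", "medium", "small"),
                levels = c("small", "medium", "large"),
                ordered = TRUE)

ox <- sort(sizes)             # ordered by level: small small medium large
n  <- length(ox)

lo <- ox[floor((n + 1) / 2)]  # lo-median: the lower of the two middle values
hi <- ox[floor(n / 2) + 1]    # hi-median: the upper of the two middle values

as.character(lo)              # "small"
as.character(hi)              # "medium"
```

Both results are themselves elements of the ordered factor, so no value
"between two levels" ever has to be invented.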

Martin



