[Rd] quantile(), IQR() and median() for factors

Prof Brian Ripley ripley at stats.ox.ac.uk
Fri Mar 6 18:36:40 CET 2009


On Fri, 6 Mar 2009, Greg Snow wrote:

> I like the idea of median and friends working on ordered factors. 
> Just a couple of thoughts on possible implementations.
>
> Adding extra checks and functionality will slow down the function.
> For a single evaluation on a given dataset this slowdown will not be
> noticeable, but inside a simulation, bootstrap, or other high-iteration
> technique it could matter.  I would suggest creating a core function
> that does just the calculations (median, quantile, IQR), assuming that
> the data passed in are correct, without doing any checks or anything
> fancy.  The user-callable functions (median et al.) would then do the
> checks, dispatch to other functions for anything fancy, etc., and then
> call the core function with the clean data.  The common user would not
> really notice a difference, but someone programming a high-iteration
> technique could clean the data themselves and then call the core
> function directly, bypassing the checks and branches.
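Greg's split between a checked, user-facing function and an unchecked core can be sketched in R as follows. The names `median_core()` and `checked_median()` are hypothetical, not part of base R, and the core assumes a non-empty numeric vector with no NAs.

```r
# Hypothetical unchecked core: assumes x is a non-empty numeric vector
# without NAs, so it performs no validation at all.
median_core <- function(x) {
  n <- length(x)
  half <- (n + 1L) %/% 2L
  if (n %% 2L == 1L) {
    sort(x, partial = half)[half]                         # odd n: middle value
  } else {
    mean(sort(x, partial = half + 0L:1L)[half + 0L:1L])   # even n: average the two middle values
  }
}

# Hypothetical user-facing wrapper: validates and cleans, then calls the core.
checked_median <- function(x, na.rm = FALSE) {
  if (!is.numeric(x)) stop("need numeric data")   # is.numeric() is FALSE for factors
  if (na.rm) {
    x <- x[!is.na(x)]
  } else if (anyNA(x)) {
    return(NA_real_)
  }
  if (length(x) == 0L) return(NA_real_)
  median_core(x)
}

# In a bootstrap the data are cleaned once, then the core is called directly.
set.seed(1)
y <- rnorm(50)
boot_medians <- replicate(1000, median_core(sample(y, replace = TRUE)))
```

The checks run once per call of `checked_median()`, while the resampling loop pays only for the sort.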

Since median and quantile are already generic, adding an 'ordered'
method would be zero cost to other uses.  And the factor check at the
head of median.default could be replaced by a median.factor method if
someone could show a convincing performance difference.
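Because median() is an S3 generic, a method for class "ordered" is only reached for ordered factors, so numeric input still goes straight to median.default(). The method below is a sketch, not part of base R; returning the lower of the two middle values for even n is an assumed convention.

```r
# Sketch of an S3 method: median() dispatches here only for ordered factors.
median.ordered <- function(x, na.rm = FALSE, ...) {
  if (na.rm) {
    x <- x[!is.na(x)]
  } else if (anyNA(x)) {
    return(x[NA_integer_])            # an NA of the same ordered type
  }
  n <- length(x)
  if (n == 0L) return(x[NA_integer_])
  # Assumed convention: for even n, take the lower middle value, because two
  # levels cannot be averaged on an ordinal scale.
  sort(x)[(n + 1L) %/% 2L]
}

x <- factor(c(1, 0, 1, 0, 0, 2, 0, 1, 2, 0, 0), ordered = TRUE)
median(x)              # dispatches to median.ordered(): level 0
median(c(1, 2, 3, 4))  # still handled by median.default(): 2.5
```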

> Just out of curiosity (from someone who only learned from English 
> (Americanized at that) and not Italian texts), what would the median 
> of [Low, Low, Medium, High] be?

I don't think it is 'the' median but 'a' median.  (Even English 
Wikipedia says the median is not unique for even numbers of inputs.)
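The point can be made concrete with the usual definition: a value m is a median when at least half of the observations are <= m and at least half are >= m. The hypothetical helper `all_medians()` returns every level that satisfies this definition.

```r
# Hypothetical helper: all levels of an ordered factor that satisfy the
# definition of a median (at least n/2 observations on each side, inclusive).
all_medians <- function(x) {
  x <- x[!is.na(x)]
  n <- length(x)
  if (n == 0L) return(character(0))
  is_median <- vapply(levels(x), function(m) {
    # comparing an ordered factor with a level name uses the level order
    sum(x <= m) >= n / 2 && sum(x >= m) >= n / 2
  }, logical(1))
  levels(x)[is_median]
}

grades <- factor(c("Low", "Low", "Medium", "High"),
                 levels = c("Low", "Medium", "High"), ordered = TRUE)
all_medians(grades)   # "Low" "Medium"
```

With an odd number of observations the set is always a single level; with an even number it can hold two levels, or more if an unobserved level lies between the two middle values.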

>
> -- 
> Gregory (Greg) L. Snow Ph.D.
> Statistical Data Center
> Intermountain Healthcare
> greg.snow at imail.org
> 801.408.8111
>
>
>> -----Original Message-----
>> From: r-devel-bounces at r-project.org [mailto:r-devel-bounces at r-project.org] On Behalf Of Simone Giannerini
>> Sent: Thursday, March 05, 2009 4:49 PM
>> To: R-devel
>> Subject: [Rd] quantile(), IQR() and median() for factors
>>
>> Dear all,
>>
>> from the help page of quantile:
>>
>> "x     numeric vectors whose sample quantiles are wanted. Missing
>> values are ignored."
>>
>> from the help page of IQR:
>>
>> "x     a numeric vector."
>>
>> As a matter of fact, it seems that neither quantile() nor IQR() checks
>> that its input is numeric.
>> See the following:
>>
>> set.seed(11)
>> x <- rbinom(n=11,size=2,prob=.5)
>> x <- factor(x,ordered=TRUE)
>> x
>>  [1] 1 0 1 0 0 2 0 1 2 0 0
>> Levels: 0 < 1 < 2
>>
>>> quantile(x)
>>   0%  25%  50%  75% 100%
>>    0 <NA>    0 <NA>    2
>> Levels: 0 < 1 < 2
>> Warning messages:
>> 1: In Ops.ordered((1 - h), qs[i]) :
>>   '*' is not meaningful for ordered factors
>> 2: In Ops.ordered(h, x[hi[i]]) : '*' is not meaningful for ordered factors
>>
>>> IQR(x)
>> [1] 1
>>
>> whereas median has the check:
>>
>>> median(x)
>> Error in median.default(x) : need numeric data
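The <NA> entries at 25% and 75% come from the default quantile type interpolating between neighbouring order statistics, which needs the arithmetic ('*') that Ops.ordered refuses. One way around this, sketched under the assumption that only observed levels should be returned, is a type-1 quantile (the inverse of the empirical distribution function) on the integer level codes, mapped back to the levels. `quantile_ordered()` is a hypothetical name, not a base R function.

```r
# Hypothetical helper (not base R): sample quantiles of an ordered factor.
# type = 1 inverts the empirical CDF, so every result is an observed level
# and no arithmetic on levels is needed.
quantile_ordered <- function(x, probs = seq(0, 1, 0.25), na.rm = FALSE) {
  if (!is.ordered(x)) stop("'x' must be an ordered factor")
  if (na.rm) {
    x <- x[!is.na(x)]
  } else if (anyNA(x)) {
    stop("missing values not allowed if 'na.rm' is FALSE")
  }
  q <- quantile(as.integer(x), probs = probs, type = 1)  # quantiles of the level codes
  out <- factor(levels(x)[q], levels = levels(x), ordered = TRUE)
  names(out) <- names(q)                                 # keep the "0%", "25%", ... labels
  out
}

x <- factor(c(1, 0, 1, 0, 0, 2, 0, 1, 2, 0, 0), ordered = TRUE)
quantile_ordered(x)   # 0%: 0, 25%: 0, 50%: 0, 75%: 1, 100%: 2
```

By the same reasoning, the IQR(x) value of 1 above appears to be a difference of level codes (one level apart) rather than a quantity on any meaningful scale.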
>>
>> I also take the opportunity to ask for your comments on the following
>> related subject:
>>
>> In my opinion it would be convenient if median() and the like
>> (quantile(), IQR()) were implemented for ordered factors, for which
>> they can in fact be well defined. For instance, functions like
>> apply(x,FUN=median,...) could then be used without further processing
>> on data frames that contain both numeric variables and ordered factors.
>> To my limited knowledge, English introductory statistics textbooks
>> mention only marginally that the median is well defined for ordered
>> categorical variables, whereas the Italian statistics literature often
>> discusses this in detail; this could mislead students and practitioners
>> into expecting median() to work for ordered factors.
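One caveat on the apply() route: apply() first converts a data frame with as.matrix(), which yields a character matrix as soon as a factor column is present. A column-wise lapply() keeps each column's class. The sketch below assumes the lower-median convention for ordered factors; `col_medians()` is a hypothetical helper.

```r
# Hypothetical helper: a per-column "median" for a data frame that mixes
# numeric columns and ordered factors. Returns a list because the results
# have different types.
col_medians <- function(df) {
  lapply(df, function(col) {
    if (is.ordered(col)) {
      s <- sort(col)                          # drops NAs, keeps the level order
      if (length(s) == 0L) return(col[NA_integer_])
      s[(length(s) + 1L) %/% 2L]              # assumed convention: lower median
    } else {
      median(col, na.rm = TRUE)
    }
  })
}

df <- data.frame(
  score = c(2.5, 3.0, 4.5, 1.0),
  grade = factor(c("Low", "High", "Medium", "Low"),
                 levels = c("Low", "Medium", "High"), ordered = TRUE)
)
col_medians(df)   # $score 2.75, $grade Low
```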
>>
>> In this message
>>
>> https://stat.ethz.ch/pipermail/r-help/2003-November/042684.html
>>
>> Martin Maechler considers doing this by allowing extra arguments
>> "low" and "high", as is done for mad().
>> I am willing to contribute if requested, and comments are
>> welcome.
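The low/high idea can be sketched by borrowing the meaning these arguments have in mad(): low = TRUE takes the smaller of the two middle values for even n, high = TRUE the larger. `median_lohi()` is hypothetical, and stopping with an error when an ordered factor would need averaging is an assumption about how the default might behave.

```r
# Hypothetical median with mad()-style 'low' and 'high' arguments.
median_lohi <- function(x, na.rm = FALSE, low = FALSE, high = FALSE) {
  if (!is.numeric(x) && !is.ordered(x)) stop("need numeric data or an ordered factor")
  if (na.rm) {
    x <- x[!is.na(x)]
  } else if (anyNA(x)) {
    return(x[NA_integer_])
  }
  n <- length(x)
  if (n == 0L) return(x[NA_integer_])
  s <- sort(x)
  lo <- s[(n + 1L) %/% 2L]   # lower middle value (the middle one when n is odd)
  hi <- s[n %/% 2L + 1L]     # upper middle value (the middle one when n is odd)
  if (low) return(lo)
  if (high) return(hi)
  if (lo == hi) return(lo)   # odd n, or the two middle values coincide
  if (is.ordered(x)) stop("two different middle levels; use low = TRUE or high = TRUE")
  (lo + hi) / 2              # the usual averaged median for numeric data
}

grades <- factor(c("Low", "Low", "Medium", "High"),
                 levels = c("Low", "Medium", "High"), ordered = TRUE)
median_lohi(grades, low = TRUE)    # Low
median_lohi(grades, high = TRUE)   # Medium
median_lohi(c(1, 2, 3, 4))         # 2.5, as median() gives
```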
>>
>> Thank you for your attention,
>>
>> kind regards,
>>
>> Simone
>>
>>> R.version
>>                _
>> platform       i386-pc-mingw32
>> arch           i386
>> os             mingw32
>> system         i386, mingw32
>> status
>> major          2
>> minor          8.1
>> year           2008
>> month          12
>> day            22
>> svn rev        47281
>> language       R
>> version.string R version 2.8.1 (2008-12-22)
>>
>> LC_COLLATE=Italian_Italy.1252;LC_CTYPE=Italian_Italy.1252;LC_MONETARY=Italian_Italy.1252;LC_NUMERIC=C;LC_TIME=Italian_Italy.1252
>>
>> --
>> ______________________________________________________
>>
>> Simone Giannerini
>> Dipartimento di Scienze Statistiche "Paolo Fortunati"
>> Universita' di Bologna
>> Via delle belle arti 41 - 40126  Bologna,  ITALY
>> Tel: +39 051 2098262  Fax: +39 051 232153
>> http://www2.stat.unibo.it/giannerini/
>>
>> ______________________________________________
>> R-devel at r-project.org mailing list
>> https://stat.ethz.ch/mailman/listinfo/r-devel

-- 
Brian D. Ripley,                  ripley at stats.ox.ac.uk
Professor of Applied Statistics,  http://www.stats.ox.ac.uk/~ripley/
University of Oxford,             Tel:  +44 1865 272861 (self)
1 South Parks Road,                     +44 1865 272866 (PA)
Oxford OX1 3TG, UK                Fax:  +44 1865 272595

