[Rd] Proposal unary - operator for factors

William Dunlap wdunlap at tibco.com
Thu Feb 4 01:43:26 CET 2010


> -----Original Message-----
> From: Duncan Murdoch [mailto:murdoch at stats.uwo.ca] 
> Sent: Wednesday, February 03, 2010 4:17 PM
> To: William Dunlap
> Cc: Hadley Wickham; r-devel at r-project.org
> Subject: Re: [Rd] Proposal unary - operator for factors
> 
> On 03/02/2010 6:49 PM, William Dunlap wrote:
> >> -----Original Message-----
> >> From: h.wickham at gmail.com [mailto:h.wickham at gmail.com] On 
> >> Behalf Of Hadley Wickham
> >> Sent: Wednesday, February 03, 2010 3:38 PM
> >> To: William Dunlap
> >> Cc: r-devel at r-project.org
> >> Subject: Re: [Rd] Proposal unary - operator for factors
> >>
> >>> It wouldn't make sense in the context of
> >>>   vector[-factor]
> >> True, but that doesn't work currently so you wouldn't lose anything.
> >> However, it would make a certain class of problem that used to throw
> >> errors become silent.
> >>
> >>> Wouldn't it be better to allow order's decreasing argument
> >>> to be a vector with one element per ... argument?  That
> >>> would work for numbers, factors, dates, and anything
> >>> else.  Currently order silently ignores decreasing[2] and
> >>> beyond.
> >> The problem is you might want to do something like
> >>   order(a, -b, c, -d)
> > 
> > Currently, for numeric a you can do either
> >    order(-a)
> > or
> >    order(a, decreasing=TRUE)
> > For nonnumeric types like POSIXct and factors only
> > the latter works.
> > 
> > Under my proposal your
> >    order(a, -b, c, -d)
> > would be
> >    order(a, b, c, d, decreasing=c(FALSE,TRUE,FALSE,TRUE))
> > and it would work for any orderable class without modifications
> > to any classes.
> 
> Why not use
> 
>   order(a, -xtfrm(b), c, -xtfrm(d))
> 
> ??

You could, if you can remember it.  I have been annoyed
that decreasing= was in order() but not as useful as it
could be since it is not vectorized.  The same goes for
na.last, although that seems less useful to me.
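
For anyone who has not met xtfrm(), here is a minimal sketch of the
suggestion quoted above. The vectors f and d are made up for
illustration and are not from this thread. xtfrm() maps a vector to
numbers that sort the same way, so negating the result reverses the
direction for that one key.

```r
# Hypothetical example vectors (not from the thread).
f <- factor(c("lo", "hi", "mid", "hi"), levels = c("lo", "mid", "hi"))
d <- as.Date(c("2010-01-03", "2010-01-01", "2010-01-02", "2010-01-02"))

# Unary minus is not meaningful for factors, so negate the numeric
# sort key from xtfrm() instead: d ascending, ties broken by f descending.
o <- order(d, -xtfrm(f))
data.frame(d = d, f = f)[o, ]
```

Rows 3 and 4 share a date, so the factor decides between them, and
"hi" comes before "mid".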

Here is a version of order (based on the
algorithm used in S+'s order) that
vectorizes the na.last and decreasing
arguments.  It calls the existing order
function to implement decreasing=TRUE/FALSE
and na.last=TRUE/FALSE for a single argument,
but order itself could be modified in this
way.

new.order <- function (..., na.last = TRUE, decreasing = FALSE)
{
    vectors <- list(...)
    nVectors <- length(vectors)
    stopifnot(nVectors > 0)
    # recycle na.last and decreasing to one element per sort key
    na.last <- rep(na.last, length.out = nVectors)
    decreasing <- rep(decreasing, length.out = nVectors)
    keys <- seq_len(length(vectors[[1]]))
    # sort by the last key first; order() leaves ties in their
    # current order, so earlier keys end up taking precedence
    for (i in nVectors:1) {
        v <- vectors[[i]]
        if (length(v) < length(keys))
            v <- rep(v, length.out = length(keys))
        keys <- keys[order(v[keys], na.last = na.last[i],
                           decreasing = decreasing[i])]
    }
    keys
}
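
The loop in new.order depends on order() leaving tied elements in
their existing order, so that sorting by the last key first and the
first key last gives the same answer as a single multi-key call. A
small sketch of that idea with two made-up keys (a and b are
hypothetical, not from the dataset used later):

```r
# Hypothetical keys: a is the major key, b the minor key.
a <- c(2, 1, 2, 1)
b <- c("x", "y", "w", "z")

keys <- seq_along(a)
keys <- keys[order(b[keys])]  # pass 1: sort by the minor key
keys <- keys[order(a[keys])]  # pass 2: sort by the major key; ties keep pass-1 order

identical(keys, order(a, b))  # the two-pass result matches order(a, b)
```

Because each pass is an ordinary single-key call to order(), each
pass can take its own decreasing and na.last values.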

With the following dataset

data <- data.frame(
  ct  = as.POSIXct(c("2009-01-01", "2010-02-03",
                     "2010-02-28"))[c(2,2,2,3,3,1)],
  dt  = as.Date(c("2009-01-01", "2010-02-03",
                  "2010-02-28"))[c(3,2,2,2,3,1)],
  fac = factor(c("Small","Medium","Large"),
               levels=c("Small","Medium","Large"))[c(1,3,2,3,3,1)],
  n   = c(11,12,12,11,12,12))

> data
          ct         dt    fac  n
1 2010-02-03 2010-02-28  Small 11
2 2010-02-03 2010-02-03  Large 12
3 2010-02-03 2010-02-03 Medium 12
4 2010-02-28 2010-02-03  Large 11
5 2010-02-28 2010-02-28  Large 12
6 2009-01-01 2009-01-01  Small 12
> data.frame(lapply(data,rank))
   ct  dt fac   n
1 3.0 5.5 1.5 1.5
2 3.0 3.0 5.0 4.5
3 3.0 3.0 3.0 4.5
4 5.5 3.0 5.0 1.5
5 5.5 5.5 5.0 4.5
6 1.0 1.0 1.5 4.5

we get (where my demos use rank because I could not remember
the name xtfrm):

> with(data, identical(order(ct,dt), new.order(ct,dt)))
[1] TRUE
> with(data, identical(order(fac,-n),
+     new.order(fac,n,decreasing=c(FALSE,TRUE))))
[1] TRUE
> with(data, identical(order(ct,-rank(dt)),
+     new.order(ct,dt,decreasing=c(FALSE,TRUE))))
[1] TRUE
> with(data, identical(order(ct,-rank(fac)),
+     new.order(ct,fac,decreasing=c(FALSE,TRUE))))
[1] TRUE
> with(data, identical(order(n,-rank(fac)),
+     new.order(n,fac,decreasing=c(FALSE,TRUE))))
[1] TRUE
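
The checks above use rank() where the quoted suggestion used xtfrm().
As far as ordering goes the two should be interchangeable. A short
sketch with a made-up factor (sizes is hypothetical, not a column of
the dataset above):

```r
# Hypothetical factor with one tie.
sizes <- factor(c("Small", "Large", "Medium", "Large"),
                levels = c("Small", "Medium", "Large"))

xtfrm(sizes)  # the integer level codes
rank(sizes)   # ranks, with tied values sharing an averaged rank

# Negating either one gives the same descending order.
same <- identical(order(-xtfrm(sizes)), order(-rank(sizes)))
same
```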

Bill Dunlap
Spotfire, TIBCO Software
wdunlap tibco.com  
> 
> Duncan Murdoch
> 


