[R] Semantics of sequences in R

Stavros Macrakis macrakis at alum.mit.edu
Sun Feb 22 21:42:46 CET 2009


Inspired by the exchange between Rolf Turner and Wacek Kusnierczyk, I
thought I'd clear up for myself the exact relationship among the
various sequence concepts in R, including not only generic vectors
(lists) and atomic vectors, but also pairlists, factor sequences,
date/time sequences, and difftime sequences.

I tabulated type of sequence vs. property to see if I could make sense
of all this.  The properties I looked at were: the predicates
is.{vector,list,pairlist}; whether various sequence operations (c,
rev, unique, sort) can be used on objects of the various types and, if
so, whether they preserve the class of the input; and the number of
runs reported by rle, i.e. length(rle(x)$lengths), for x = as.XXX(1:2).

Here are the results (code to reproduce at end of email):

             numer list  plist fact  POSIXct difft
is.vector    TRUE  TRUE  FALSE FALSE FALSE   FALSE
is.list      FALSE TRUE  TRUE  FALSE FALSE   FALSE
is.pairlist  FALSE FALSE TRUE  FALSE FALSE   FALSE
c_keep?      TRUE  TRUE  FALSE FALSE TRUE    FALSE
rev_keep?    TRUE  TRUE  FALSE TRUE  TRUE    TRUE
unique_keep? TRUE  TRUE  "Err" TRUE  TRUE    FALSE
sort_keep?   TRUE  "Err" "Err" TRUE  TRUE    TRUE
rle_len      2     "Err" "Err" "Err" "Err"   "Err"
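
As a small illustration of the first three rows (a sketch only; the
variable names num, fac and pl are mine and do not appear in the code at
the end), is.vector is documented to return TRUE only for objects with
no attributes other than names, which is why a factor fails the test:

```r
x   <- 1:2
num <- as.numeric(x)
fac <- as.factor(x)
pl  <- as.pairlist(x)

is.vector(num)          # TRUE: a bare atomic vector, no attributes
names(attributes(fac))  # a factor carries "levels" and "class" attributes
is.vector(fac)          # FALSE: attributes other than names are present
is.list(pl)             # TRUE: is.list accepts non-empty pairlists too
is.pairlist(pl)         # TRUE
```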

Alas, this tabulation, rather than clarifying things for me, just
confused me more -- the diverse treatment of sequences by various
operations is all rather bewildering.

Wouldn't it be easier to teach, learn, and use R if there were more
consistency in the treatment of sequences?  I understand that in
long-running projects like S/R, there is an accumulation of
contributions by a variety of authors, but perhaps the time has come
for some cleanup at least for the base library?

             -s


# generic outer: for generic vectors and non-vectorized functions
gouter <-
  function(x,y,f,...)
  matrix( mapply( f,
                  rep(x,length(y)),
                  rep(y,each = length(x)),
                  SIMPLIFY = FALSE ), # don't coerce booleans to numerics
          length(x), length(y),
          dimnames = list( names(x), names(y) ) )

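# if arg evaluation gives error, return "Err", else its value
# inherits() is used because class() can have length > 1 (e.g. POSIXct),
# which would make class(...) == "try-error" an invalid if() condition
if_err <-
  function(expr)
    { if (inherits(try(expr,silent = TRUE), "try-error")) "Err"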
      else expr }
# {} needed so else will parse properly

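# does f(x) have the same class as x?
# identical() compares whole class vectors, so classes of different
# lengths are not silently recycled as they would be with ==
keep_class <-
  function(f)
    function(x)
      if_err( identical(class(x), class(f(x))) )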

seqtest <- function(seq)
  {
    lseq <- length(seq)
    gouter(
       list(
            is.vector = is.vector,
            is.list = is.list,
            is.pairlist = is.pairlist,
            `c_keep?` = keep_class(c),
            `rev_keep?` = keep_class(rev) ,
            `unique_keep?` = keep_class(unique),
## Beware: unique prints an error message for bad args
## even within try(...,silent=TRUE)
            `sort_keep?` = keep_class(sort),
            rle_len = function(a) if_err(length(rle(a)$length))
            ),
       list(
            numer = as.numeric(seq),
            list = as.list(seq),
            plist = as.pairlist(seq),
            fact = as.factor(seq),
            POSIXct = as.POSIXct(seq,origin = '1970-1-1'),
            difft = as.difftime(seq,units = 'secs')
            ),
       function(f,a)f(a)
       )
  }

print(seqtest(1:2))
# This starts by printing [[1]] [1]...
# because of the bug in unique mentioned above
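
The error-trapping helper could also be written with tryCatch instead of
try. This is only a sketch of an alternative, not the code that produced
the table above; the name if_err2 is hypothetical.

```r
# Hypothetical variant of if_err: return "Err" if evaluating expr
# signals an error, otherwise return its value.
if_err2 <- function(expr)
  tryCatch(expr, error = function(e) "Err")

if_err2(stop("boom"))         # "Err"
if_err2(1 + 1)                # 2
if_err2(rle(as.factor(1:2)))  # the table above reports an error for this case
```

Because the handler replaces the error condition directly, there is no
try-error object whose class needs to be inspected.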



