R-alpha: Re: select.frame

Ross Ihaka ihaka@stat.auckland.ac.nz
Wed, 16 Apr 1997 10:54:00 +1200 (NZST)


Peter Dalgaard writes:

> Here's the select.frame() function I babbled about before. Suggestions
> about coding style, etc., are welcome, I feel a bit green at this.

	[ function code omitted. ]

This is a very nice function!  One of the things I like about SAS is
the ability to specify ranges of variables.  I do have a couple of
suggestions.

1. The expression 

	e <- as.call(c(as.name("c"), expression(...)))

   can be written more concisely as

	e <- substitute(c(...))

2. As written, the function uses R scoping rules to pass the value of
   the variable "nm" into the body of "subst.exp", and so the function
   will not work in S.  It might be better to pass "nm" as an argument
   so that the function is portable.

3. It might be good to check that only ":" expressions are allowed.
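
As a rough sketch of suggestions 1 and 2 in isolation (the names
"capture" and "lookup" and the vector "nm" are made up for this
illustration and are not part of Peter's function):

```r
# Suggestion 1: capture the unevaluated arguments as a call to c().
capture <- function(...) substitute(c(...))
e <- capture(Sepal.Length:Petal.Length, Species)
print(e)

# Suggestion 2: the helper receives the name vector as an argument
# instead of finding it through lexical scoping.
# (hypothetical helper, for illustration only)
lookup <- function(sym, nm) match(as.character(sym), nm)

# A hand-written name vector standing in for names(dfr).
nm <- c("Sepal.Length", "Sepal.Width", "Petal.Length",
        "Petal.Width", "Species")
idx <- lookup(e[[3]], nm)   # e[[3]] is the symbol Species
print(idx)
```

Because "lookup" does not refer to any variable outside its own
argument list, it does not depend on where it was defined.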


Here is a modified version of the function which does the above.

select.frame<-
function (dfr, ...)
{
        subst.exp <- function(e, nm) {
                for (i in 2:length(e)) {
                        ei <- e[[i]]
                        if (is.name(ei)) {
                                n <- match(as.character(ei), nm)
                                if (!is.na(n))
                                        e[[i]] <- n
                        }
                        else if (is.call(ei) && as.character(ei[[1]]) == ":")
                                e[[i]] <- subst.exp(ei, nm)
                        else if (is.numeric(ei)) {
                                e[[i]] <- e[[i]]
                        }
                        else stop("invalid selection")
                }
                e
        }
        dfr[, eval(subst.exp(substitute(c(...)), names(dfr)))]
}

Here it is in action:

	> select.frame(iris,Sepal.Length:Petal.Length,Species)[1:5,]
	  Sepal.Length Sepal.Width Petal.Length Species
	1          5.1         3.5          1.4  setosa
	2          4.9         3.0          1.4  setosa
	3          4.7         3.2          1.3  setosa
	4          4.6         3.1          1.5  setosa
	5          5.0         3.6          1.4  setosa
	> 

Note that in S you can write

	as.character(ei[[1]]) == ":"

in the simpler form

	ei[[1]] == ":"

and I have made a change to (my version of) R to make this work.
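
A small sketch of the alternatives for testing the head of a call.
The variable names are illustrative, and the third form (comparing
symbols with "identical") is my own suggestion rather than something
from the function above:

```r
ei <- quote(Sepal.Length:Petal.Length)

# Form used in the function: convert the head of the call to a string.
by_string <- as.character(ei[[1]]) == ":"

# Shorter form: compare the symbol directly with the string.
by_symbol <- ei[[1]] == ":"

# Alternative (assumption, not from the original): compare symbols.
# identical() always returns a single TRUE or FALSE.
by_identical <- identical(ei[[1]], as.name(":"))

print(c(by_string, by_symbol, by_identical))
```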

Finally, it would be possible to use the forbidden black arts (i.e.
"Recall") to make the function body a single expression.  However,
I would recommend that "Recall" never be used.  The function above is
more clearly written than a version using "Recall".
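
For readers who have not met it, "Recall" calls the function that is
currently being evaluated without using its name.  A toy comparison
(both functions are made up for this illustration and simply count how
deeply the first argument of a call is nested):

```r
# Recursion by name: the function must be bound to a known name.
depth_named <- function(e) {
  if (is.call(e)) 1L + depth_named(e[[2]]) else 0L
}

# Recursion through Recall: works even if the function is renamed
# or anonymous, at some cost in readability.
depth_recall <- function(e) {
  if (is.call(e)) 1L + Recall(e[[2]]) else 0L
}

e <- quote(c(a:b))
print(depth_named(e))
print(depth_recall(e))
```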

I also think that this idea is useful enough to be generalized.  One
approach would be to make a generic function "select" and make the
function above the method for data frames - i.e. call it
select.data.frame.
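
A minimal sketch of what the generic and its data frame method might
look like.  To keep it short, the method body does not rewrite the
expression as the function above does; as an assumption of this
sketch, it binds each column name to its position and evaluates the
captured c(...) call with those bindings:

```r
# Generic: dispatches on the class of its first argument.
select <- function(x, ...) UseMethod("select")

# Method for data frames (sketch only, different technique from the
# expression-rewriting version above).
select.data.frame <- function(x, ...) {
  pos <- as.list(seq_along(x))   # column positions 1, 2, ...
  names(pos) <- names(x)         # ... bound to the column names
  idx <- eval(substitute(c(...)), pos, parent.frame())
  x[, idx, drop = FALSE]
}

d <- data.frame(p = 1:2, q = 3:4, r = 5:6, s = 7:8)
res <- select(d, p:q, s)
print(names(res))
```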

Another idea would be to have "select" operate on name vectors in
general, rather than on just variable names in data frames.  Then the
same trick could be used to select rows as well as columns.
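
A rough sketch of that second idea.  The helper name "pick" is
hypothetical, and the body assumes that it is acceptable to bind each
name to its position and evaluate the captured c(...) call, rather
than rewriting the expression:

```r
# Translate names and name ranges into positions within any
# character vector of names (hypothetical helper).
pick <- function(nm, ...) {
  pos <- as.list(seq_along(nm))
  names(pos) <- nm
  eval(substitute(c(...)), pos, parent.frame())
}

d <- data.frame(p = 1:4, q = 5:8, r = 9:12,
                row.names = c("w", "x", "y", "z"))

rows <- pick(rownames(d), w:x, z)   # positions of rows w, x and z
cols <- pick(names(d), q:r)         # positions of columns q and r
print(d[rows, cols])
```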
	Ross