[R] subset using noncontiguous variables by name (not index)

Thomas Lumley tlumley at u.washington.edu
Mon Aug 27 16:24:30 CEST 2007


On Mon, 27 Aug 2007, Muenchen, Robert A (Bob) wrote:

> Gabor, That works great!
>
> I think this would be a very helpful addition to the main R
> distribution. Perhaps with a single colon representing numerical order
> (exactly as you have written it) and two colons representing the order
> of the variables as they appear in the data frame (your first example).
> That's analogous to SAS' x1-xN, which you know gets those N variables,
> and a--z, which selects an unknown number of variables a through z. How
> many that is depends upon their order in the data frame. That would not
> only be very useful in general, but it would also make transitioning to
> R from SAS or SPSS less confusing.
>
> Is R still being extended in such basic ways, or does that muck up
> existing programs too much?
>

In principle base R can be extended like that, but a strong case is needed 
for non-standard evaluation rules and for depleting the restricted supply 
of short binary operator names.

The reason for subset() and its behaviour is that 'variables as they 
appear in the data frame' is typically ambiguous -- which data frame?  In
SPSS you have only one and in SAS there is a default one, so there is no 
ambiguity in X1--Y2, but in R it needs another argument specifying the 
data frame, so it can't really be a binary operator.
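A small sketch of the subset() behaviour described above. The data frame df, its columns, and the cols_between() helper are made up for this illustration. subset() evaluates its select argument with each column name standing for that column's position, so a range such as a:b means "column a through column b as they appear in this data frame". The data frame is always named explicitly as the first argument. The hypothetical helper shows the same idea as an ordinary function: it needs three arguments (data frame, first name, last name), which is why it does not fit a two-argument infix operator.

```r
# Made-up example data; column order (not alphabetical order) defines ranges
df <- data.frame(id = 1:3, a = c(2, 4, 6), q = c(1, 1, 0),
                 b = c(5, 3, 1), z = c("u", "v", "w"))

# In 'select', each column name evaluates to its position, so a:b is 2:4
wide <- subset(df, select = a:b)
names(wide)    # "a" "q" "b"

# Noncontiguous selection: combine ranges and single names with c()
mixed <- subset(df, select = c(id, a:q, z))
names(mixed)   # "id" "a" "q" "z"

# Negative selection drops a named range
rest <- subset(df, select = -(a:b))
names(rest)    # "id" "z"

# Hypothetical helper (not part of base R): the data frame must be passed
# explicitly, so this cannot be a binary operator like x1 %op% x2
cols_between <- function(data, from, to) {
  pos <- match(c(from, to), names(data))
  if (anyNA(pos)) stop("column not found in data")
  names(data)[seq(pos[1], pos[2])]
}
cols_between(df, "a", "b")   # "a" "q" "b"
```

Because these ranges are positional, reordering the columns of df changes what a:b selects, just as with SAS's a--z.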

The double colon :: and triple colon ::: are already used for namespaces, 
and a search of r-help reveals two previous, different, suggestions for 
%:%.


 	-thomas

Thomas Lumley			Assoc. Professor, Biostatistics
tlumley at u.washington.edu	University of Washington, Seattle


