[R] subset using noncontiguous variables by name (not index)

Bert Gunter gunter.berton at gene.com
Mon Aug 27 00:49:58 CEST 2007


The problem is that "x3:x5" does not mean what you think it means. Ordinarily
":" builds a numeric sequence from the values of its operands, not a range of
columns. The only reason it does the right thing in subset() is that a clever
trick is used there (read the code -- it's not hard to understand) to ensure
that it does. Gabor has essentially mimicked that trick in his solution.
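
A rough sketch of the idea behind that trick, using anscombe. This is a
simplified illustration, not the actual source of subset.data.frame, and the
names nl and idx are made up for the example. Each column name is bound to its
position, and the select expression is evaluated in that list, so x3:x4
becomes 3:4:

```r
# Map each column name to its position: x1 = 1, x2 = 2, ..., y4 = 8
nl <- as.list(seq_along(anscombe))
names(nl) <- names(anscombe)

# Evaluate the unevaluated select expression with names bound to positions
idx <- eval(quote(c(x1, x3:x4, y2)), nl)
idx                  # column positions 1 3 4 6
head(anscombe[idx])  # columns x1, x3, x4, y2
```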

However, it is not necessary to do this. You can construct the call directly,
as you tried to do. Using the anscombe example, here's how:

chooz <- "c(x1,x3:x4,y2)"  ## enclose the desired expression in quotes
do.call(subset, list(x = anscombe, select = parse(text = chooz)))
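
If you would rather hand subset() a single call than the expression vector
that parse() returns, one possible variant is to pull out the first element
and splice it into the call with bquote(). This is only a sketch, and sel and
res are names made up for the example:

```r
chooz <- "c(x1, x3:x4, y2)"
sel <- parse(text = chooz)[[1]]   # the unevaluated call c(x1, x3:x4, y2)

# Build subset(anscombe, select = c(x1, x3:x4, y2)) and evaluate it
res <- eval(bquote(subset(anscombe, select = .(sel))))
names(res)
```

Either way the string ends up being evaluated as R code, so this is only
sensible for strings you control.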

-- Bert Gunter
Genentech Non-Clinical Statistics
South San Francisco, CA
 
"The business of the statistician is to catalyze the scientific learning
process."  - George E. P. Box
 
 

> -----Original Message-----
> From: r-help-bounces at stat.math.ethz.ch 
> [mailto:r-help-bounces at stat.math.ethz.ch] On Behalf Of Gabor Grothendieck
> Sent: Sunday, August 26, 2007 2:10 PM
> To: Muenchen, Robert A (Bob)
> Cc: r-help at stat.math.ethz.ch
> Subject: Re: [R] subset using noncontiguous variables by name (not index)
> 
> Using builtin data frame anscombe try this. First we set up a data frame
> anscombe.seq which has one row containing 1, 2, 3, ... .  Then select
> out from that data frame and unlist it to get the desired index vector.
> 
> > anscombe.seq <- replace(anscombe[1,], TRUE, seq_along(anscombe))
> > idx <- unlist(subset(anscombe.seq, select = c(x1, x3:x4, y2)))
> > anscombe[idx]
>    x1 x3 x4   y2
> 1  10 10  8 9.14
> 2   8  8  8 8.14
> 3  13 13  8 8.74
> 4   9  9  8 8.77
> 5  11 11  8 9.26
> 6  14 14  8 8.10
> 7   6  6  8 6.13
> 8   4  4 19 3.10
> 9  12 12  8 9.13
> 10  7  7  8 7.26
> 11  5  5  8 4.74
> 
> 
> On 8/26/07, Muenchen, Robert A (Bob) <muenchen at utk.edu> wrote:
> > Hi All,
> >
> > I'm using the subset function to select a list of variables, some of
> > which are contiguous in the data frame, and others of which are not. It
> > works fine when I use the form:
> >
> > subset(mydata,select=c(x1,x3:x5,x7) )
> >
> > In reality, my list is far more complex. So I would like to store it in
> > a variable to substitute in for c(x1,x3:x5,x7) but cannot get it to
> > work. That use of the c function seems to violate R rules, so I'm not
> > sure how it works at all. A small simulation of the problem is below.
> >
> > If the variable names & orders were really this simple, I could use
> > indices like
> >
> > summary( mydata[ ,c(1,3:5,7) ] )
> >
> > but alas, they are not.
> >
> > How does the c function work this way in the first place, and how can I
> > make this substitution?
> >
> > Thanks,
> > Bob
> >
> > mydata <- data.frame(
> >  x1=c(1,2,3,4,5),
> >  x2=c(1,2,3,4,5),
> >  x3=c(1,2,3,4,5),
> >  x4=c(1,2,3,4,5),
> >  x5=c(1,2,3,4,5),
> >  x6=c(1,2,3,4,5),
> >  x7=c(1,2,3,4,5)
> > )
> > mydata
> >
> > # This does what I want.
> > summary(
> >  subset(mydata,select=c(x1,x3:x5,x7) )
> > )
> >
> > # Can I substitute myVars?
> > attach(mydata)
> > myVars1 <- c(x1,x3:x5,x7)
> >
> > # Not looking good!
> > myVars1
> >
> > # This doesn't do the right thing.
> > summary(
> >  subset(mydata,select=myVars1 )
> > )
> >
> > # Total desperation on this attempt:
> > myVars2 <- "x1,x3:x5,x7"
> > myVars2
> >
> > # This doesn't work either.
> > summary(
> >  subset(mydata,select=myVars2 )
> > )
> >
> >
> >
> > =========================================================
> > Bob Muenchen (pronounced Min'-chen), Manager
> > Statistical Consulting Center
> > U of TN Office of Information Technology
> > 200 Stokely Management Center, Knoxville, TN 37996-0520
> > Voice: (865) 974-5230
> > FAX: (865) 974-4810
> > Email: muenchen at utk.edu
> > Web: http://oit.utk.edu/scc,
> > News: http://listserv.utk.edu/archives/statnews.html
> >
> > ______________________________________________
> > R-help at stat.math.ethz.ch mailing list
> > https://stat.ethz.ch/mailman/listinfo/r-help
> > PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> > and provide commented, minimal, self-contained, reproducible code.
> >
> 


