[R] subset using noncontiguous variables by name (not index)

Muenchen, Robert A (Bob) muenchen at utk.edu
Mon Aug 27 02:32:52 CEST 2007


Thanks Bert & Gabor for two very interesting solutions!

It would be very handy in R if string1:stringN generated
"string1","string2",...,"stringN"; it would make selections like this much
more obvious. I know it's easy to do with the colon operator and the paste
function, but that's quite a step up in complexity compared to SAS' x1
x3-x4 y2 or SPSS' x1, x3 to x4, y2. And it's complexity that beginners
face early in learning R.
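
For concreteness, here is roughly what the colon-plus-paste approach looks
like with the built-in anscombe data. It is only a sketch: the names keep
and selected are made up for illustration, and it assumes the variables
share a common prefix followed by a number.

```r
# Build the names "x3" and "x4" from a numeric range, then add the
# non-contiguous names by hand. 'keep' is an illustrative name.
keep <- c("x1", paste("x", 3:4, sep = ""), "y2")
keep

# A character vector of names can be stored and reused for selection.
selected <- anscombe[, keep]
summary(selected)
```

Unlike c(x1, x3:x4, y2) inside subset(), this relies on how the names are
spelled rather than on where the columns sit in the data frame.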

While on the subject of the colon operator, why doesn't anscombe[[1:4]]
select the x variables in list form as anscombe[,1:4] or anscombe[1:4]
do in data frame form?
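
To make the question concrete, the three forms can be compared directly.
This is a sketch based on my reading of ?Extract, which says that [[ ]]
selects a single element and treats a longer index as recursive indexing
(x[[1]][[2]]...) rather than as a range of columns. The variable names
are illustrative only.

```r
# Both of these return a data frame holding the four x columns.
by_list_index   <- anscombe[1:4]
by_matrix_index <- anscombe[, 1:4]
names(by_list_index)
names(by_matrix_index)

# [[ ]] extracts a single element, so a length-4 index is not read as
# "columns 1 to 4". Wrapped in tryCatch so the script keeps running.
double_bracket <- tryCatch(anscombe[[1:4]], error = function(e) e)
inherits(double_bracket, "error")

# A single index returns the bare column vector, not a data frame.
first_x <- anscombe[[1]]
is.numeric(first_x)
```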

Thanks,

Bob

=========================================================
Bob Muenchen (pronounced Min'-chen), Manager 
Statistical Consulting Center
U of TN Office of Information Technology
200 Stokely Management Center, Knoxville, TN 37996-0520
Voice: (865) 974-5230 
FAX: (865) 974-4810
Email: muenchen at utk.edu
Web: http://oit.utk.edu/scc, 
News: http://listserv.utk.edu/archives/statnews.html
=========================================================


> -----Original Message-----
> From: Bert Gunter [mailto:gunter.berton at gene.com]
> Sent: Sunday, August 26, 2007 6:50 PM
> To: 'Gabor Grothendieck'; Muenchen, Robert A (Bob)
> Cc: r-help at stat.math.ethz.ch
> Subject: RE: [R] subset using noncontiguous variables by name (not
> index)
> 
> The problem is that "x3:x5" does not mean what you think it means. The only
> reason it does the right thing in subset() is because a clever trick is used
> there (read the code -- it's not hard to understand) to ensure that it does.
> Gabor has essentially mimicked that trick in his solution.
> 
> However, it is not necessary to do this. You can construct the call
> directly, as you tried to do. Using the anscombe example, here's how:
> 
> chooz <- "c(x1,x3:x4,y2)"  ## enclose the desired expression in quotes
> do.call (subset, list( x = anscombe, select = parse(text = chooz)))
> 
> -- Bert Gunter
> Genentech Non-Clinical Statistics
> South San Francisco, CA
> 
> "The business of the statistician is to catalyze the scientific
> learning
> process."  - George E. P. Box
> 
> 
> 
> > -----Original Message-----
> > From: r-help-bounces at stat.math.ethz.ch
> > [mailto:r-help-bounces at stat.math.ethz.ch] On Behalf Of Gabor
> > Grothendieck
> > Sent: Sunday, August 26, 2007 2:10 PM
> > To: Muenchen, Robert A (Bob)
> > Cc: r-help at stat.math.ethz.ch
> > Subject: Re: [R] subset using noncontiguous variables by name
> > (not index)
> >
> > Using builtin data frame anscombe try this. First we set up a data frame
> > anscombe.seq which has one row containing 1, 2, 3, ... .  Then select
> > out from that data frame and unlist it to get the desired index vector.
> >
> > > anscombe.seq <- replace(anscombe[1,], TRUE, seq_along(anscombe))
> > > idx <- unlist(subset(anscombe.seq, select = c(x1, x3:x4, y2)))
> > > anscombe[idx]
> >    x1 x3 x4   y2
> > 1  10 10  8 9.14
> > 2   8  8  8 8.14
> > 3  13 13  8 8.74
> > 4   9  9  8 8.77
> > 5  11 11  8 9.26
> > 6  14 14  8 8.10
> > 7   6  6  8 6.13
> > 8   4  4 19 3.10
> > 9  12 12  8 9.13
> > 10  7  7  8 7.26
> > 11  5  5  8 4.74
> >
> >
> > On 8/26/07, Muenchen, Robert A (Bob) <muenchen at utk.edu> wrote:
> > > Hi All,
> > >
> > > I'm using the subset function to select a list of variables, some of
> > > which are contiguous in the data frame, and others of which are not. It
> > > works fine when I use the form:
> > >
> > > subset(mydata,select=c(x1,x3:x5,x7) )
> > >
> > > In reality, my list is far more complex. So I would like to store it in
> > > a variable to substitute in for c(x1,x3:x5,x7) but cannot get it to
> > > work. That use of the c function seems to violate R rules, so I'm not
> > > sure how it works at all. A small simulation of the problem is below.
> > >
> > > If the variable names & orders were really this simple, I could use
> > > indices like
> > >
> > > summary( mydata[ ,c(1,3:5,7) ] )
> > >
> > > but alas, they are not.
> > >
> > > How does the c function work this way in the first place, and how can I
> > > make this substitution?
> > >
> > > Thanks,
> > > Bob
> > >
> > > mydata <- data.frame(
> > >  x1=c(1,2,3,4,5),
> > >  x2=c(1,2,3,4,5),
> > >  x3=c(1,2,3,4,5),
> > >  x4=c(1,2,3,4,5),
> > >  x5=c(1,2,3,4,5),
> > >  x6=c(1,2,3,4,5),
> > >  x7=c(1,2,3,4,5)
> > > )
> > > mydata
> > >
> > > # This does what I want.
> > > summary(
> > >  subset(mydata,select=c(x1,x3:x5,x7) )
> > > )
> > >
> > > # Can I substitute myVars?
> > > attach(mydata)
> > > myVars1 <- c(x1,x3:x5,x7)
> > >
> > > # Not looking good!
> > > myVars1
> > >
> > > # This doesn't do the right thing.
> > > summary(
> > >  subset(mydata,select=myVars1 )
> > > )
> > >
> > > # Total desperation on this attempt:
> > > myVars2 <- "x1,x3:x5,x7"
> > > myVars2
> > >
> > > # This doesn't work either.
> > > summary(
> > >  subset(mydata,select=myVars2 )
> > > )
> > >
> > >
> > >
> > > ______________________________________________
> > > R-help at stat.math.ethz.ch mailing list
> > > https://stat.ethz.ch/mailman/listinfo/r-help
> > > PLEASE do read the posting guide
> > http://www.R-project.org/posting-guide.html
> > > and provide commented, minimal, self-contained, reproducible code.
> > >
> >


