[R] subset using noncontiguous variables by name (not index)

Mon Aug 27 02:51:46 CEST 2007

Try this:

> "%:%" <- function(x, y) {
+    prex <- gsub("[0-9]", "", x); postx <- gsub("[^0-9]", "", x)
+    prey <- gsub("[0-9]", "", y); posty <- gsub("[^0-9]", "", y)
+    stopifnot(prex == prey)
+    paste(prex, seq(from = as.numeric(postx), to = as.numeric(posty)), sep = "")
+ }
> "x2" %:% "x4"
[1] "x2" "x3" "x4"
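
The result is an ordinary character vector, so it can be stored in a variable and used to index a data frame later. Here is a minimal sketch using the built-in anscombe data; the variable names myVars and sel are only illustrative. Note that the operator generates names by counting, so it assumes the columns share a prefix and are numbered consecutively; it does not look at column positions the way x3:x4 does inside subset().

```r
# The operator from above, repeated so this block runs on its own.
"%:%" <- function(x, y) {
  prex <- gsub("[0-9]", "", x); postx <- gsub("[^0-9]", "", x)
  prey <- gsub("[0-9]", "", y); posty <- gsub("[^0-9]", "", y)
  stopifnot(prex == prey)
  paste(prex, seq(from = as.numeric(postx), to = as.numeric(posty)), sep = "")
}

# Build the selection as a character vector and store it for later use.
myVars <- c("x1", "x3" %:% "x4", "y2")

# Index the data frame by name with the stored vector.
sel <- anscombe[myVars]
names(sel)
```

Because myVars is plain character data, the same vector should also work in anscombe[, myVars] or summary(anscombe[myVars]).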

On 8/26/07, Muenchen, Robert A (Bob) <muenchen at utk.edu> wrote:
> Thanks Bert & Gabor for two very interesting solutions!
>
> It would be very handy in R if string1:stringN generated
> "string1","string2"..."stringN"; it would make selections like this much
> more obvious. I know it's easy to do with the colon operator and the paste
> function, but that's quite a step up in complexity compared to SAS' x1
> x3-x4 y2 or SPSS' x1,x3 to x4, y2. And it's complexity that beginners
> face early in learning R.
>
> While on the subject of the colon operator, why doesn't anscombe[[1:4]]
> select the x variables in list form as anscombe[,1:4] or anscombe[1:4]
> do in data frame form?
>
> Thanks,
>
> Bob
>
> =========================================================
> Bob Muenchen (pronounced Min'-chen), Manager
> Statistical Consulting Center
> U of TN Office of Information Technology
> 200 Stokely Management Center, Knoxville, TN 37996-0520
> Voice: (865) 974-5230
> FAX: (865) 974-4810
> Email: muenchen at utk.edu
> Web: http://oit.utk.edu/scc,
> News: http://listserv.utk.edu/archives/statnews.html
> =========================================================
>
>
> > -----Original Message-----
> > From: Bert Gunter [mailto:gunter.berton at gene.com]
> > Sent: Sunday, August 26, 2007 6:50 PM
> > To: 'Gabor Grothendieck'; Muenchen, Robert A (Bob)
> > Cc: r-help at stat.math.ethz.ch
> > Subject: RE: [R] subset using noncontiguous variables by name (not
> > index)
> >
> > The problem is that "x3:x5" does not mean what you think it means. The
> > only reason it does the right thing in subset() is because a clever trick
> > is used there (read the code -- it's not hard to understand) to ensure
> > that it does. Gabor has essentially mimicked that trick in his solution.
> >
> > However, it is not necessary to do this. You can construct the call
> > directly, as you tried to do. Using the anscombe example, here's how:
> >
> > chooz <- "c(x1,x3:x4,y2)"  ## enclose the desired expression in quotes
> > do.call (subset, list( x = anscombe, select = parse(text = chooz)))
> >
> > -- Bert Gunter
> > Genentech Non-Clinical Statistics
> > South San Francisco, CA
> >
> > "The business of the statistician is to catalyze the scientific
> > learning
> > process."  - George E. P. Box
> >
> >
> >
> > > -----Original Message-----
> > > From: r-help-bounces at stat.math.ethz.ch
> > > [mailto:r-help-bounces at stat.math.ethz.ch] On Behalf Of Gabor
> > > Grothendieck
> > > Sent: Sunday, August 26, 2007 2:10 PM
> > > To: Muenchen, Robert A (Bob)
> > > Cc: r-help at stat.math.ethz.ch
> > > Subject: Re: [R] subset using noncontiguous variables by name
> > > (not index)
> > >
> > > Using the built-in data frame anscombe, try this. First we set up a
> > > data frame anscombe.seq, which has one row containing 1, 2, 3, ... .
> > > Then select from that data frame and unlist it to get the desired
> > > index vector.
> > >
> > > > anscombe.seq <- replace(anscombe[1,], TRUE, seq_along(anscombe))
> > > > idx <- unlist(subset(anscombe.seq, select = c(x1, x3:x4, y2)))
> > > > anscombe[idx]
> > >    x1 x3 x4   y2
> > > 1  10 10  8 9.14
> > > 2   8  8  8 8.14
> > > 3  13 13  8 8.74
> > > 4   9  9  8 8.77
> > > 5  11 11  8 9.26
> > > 6  14 14  8 8.10
> > > 7   6  6  8 6.13
> > > 8   4  4 19 3.10
> > > 9  12 12  8 9.13
> > > 10  7  7  8 7.26
> > > 11  5  5  8 4.74
> > >
> > >
> > > On 8/26/07, Muenchen, Robert A (Bob) <muenchen at utk.edu> wrote:
> > > > Hi All,
> > > >
> > > > I'm using the subset function to select a list of variables, some of
> > > > which are contiguous in the data frame, and others of which are not.
> > > > It works fine when I use the form:
> > > >
> > > > subset(mydata,select=c(x1,x3:x5,x7) )
> > > >
> > > > In reality, my list is far more complex. So I would like to store it
> > > > in a variable to substitute in for c(x1,x3:x5,x7) but cannot get it
> > > > to work. That use of the c function seems to violate R rules, so I'm
> > > > not sure how it works at all. A small simulation of the problem is
> > > > below.
> > > >
> > > > If the variable names & orders were really this simple, I could use
> > > > indices like
> > > >
> > > > summary( mydata[ ,c(1,3:5,7) ] )
> > > >
> > > > but alas, they are not.
> > > >
> > > > How does the c function work this way in the first place, and how
> > > > can I make this substitution?
> > > >
> > > > Thanks,
> > > > Bob
> > > >
> > > > mydata <- data.frame(
> > > >  x1=c(1,2,3,4,5),
> > > >  x2=c(1,2,3,4,5),
> > > >  x3=c(1,2,3,4,5),
> > > >  x4=c(1,2,3,4,5),
> > > >  x5=c(1,2,3,4,5),
> > > >  x6=c(1,2,3,4,5),
> > > >  x7=c(1,2,3,4,5)
> > > > )
> > > > mydata
> > > >
> > > > # This does what I want.
> > > > summary(
> > > >  subset(mydata,select=c(x1,x3:x5,x7) )
> > > > )
> > > >
> > > > # Can I substitute myVars?
> > > > attach(mydata)
> > > > myVars1 <- c(x1,x3:x5,x7)
> > > >
> > > > # Not looking good!
> > > > myVars1
> > > >
> > > > # This doesn't do the right thing.
> > > > summary(
> > > >  subset(mydata,select=myVars1 )
> > > > )
> > > >
> > > > # Total desperation on this attempt:
> > > > myVars2 <- "x1,x3:x5,x7"
> > > > myVars2
> > > >
> > > > # This doesn't work either.
> > > > summary(
> > > >  subset(mydata,select=myVars2 )
> > > > )
> > > >
> > > >
> > > >
> > > >
> > > > ______________________________________________
> > > > R-help at stat.math.ethz.ch mailing list
> > > > https://stat.ethz.ch/mailman/listinfo/r-help
> > > > PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> > > > and provide commented, minimal, self-contained, reproducible code.
> > > >
> > >
> > >
>
>