[R] Smart Indexing

Mon Aug 9 11:13:56 CEST 2010

Thanks, that does the trick. Another new command learned.
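
A quick check on the example from the original post, as a sketch only: the seed and the names 'expected', 'm', 'sort_rows' and 'same' are made up for this illustration. As far as ?merge goes, the result is sorted on 'id' but the order of rows within an id is not something to rely on, so both sides are sorted before comparing. The last two lines illustrate the pmatch concern: merge() and match() compare ids exactly, whereas pmatch() can pick up a partial match when an id is missing from the lookup table.

```r
set.seed(42)  # arbitrary seed, only to make the example reproducible
a <- data.frame(id = rep(1:3, each = 3), val = rnorm(9))
b <- data.frame(id = 1:3, set1 = LETTERS[1:3], set2 = 5:7)

# the data frame that was built by hand in the original post
expected <- data.frame(id = rep(1:3, each = 3), val = a$val,
                       set1 = rep(LETTERS[1:3], each = 3),
                       set2 = rep(5:7, each = 3))

m <- merge(a, b, by = "id")

# sort on (id, val) and reset the row names so that only the content
# of the two data frames is compared
sort_rows <- function(d) {
  d <- d[order(d$id, d$val), ]
  rownames(d) <- NULL
  d
}
same <- isTRUE(all.equal(sort_rows(m), sort_rows(expected)))

# exact vs. partial matching when the id 1 is absent from the table
exact   <- match(1, c(10, 2))   # no exact match
partial <- pmatch(1, c(10, 2))  # "1" partially matches "10"
```

Here 'same' should come out TRUE, 'exact' is NA and 'partial' is 1, which is why exact matching looks like the safer choice for ids.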

However, any hints regarding the rownames issue?
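
To make the question concrete, a possible workaround is sketched here, not a definitive answer. It relies on the behaviour described in ?data.frame that supplying row.names = NULL yields automatic row names; the names 'with_cbind' and 'with_df' are made up for this illustration, and match() stands in for the temporary lookup vector.

```r
a <- data.frame(id = rep(1:3, each = 3), val = rnorm(9))
b <- data.frame(id = 1:3, set1 = LETTERS[1:3], set2 = 5:7)

# position of every a$id in b$id (exact matching, no temporary variable)
ind <- match(a$id, b$id)

# cbind() picks up the row names created by the repeated indexing b[ind, ]
with_cbind <- cbind(a, b[ind, -1])

# data.frame() with row.names = NULL asks for automatic row names instead
with_df <- data.frame(a, b[ind, -1], row.names = NULL)

rownames(with_df)
```

If that holds, the separate rownames(...) <- NULL step would no longer be needed, at the price of calling data.frame() rather than cbind().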

BR Thorn

> -----Original Message-----
> From: Dimitris Rizopoulos [mailto:d.rizopoulos at erasmusmc.nl]
> Sent: Monday, 9 August 2010 11:07
> To: Thaler,Thorn,LAUSANNE,Applied Mathematics
> Cc: r-help at r-project.org
> Subject: Re: [R] Smart Indexing
> 
> I think you just need merge(), e.g.
> 
> a <- data.frame(id = rep(1:3, each=3), val = rnorm(9))
> b <- data.frame(id = 1:3, set1 = LETTERS[1:3], set2 = 5:7)
> 
> merge(a, b, by = "id")
> 
> 
> I hope it helps.
> 
> Best,
> Dimitris
> 
> 
> On 8/9/2010 11:01 AM, Thaler, Thorn, LAUSANNE, Applied Mathematics wrote:
> > Hi all,
> >
> > Suppose that I have two data frames, 'a' and 'b' say, both containing a
> > column 'id'. While data frame 'a' contains multiple rows sharing the
> > same id, data frame 'b' contains just one entry per id (i.e. a 1-to-n
> > relationship). For ease of modeling I now want to generate a new data
> > frame 'c', which is basically a copy of data frame 'a' augmented by the
> > values of 'b'. If I have
> >
> > a<- data.frame(id = rep(1:3, each=3), val=rnorm(9))
> > b<- data.frame(id=1:3, set1=LETTERS[1:3], set2=5:7)
> >
> > the resulting data frame should look like:
> >
> > c<- data.frame(id = rep(1:3, each=3), val = a$val,
> > set1=rep(LETTERS[1:3], each=3), set2 = rep(5:7, each = 3))
> >
> > While this task is just an application of some 'rep's and 'c's for
> > structured data frames, it is somewhat cumbersome (and error-prone) to
> > construct 'c' explicitly for less structured data. Thus, I was thinking
> > of making use of R's smart indexing possibilities to generate an index
> > vector, i.e.:
> >
> > ind<- c(1, 1, 1, 2, 2, 2, 3, 3, 3)
> > c.prime<- cbind(a, b[ind,-1])
> > rownames(c.prime)<- NULL
> > all.equal(c.prime , c) # TRUE
> >
> > The way I generate the index vector ind for the moment is
> >
> > tmp<- seq_along(b$id)
> > names(tmp)<- b$id
> > ind<- tmp[a$id]
> >
> > However, I think that there should be a smarter way of doing that
> > without the need to define a temporary variable. Some combination of
> > match, which, %in% maybe? Any hints?
> >
> > While writing these lines, I think
> >
> > ind<- pmatch(a$id, b$id, duplicates.ok = TRUE)
> >
> > could do the job? Or do I run into trouble with the "partial
> > matching" involved in pmatch?
> >
> > BTW, is there a way to prevent R from assigning [row|col]names? In the
> > example above I had to remove the rownames generated by cbind
> > explicitly; is there a one-liner?
> >
> > Thanks for your input + BR
> >
> > Thorn
> >
> > ______________________________________________
> > R-help at r-project.org mailing list
> > https://stat.ethz.ch/mailman/listinfo/r-help
> > PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> > and provide commented, minimal, self-contained, reproducible code.
> >
> 
> --
> Dimitris Rizopoulos
> Assistant Professor
> Department of Biostatistics
> Erasmus University Medical Center
> 
> Address: PO Box 2040, 3000 CA Rotterdam, the Netherlands
> Tel: +31/(0)10/7043478
> Fax: +31/(0)10/7043014