[R] Sort problem in merge()

Mon Mar 6 23:30:44 CET 2006

One other idea is to use match instead of merge:
> # tmp1a and tmp2a from below
> cbind(tmp1a, tmp2a[match(tmp1a$col1, tmp2a$col1), -1, drop = FALSE])
  col1 col2
1    A   NA
2    A   NA
3    C    1
4    C    1
5    0   NA
6    0   NA

This avoids having to muck with reordering of rows and resetting of rownames.
Like the prior solution, it assumes that the elements of tmp2a$col1
are unique.
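
For anyone who wants to paste this into a fresh session, here is a
self-contained sketch of the match idea. The data are the ones from the
quoted post; the names idx and res and the anyDuplicated check are my
additions for illustration, not part of the original one-liner.

```r
# Same data as in the quoted post
levs  <- c(LETTERS[1:6], "0")
tmp1a <- data.frame(col1 = factor(c("A", "A", "C", "C", "0", "0"), levs))
tmp2a <- data.frame(col1 = factor(c("C", "D", "E", "F"), levs), col2 = 1:4)

# Assumption: each key occurs at most once in tmp2a. match() only ever
# returns the first hit, so duplicated keys would be silently ignored.
stopifnot(!anyDuplicated(tmp2a$col1))

# For each row of tmp1a, the row of tmp2a holding the same key (NA if none)
idx <- match(tmp1a$col1, tmp2a$col1)

# Pick those rows of tmp2a, dropping its key column, and bind them on
res <- cbind(tmp1a, tmp2a[idx, -1, drop = FALSE])
rownames(res) <- rownames(tmp1a)  # explicit, so the rownames do not depend on cbind
res
```

Since the rows of tmp1a are never permuted, there is no order to restore
afterwards.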

On 3/6/06, Gabor Grothendieck <ggrothendieck at gmail.com> wrote:
> Sorry, I mixed up out and outa in the last post.  Here it is correctly.
>
> > levs <- c(LETTERS[1:6], "0")
> > tmp1a <- data.frame(col1 = factor(c("A", "A", "C", "C", "0", "0"), levs))
> > tmp2a <- data.frame(col1 = factor(c("C", "D", "E", "F"), levs), col2 = 1:4)
> >
> > out <- merge( cbind(tmp1a, seq = 1:nrow(tmp1a)), tmp2a, all.x = TRUE)
> > out <- out[order(out$seq), -2]
> > rownames(out) <- rownames(tmp1a)
> > out
>  col1 col2
> 1    A   NA
> 2    A   NA
> 3    C    1
> 4    C    1
> 5    0   NA
> 6    0   NA
>
>
>
> On 3/6/06, Gabor Grothendieck <ggrothendieck at gmail.com> wrote:
> > On 3/6/06, Gregor Gorjanc <gregor.gorjanc at gmail.com> wrote:
> > >
> > > But I want to get out
> > >
> > > A NA
> > > A NA
> > > C 1
> > > C 1
> > > 0 NA
> > > 0 NA
> > >
> >
> > That's what I get except for the rownames.  Be sure to
> > make the factor levels consistent.  I have renamed the data frames
> > tmp1a and tmp2a to distinguish them from the ones in your
> > post and have also reset the rownames to be the original
> > ones, as requested, so that the following is self-contained
> > and should be reproducible:
> >
> > > levs <- c(LETTERS[1:6], "0")
> > > tmp1a <- data.frame(col1 = factor(c("A", "A", "C", "C", "0", "0"), levs))
> > > tmp2a <- data.frame(col1 = factor(c("C", "D", "E", "F"), levs), col2 = 1:4)
> > >
> > > outa <- merge( cbind(tmp1a, seq = 1:nrow(tmp1a)), tmp2a, all.x = TRUE)
> > > outa <- outa[out$seq, -2]
> > > rownames(outa) <- rownames(tmp1a)
> > > outa
> >  col1 col2
> > 1    0   NA
> > 2    0   NA
> > 3    A   NA
> > 4    A   NA
> > 5    C    1
> > 6    C    1
> > >
> > > R.version.string # Windows XP
> > [1] "R version 2.2.1, 2005-12-20"
> >
> > By the way, the main limitation of this approach is that the elements of
> > tmp2$col1 must be unique so that the rows of the result correspond to those
> > of tmp1; however, that seems to be the case here.
> >
>
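
A note on the merge route for the general case: after merge() has sorted
the rows on the key, sorting on the helper column with order() puts them
back in their original positions whatever order merge() produced. The
sketch below uses made-up data (x, y, key and val are hypothetical names,
not from the thread), with keys deliberately out of alphabetical order.

```r
# Hypothetical data: keys of x are not in sorted order
x <- data.frame(key = c("C", "A", "B"), stringsAsFactors = FALSE)
y <- data.frame(key = c("A", "C"), val = c(10, 30), stringsAsFactors = FALSE)

# Tag each row of x with its position, then do the left join
m <- merge(cbind(x, seq = seq_len(nrow(x))), y, all.x = TRUE)

# order(m$seq) is the permutation that brings the helper column back to
# 1, 2, 3, ...; the helper column is dropped by name rather than by position
restored <- m[order(m$seq), setdiff(names(m), "seq")]
rownames(restored) <- rownames(x)
restored
```

As with the match approach, this presumes the keys in y are unique;
otherwise the merged result has more rows than x.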