[R] merging

Tue May 30 22:43:16 CEST 2006

On Tue, 2006-05-30 at 15:38 -0500, Marc Schwartz (via MN) wrote:
> On Tue, 2006-05-30 at 19:08 +0100, Gavin Simpson wrote:
> > Dear List,
> > 
> > Given,
> > 
> > y <- matrix(c(0,1,1,1,0,0,0,4,4), ncol = 3, byrow = TRUE)
> > rownames(y) <- c("a","b","c")
> > colnames(y) <- c("1","2","3")
> > y
> > y2 <- y[2:3, ]
> > rownames(y2) <- c("x","z")
> > y2
> > 
> > how can I stop
> > 
> > merge(y, y2, all = TRUE, sort = FALSE)
> > 
> > squishing the extra rows? Ideally I want the same as:
> > 
> > rbind(y, y2)
> > 
> > in this case. This is a specific example of a situation where two data
> > matrices have the same column variables and all I want is to stick the two
> > sets of rows together, but I have been using merge for cases such as the
> > one below, where the second matrix has extra column(s):
> > 
> > y3 <- matrix(c(0,1,1,1,0,0,0,4,4,5,6,7), ncol = 4, byrow = TRUE)
> > rownames(y3) <- c("d","e","f")
> > colnames(y3) <- c("1","2","3","4")
> > y3
> > merge(y, y3, all = TRUE, sort = FALSE)
> > 
> > We don't know beforehand whether the columns will match. But I see now
> > that even this doesn't work as I was expecting/thinking!
> > 
> > So I'm looking for a general way to merge two matrices such that the
> > number of rows in the merged matrix is nrow(mat1) + nrow(mat2) and the
> > number of columns in the merged matrix is
> > length(unique(c(colnames(mat1), colnames(mat2)))).
> > 
> > Is there a function in R to do this, or can someone suggest a way to
> > achieve this? My R version info is at the end.
> > 
> > Just to be clear, for the y, y3 example I want something like this
> > returned:
> > 
> >   1 2 3 4
> > a 0 1 1 NA
> > b 1 0 0 NA
> > c 0 4 4 NA
> > d 0 1 1 1
> > e 0 0 0 4
> > f 4 5 6 7
> > 
> > and for the y, y2 example, I want something like this returned:
> > 
> >   1 2 3
> > a 0 1 1
> > b 1 0 0
> > c 0 4 4
> > x 1 0 0
> > z 0 4 4
> 
> Gavin,
> 
> Here is a possible solution, though not fully tested.
> 
> It uses the "row.names" for the two matrices as part of the 'by'
> matching process. This is noted in the "Details" section in ?merge.
> 
> So for y and y2:
> 
> > res <- merge(y, y2, 
>                by = c("row.names", intersect(colnames(y),
>                       colnames(y2))), 
>                all = TRUE)
> 
> # Note that the row names are now the first col
> > res
>   Row.names 1 2 3
> 1         a 0 1 1
> 2         b 1 0 0
> 3         c 0 4 4
> 4         x 1 0 0
> 5         z 0 4 4
> 
> # Subset res, leaving out the first col
> > mat <- res[, -1]
> 
> # Set the rownames from res
> > rownames(mat) <- res[, 1]
> 
> > mat
>   1 2 3
> a 0 1 1
> b 1 0 0
> c 0 4 4
> x 1 0 0
> z 0 4 4

Ack...hit the wrong button. Sorry.  

Must be the long weekend....yeah, that's my story and I'm sticking to
it...  ;-)

Here is the solution for y and y3:

> res2 <- merge(y, y3, 
                by = c("row.names", intersect(colnames(y),
                       colnames(y3))), 
                all = TRUE)

> res2
  Row.names 1 2 3  4
1         a 0 1 1 NA
2         b 1 0 0 NA
3         c 0 4 4 NA
4         d 0 1 1  1
5         e 0 0 0  4
6         f 4 5 6  7

> mat2 <- res2[, -1]

> rownames(mat2) <- res2[, 1]

> mat2
  1 2 3  4
a 0 1 1 NA
b 1 0 0 NA
c 0 4 4 NA
d 0 1 1  1
e 0 0 0  4
f 4 5 6  7
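
The steps above could be wrapped in a small helper. This is only a
sketch: merge_rows is a hypothetical name, not an existing R function,
and it assumes both matrices are numeric and carry row and column names.
It also converts the result back to a matrix, since merge() returns a
data frame.

```r
# Hypothetical helper (not part of base R): stack the rows of two named
# matrices via merge(), filling columns missing from either one with NA.
merge_rows <- function(m1, m2) {
  # Columns present in both matrices become merge keys, together with
  # the row names, so rows with equal values but different names stay apart.
  shared <- intersect(colnames(m1), colnames(m2))
  res <- merge(m1, m2, by = c("row.names", shared), all = TRUE)

  # merge() puts the row names in the first column ("Row.names");
  # move them back and return a matrix rather than a data frame.
  out <- as.matrix(res[, -1, drop = FALSE])
  rownames(out) <- as.character(res[, 1])
  out
}

# The matrices from the original question
y <- matrix(c(0,1,1,1,0,0,0,4,4), ncol = 3, byrow = TRUE,
            dimnames = list(c("a","b","c"), c("1","2","3")))
y3 <- matrix(c(0,1,1,1,0,0,0,4,4,5,6,7), ncol = 4, byrow = TRUE,
             dimnames = list(c("d","e","f"), c("1","2","3","4")))

merge_rows(y, y3)
```

Two caveats: merge() sorts on the 'by' columns by default, so rows come
back ordered by row name rather than in input order, and a row that
appears in both matrices with the same name and the same values is
returned only once, so the row count can be less than
nrow(m1) + nrow(m2).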

HTH,

Marc Schwartz
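
If input order and an exact row count of nrow(mat1) + nrow(mat2) matter,
one alternative is to skip merge() and fill a pre-allocated matrix by
position. This is a different technique from the merge() approach above,
shown as a minimal sketch: stack_rows is a hypothetical name, and the
code assumes numeric matrices that both have row names and unique
column names.

```r
# Hypothetical helper (not part of base R): rbind() two named matrices
# whose columns may differ, padding absent columns with NA.
stack_rows <- function(m1, m2) {
  # Union of column names, in order of first appearance
  cols <- union(colnames(m1), colnames(m2))

  # Pre-allocate an all-NA result with one row per input row
  out <- matrix(NA_real_,
                nrow = nrow(m1) + nrow(m2),
                ncol = length(cols),
                dimnames = list(c(rownames(m1), rownames(m2)), cols))

  # Fill each block by row position and column name
  out[seq_len(nrow(m1)), colnames(m1)] <- m1
  out[nrow(m1) + seq_len(nrow(m2)), colnames(m2)] <- m2
  out
}

# The matrices from the original question
y <- matrix(c(0,1,1,1,0,0,0,4,4), ncol = 3, byrow = TRUE,
            dimnames = list(c("a","b","c"), c("1","2","3")))
y2 <- y[2:3, ]
rownames(y2) <- c("x","z")
y3 <- matrix(c(0,1,1,1,0,0,0,4,4,5,6,7), ncol = 4, byrow = TRUE,
             dimnames = list(c("d","e","f"), c("1","2","3","4")))

stack_rows(y, y2)   # same columns: behaves like rbind(y, y2)
stack_rows(y, y3)   # extra column "4" is NA for rows a, b, c
```

Because rows are placed by position rather than matched on keys, nothing
is collapsed or reordered, at the cost of not detecting rows that are
duplicated across the two inputs.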