[R] pairwise difference operator

Marc Schwartz MSchwartz at MedAnalytics.com
Sun Aug 1 01:42:24 CEST 2004


On Fri, 2004-07-30 at 20:28, Marc Schwartz wrote:
> On Fri, 2004-07-30 at 18:30, Adaikalavan Ramasamy wrote:
> > There was a BioConductor thread today where the poster wanted to find
> > pairwise difference between columns of a matrix. I suggested the slow
> > solution below, hoping that someone might suggest a faster and/or more
> > elegant solution, but no other response.
> > 
> > I tried unsuccessfully with the apply() family. Searching the mailing
> > list was not very fruitful either. The closest I got to was a cryptic
> > chunk of code in pairwise.table().
> > 
> > Since I do use something similar myself occasionally, I am hoping
> > someone from the R-help list can suggest alternatives or past threads.
> > Thank you.

<snip>

In follow up to the posts on this last night, I created an updated
version of my function (though I will point out that Gabor's is faster,
as I will show below).

I realized that using the combinations() function had a potential
limitation, which is the limits of R's recursion depth, as Greg mentions
in the help for the function. It will require an adjustment when the
number of columns is about 45.

Thus, I modified the creation of the column combinations as noted below.
I also added some code to verify the input data type and to ensure that
the resultant structures remain matrices in the case of an input matrix
with ncol = 2, in which case, this function is of course, overkill.

Thus:

 pairwise.diffs <- function(x)
{
  stopifnot(is.matrix(x))

  # create column combination pairs
  prs <- cbind(rep(1:ncol(x), each = ncol(x)), 1:ncol(x))
  col.diffs <- prs[prs[, 1] < prs[, 2], , drop = FALSE]

  # do pairwise differences 
  result <- x[, col.diffs[, 1]] - x[, col.diffs[, 2], drop = FALSE]

  # set colnames
  if(is.null(colnames(x)))
    colnames(x) <- 1:ncol(x)

  colnames(result) <- paste(colnames(x)[col.diffs[, 1]], ".vs.", 
                            colnames(x)[col.diffs[, 2]], sep = "")
  result
}


Now to performance. I created a large 1,000 column matrix:

mat <- matrix(sample(100, 10000, replace = TRUE), ncol = 1000)
colnames(mat) <- 1:1000

> str(mat)
 int [1:10, 1:1000] 48 23 26 22 69 64 2 13 13 69 ...
 - attr(*, "dimnames")=List of 2
  ..$ : NULL
  ..$ : chr [1:1000] "1" "2" "3" "4" ...


Timing:

> gc();system.time(m <- pairwise.diffs(mat))
          used (Mb) gc trigger  (Mb)
Ncells 1541241 41.2    3094291  82.7
Vcells 7139074 54.5   17257300 131.7
[1] 1.14 0.19 1.39 0.00 0.00


> gc();system.time(g <- do.call("cbind", sapply(2:ncol(mat), 
                                    f, mat)))
          used (Mb) gc trigger  (Mb)
Ncells 1541241 41.2    3094291  82.7
Vcells 7139074 54.5   17257300 131.7
[1] 0.81 0.02 0.92 0.00 0.00


Comparisons:

> str(m)
 int [1:10, 1:499500] -47 -43 -35 -29 15 33 -53 -36 -17 57 ...
 - attr(*, "dimnames")=List of 2
  ..$ : NULL
  ..$ : chr [1:499500] "1.vs.2" "1.vs.3" "1.vs.4" "1.vs.5" ...


> str(g)
 int [1:10, 1:499500] -47 -43 -35 -29 15 33 -53 -36 -17 57 ...
 - attr(*, "dimnames")=List of 2
  ..$ : NULL
  ..$ : chr [1:499500] "1-2" "1-3" "1-4" "1-5" ...


> table(m == g)
 
   TRUE
4995000


HTH,

Marc Schwartz




More information about the R-help mailing list