[R] Re: [S] tapply for matrices

Frank E Harrell Jr fharrell at virginia.edu
Thu Oct 10 21:50:57 CEST 2002

Tony Plate provided what seems to be a very fast and elegant solution - see below.  I have modified his solution slightly:

mapply <- function(X, INDEX, FUN=NULL, ..., simplify=TRUE) {
  ## Matrix tapply
  ## X: matrix with n rows; INDEX: vector or list of vectors of length n
  ## FUN: function to operate on submatrices of X defined by INDEX
  ## ...: further arguments to FUN; simplify: see sapply
  ## Modification of code by Tony Plate <tplate at blackmesacapital.com> 10Oct02
  ## Note: in current R this name masks base::mapply
  ## Row indices of X, one vector per level (or level combination) of INDEX
  idx.list <- tapply(seq_len(nrow(X)), INDEX, c)
  ## Apply FUN to each submatrix; drop=FALSE keeps one-row groups as matrices
  sapply(idx.list, function(idx, x, fun, ...) fun(x[idx, , drop=FALSE], ...),
         x=X, fun=FUN, ..., simplify=simplify)
}

Example: mapply(x, groups, quantile, probs=c(.25,.5)) will return a matrix with one column per group, holding the first quartile and the median (second quartile) of the values in each submatrix of x defined by groups.
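A minimal sketch of the same idea, swapping tapply(..., c) for base R's split() to build the per-group row indices. The function name group_apply and the simulated x and groups are hypothetical; the sketch closes over X, FUN and ... instead of passing them through sapply, and uses a different name so it does not mask base::mapply in current R.

```r
## group_apply is a hypothetical name; same approach as the function above,
## but the row indices come from split() rather than tapply(..., c)
group_apply <- function(X, INDEX, FUN, ..., simplify = TRUE) {
  ## One integer vector of row numbers per level of INDEX
  idx.list <- split(seq_len(nrow(X)), INDEX)
  ## FUN sees each row-subset as a matrix (drop = FALSE); ... is
  ## found lexically in group_apply's frame
  sapply(idx.list, function(idx) FUN(X[idx, , drop = FALSE], ...),
         simplify = simplify)
}

## Hypothetical data: 10 rows in two groups of 5
set.seed(1)
x <- matrix(rnorm(40), nrow = 10)
groups <- rep(c("a", "b"), each = 5)
q <- group_apply(x, groups, quantile, probs = c(.25, .5))
```

Here q is a 2 x 2 matrix with rows "25%" and "50%" and one column per group.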

My current uses for this are certain within-subject bivariate summaries, when subjects have multiple rows of data.
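As an illustration of a within-subject bivariate summary, here is a sketch that computes, for each subject, the correlation between two columns. The simulated data, the column names y1 and y2, and the choice of cor() as the summary are all assumptions made for the example; vapply() is used so that each subject is guaranteed to return a single number.

```r
## Hypothetical long-format data: 3 subjects with 4, 5 and 6 rows each
set.seed(2)
subject <- rep(1:3, times = c(4, 5, 6))
x <- cbind(y1 = rnorm(15), y2 = rnorm(15))

## Row indices for each subject
idx.list <- split(seq_len(nrow(x)), subject)

## One bivariate summary (here a correlation) per subject
within.r <- vapply(idx.list,
                   function(idx) cor(x[idx, "y1"], x[idx, "y2"]),
                   numeric(1))
```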

Thanks Tony,


P.S.  Dave Krantz <dhk at paradox.psych.columbia.edu> reported that he wrote a function mtapply that does this with for loops but pays a lot of attention to formatting the output as an array with sensible dimnames.

On Thu, 10 Oct 2002 12:51:54 -0600
Tony Plate <tplate at blackmesacapital.com> wrote:

> I use the following idiom for this:
> idx.list <- tapply(seq(nrow(x)), x[,grouping.variable], c)
> lapply(idx.list, function(idx, x) {
>     submatrix <- x[idx,,drop=FALSE]
>     ## ... operate on submatrix ...
> }, x)
> which seems pretty fast.  I sometimes sort x beforehand so that rows with 
> the same value of the grouping variable are adjacent.
> Hope this helps,
> Tony Plate
> PS. Please excuse me if the above code has any typos -- it's from memory.
> At 02:31 PM 10/10/2002 -0400, you wrote:
> >Does anyone have something like tapply that is extremely fast for matrices 
> >when there is a very large number of levels of the grouping variable?
> >I'm referring to, for example,
> >
> >tapply(x, grouping.variable, function.operating.on.submatrix)
> >
> >where x is a matrix and the submatrix is a subset of the rows of x.  The 
> >grouping variable's length equals the number of rows of x.
> >--
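Tony's suggestion to sort x first can be taken one step further: once rows are ordered by the grouping variable, each group occupies a contiguous block of rows, so the blocks can be located with rle() instead of building per-group index vectors with tapply. This is a hedged sketch of that idea, not Tony's code; sorted_block_apply and the example data are hypothetical, and no claim is made about its speed.

```r
## sorted_block_apply is a hypothetical helper: sort rows by g, then apply
## FUN to each contiguous block of rows sharing a value of g
sorted_block_apply <- function(x, g, FUN, ...) {
  o <- order(g)                          # stable ordering by group
  xs <- x[o, , drop = FALSE]
  runs <- rle(as.character(g[o]))        # one run per group
  ends <- cumsum(runs$lengths)
  starts <- ends - runs$lengths + 1L
  res <- lapply(seq_along(starts), function(i)
    FUN(xs[starts[i]:ends[i], , drop = FALSE], ...))
  names(res) <- runs$values
  res
}

## Hypothetical data: 7 rows in 3 unsorted groups
set.seed(3)
g <- c(2, 1, 2, 3, 1, 3, 2)
x <- matrix(rnorm(14), ncol = 2)
res <- sorted_block_apply(x, g, colMeans)
```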

Frank E Harrell Jr              Prof. of Biostatistics & Statistics
Div. of Biostatistics & Epidem. Dept. of Health Evaluation Sciences
U. Virginia School of Medicine  http://hesweb1.med.virginia.edu/biostat
