[BioC] mean of individual rows for subsets of columns in an ExprSet

Vincent Carey 525-2265 stvjc at channing.harvard.edu
Wed Aug 20 12:25:25 MEST 2003

> Hi Anna,
>
> I had a similar problem myself and didn't come up with an easy way to solve it, so my approach was:
>
> res <- matrix(nrow=nrow(x), ncol=2)
> for (i in 1:nrow(x)){
>    res[i,1] <- (x[i,4]+x[i,5]+x[i,6]+x[i,7])/4   # mean of columns 4 to 7
>    res[i,2] <- sd(x[i,4:7])                      # SD of the same columns
> }

i could not tell from the e-mail whether column statistics or
row statistics were intended.  for row statistics, commands
of the form apply(x,1,f) can be used.  if f returns a scalar
on vector input (as does the function mean) then apply(x,1,f)
is the vector with ith element f(x[i,]).  you could then
use apply(x[,4:7],1,mean) to do the first calculation above
(and could easily modify to median or trimmed mean with this
approach).
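
a minimal sketch of the row-wise apply described above. the matrix x here is made-up toy data (5 rows, 7 columns) standing in for the expression values, and the trim value is only an illustration:

```r
# toy stand-in for an expression matrix: 5 genes (rows) x 7 samples (columns)
set.seed(1)
x <- matrix(rnorm(35), nrow = 5, ncol = 7)

# one statistic per row, computed over columns 4 to 7 only
row_means   <- apply(x[, 4:7], 1, mean)
row_medians <- apply(x[, 4:7], 1, median)
# extra arguments after the function are passed on to it (here: trim)
row_trimmed <- apply(x[, 4:7], 1, mean, trim = 0.25)

# the explicit arithmetic from the loop gives the same means
loop_means <- (x[, 4] + x[, 5] + x[, 6] + x[, 7]) / 4
stopifnot(isTRUE(all.equal(row_means, loop_means)))
```

each result is a plain numeric vector with one element per row of x.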

if you want to be a little more elegant, you can write
a function that returns the vector of statistics of interest

msd <- function(x) c(mean(x),sqrt(var(x)))

now

apply(x,1,msd)

returns a 2xn matrix where n is the number of rows of x.
transpose it to get one row of (mean, sd) per row of x:

msdmat <- t(apply(x,1,msd))
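
to see the shapes involved, here is a small sketch on made-up data (a 6 x 4 toy matrix; the element names "mean" and "sd" are added here only to label the output):

```r
# toy data: 6 rows, 4 columns
set.seed(2)
x <- matrix(rnorm(24), nrow = 6, ncol = 4)

# returns a length-2 vector for each input vector
msd <- function(v) c(mean = mean(v), sd = sqrt(var(v)))

raw    <- apply(x, 1, msd)  # 2 x 6: one column per row of x
msdmat <- t(raw)            # 6 x 2: one row per row of x

dim(raw)
dim(msdmat)
```

apply stacks the per-row results as columns, which is why the transpose is needed to get the usual one-row-per-gene layout.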

lesson: prefer apply and built-in R functions to explicit loops whenever feasible.

>
>
> You could probably set this up as a function that lets you select different columns each time. Or, if you can assign your columns to groups (say, 1 for those you want the mean and SD for and 0 for those you don't), you could do something like
>
> groups <- c(0,0,0,1,1,1,1)
> calc.means <-function(x, y){
>    by(x, y, mean)
> }
> apply(eset@exprs, 1, calc.means, y=groups)

this can be done in one step using subscripting within
the apply

msdmat <- t(apply(x[,groups==1],1,msd))
# or specify the groups explicitly in the subscripting
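
a sketch of both forms of subscripting on made-up data (a 3 x 7 toy matrix). the last step, which loops over every group label with split and lapply, is an assumption about how one might extend this and was not part of the original exchange:

```r
# toy data: 3 rows, 7 columns; columns 4 to 7 are flagged as group 1
set.seed(3)
x <- matrix(rnorm(21), nrow = 3, ncol = 7)
groups <- c(0, 0, 0, 1, 1, 1, 1)

msd <- function(v) c(mean = mean(v), sd = sqrt(var(v)))

# select columns by group flag, or by explicit column index
by_flag  <- t(apply(x[, groups == 1], 1, msd))
by_index <- t(apply(x[, 4:7], 1, msd))
stopifnot(isTRUE(all.equal(by_flag, by_index)))

# hypothetical extension: the same statistics for every group at once
all_groups <- lapply(split(seq_along(groups), groups), function(cols) {
  t(apply(x[, cols, drop = FALSE], 1, msd))
})
```

all_groups is a list with one matrix per group label ("0" and "1" here), each with one row per row of x.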

NB: please don't use the "@" notation if it can be avoided.
we provide the "accessor" function exprs(), which should be used
instead, e.g. exprs(eset) rather than eset@exprs.