# [R] Column-mean-values for targeted rows

Benilton Carvalho bcarvalh at jhsph.edu
Sat Jul 21 00:29:44 CEST 2007

```
set.seed(123)
N = 30000
K = 400
theData = matrix(rnorm(N*K), ncol=K)
theData = as.data.frame(theData)
theData = cbind(indicator = sample(0:1, N, replace=TRUE), theData)

> system.time(results <- colMeans(subset(theData, indicator == 1)))
user  system elapsed
2.309   1.319   3.853
```
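
One detail worth noting: `subset()` keeps every column, so the `colMeans()` call above also returns a mean for `indicator` itself (always 1 for the selected rows). A minimal sketch of two ways to get only the probability columns follows, assuming the data can be held as a plain numeric matrix plus a separate 0/1 vector. The names `probs`, `indicator`, `flagged_means`, `df` and `df_means` are made up for illustration, and the dimensions are kept small so the check runs quickly.

```r
set.seed(123)
N <- 1000   # much smaller than the 30000 x 400 case, just for a quick check
K <- 5
probs <- matrix(runif(N * K), ncol = K)        # hypothetical probability columns
indicator <- sample(0:1, N, replace = TRUE)    # hypothetical 0/1 row flag

# Logical row indexing on the matrix; drop = FALSE keeps a matrix even
# if only one row is flagged, so colMeans() still applies.
flagged_means <- colMeans(probs[indicator == 1, , drop = FALSE])

# The data-frame route, with the indicator column dropped via select.
df <- data.frame(indicator = indicator, probs)
df_means <- colMeans(subset(df, indicator == 1, select = -indicator))

# Both routes should agree up to floating-point tolerance.
all.equal(unname(df_means), flagged_means)
```

Whether the matrix route is noticeably faster on the full-size data is not something measured here; it only avoids building an intermediate data frame.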

b

On Jul 20, 2007, at 6:17 PM, Diogo Alagador wrote:

> Hi all,
>
> I'm handling massive data.frames and matrices in R (30000 x 400).
> In the 1st column, say, I have 0s and 1s indicating rows that
> matter; other columns have probability values.
> One simple task I would like to do is to get the column mean
> values for the signaled rows (the ones with 1).
> As a very fresh "programmer" I have built a simple function in R,
> which is surely not very efficient! It works well for
> moderately sized matrices, but it does not cope well with huge ones.
>
> meanprob<-function(Robj){
> NLINE<-dim(Robj)[1];    # number of rows
> NCOLUMN<-dim(Robj)[2];  # number of columns, including the indicator
> mprob<-c(rep(0,(NCOLUMN-1)));
> for (i in 2:NCOLUMN){
>     sumprob<-0;
>     pa<-0;              # count of flagged rows
>     for (j in 1:NLINE){
>         if(Robj[j,1]!=0){
>             pa<-pa+1;
>             sumprob<-Robj[j,i]+sumprob;
>         }
>     }
>     mprob[i-1]<-sumprob/pa;  # mean of column i over flagged rows
> }
> return(mprob);
> }
>
>
> So I "only" see 3 ways to get through the problem:
>
> - to reformulate the function to gain efficiency;
> - to write a C routine (for example), where loops are
> faster, and then interface it with R;
> - to find some function/package that already does that.
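
On the first option, a minimal sketch of how the quoted loop could be reformulated with vectorized indexing, assuming the first column holds the 0/1 flag as described. `meanprob_vec` is a hypothetical name, not a function from any package, and the small matrix `m` is made-up data.

```r
# Hypothetical vectorized counterpart of meanprob(); the name is illustrative.
meanprob_vec <- function(Robj) {
  keep <- Robj[, 1] != 0    # same row test as the inner loop
  # as.matrix() lets this accept either a data frame or a matrix;
  # drop = FALSE guards against a single flagged row or single column.
  unname(colMeans(as.matrix(Robj[keep, -1, drop = FALSE])))
}

# Small made-up example: rows 1 and 3 are flagged.
m <- cbind(c(1, 0, 1), c(0.2, 0.9, 0.4), c(0.5, 0.1, 0.7))
meanprob_vec(m)
```

For `m`, the flagged rows give means of 0.3 and 0.6 for the two probability columns, which matches what the double loop computes.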
>
> Can anybody illuminate my way here?
>
> Many thanks,