[R] how to get the group mean deviation data ?

Mon Jul 25 10:32:20 CEST 2005

Yes, I meant n=10000; I just missed a zero.

With n=10000, t=3, it takes about 3 seconds.
With n=2000, t=7, it takes about 10 seconds.
I want to write a function to fit a model, and the data may be quite large (perhaps n<=20000, t<=10). As n and t grow, the run time will grow too, which is of course reasonable. But I think there should be a neater way to do this job, so I posted here.

What I really want to know is how to optimize the code for this purpose. Of course, I can still fit my model even with this code, and I still like R very much, as it is free, flexible and powerful.
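
For the balanced case (every id has exactly t rows and the rows are sorted by id), here is a minimal sketch along the lines of the reshaping idea quoted below. The function name demean_balanced and its arguments are hypothetical, chosen only for illustration. It assumes no missing values.

```r
# Hypothetical helper: group means and deviations for a balanced panel.
# Assumes rows are sorted by id and every id has exactly t rows.
demean_balanced <- function(x, t) {
  x <- as.matrix(x)
  stopifnot(nrow(x) %% t == 0)
  k <- ncol(x)
  n <- nrow(x) %/% t
  # Reshape so that each column holds the t observations of one (id, variable) cell.
  cells <- matrix(x, nrow = t)            # t x (n * k)
  m <- colMeans(cells)                    # one mean per cell, length n * k
  xm <- matrix(m, nrow = n, ncol = k, dimnames = list(NULL, colnames(x)))
  xdm <- x - rep(m, each = t)             # repeat each cell mean over its t rows
  list(xm = xm, xdm = xdm)
}

# Usage on simulated data shaped like the example in the original post
set.seed(1)
n <- 1000; t <- 3
d <- cbind(id = rep(1:n, each = t), y = rnorm(n * t), x = rnorm(n * t), z = rnorm(n * t))
res <- demean_balanced(d[, -1], t)
```

Everything here is a single reshape plus colMeans(), so there is no per-group function call. It cannot be used as-is when the number of replications differs between ids.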

>> if n is quite large, say n=1000 and t=3, it requires too much time, so I 
>> want to know whether there is a more efficient way to do it.
>
>Why is about 0.4 second (which is what it takes on my system) too long?
>
>Given that you want to operate on 3000 cells, a second does not look 
>unreasonable.
>
>This is a toy problem, and it is unclear what the real problem is (if 
>any).  Since you have the same number of replications for each cell 
>(group-variable combination)

I also want to handle the case where the number of replications differs between cells.
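
For unequal numbers of replications, one possible sketch uses base R's rowsum() and tabulate() instead of reshaping to an array. The function name demean_by_group is hypothetical, and the sketch assumes there are no missing values.

```r
# Hypothetical helper: group means and deviations when group sizes differ.
demean_by_group <- function(x, id) {
  x <- as.matrix(x)
  g <- as.integer(factor(id))             # group index 1..G for every row
  sums <- rowsum(x, g)                    # G x k matrix of per-group column sums
  counts <- tabulate(g)                   # number of rows in each group
  xm <- sums / counts                     # counts is recycled down each column
  xdm <- x - xm[g, , drop = FALSE]        # expand the means back to the rows
  list(xm = xm, xdm = xdm)
}

# Usage with unequal replications per id
set.seed(1)
id <- rep(1:4, times = c(2, 5, 1, 3))
x <- cbind(y = rnorm(length(id)), z = rnorm(length(id)))
res <- demean_by_group(x, id)
```

The rows do not need to be sorted by id, because each row looks up its own group mean through the index g.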

> I would use this as an n x 3 x t array (a 
>simple call to dim and aperm).  Then rowMeans will find the group means, 
>and you can just subtract those to get the deviations from the means, 
>making use of recycling.
>
>E.g.
>
>D <- as.matrix(d[,-1])   # also works if d is a data frame
>dim(D) <- c(t,n,3)
>D <- aperm(D, c(2,3,1))
>gmeans <- rowMeans(D, dims=2)
>d[,-1] - rep(gmeans, each=t)   # each = t replications per cell (3 here)
>
>That takes under 10ms for n=1000
>
>
>On Mon, 25 Jul 2005, ronggui wrote:
>
>>> n=10;t=3
>>> d<-cbind(id=rep(1:n,each=t),y=rnorm(n*t),x=rnorm(n*t),z=rnorm(n*t))
>>> head(d)
>>     id          y           x          z
>> [1,]  1 -2.1725379  0.07629954 -0.3985258
>> [2,]  1 -1.2383038 -2.49667038  0.6966127
>> [3,]  1 -1.2642401 -0.50613307  0.4895856
>> [4,]  2  0.2171246  0.86711864 -0.6660036
>> [5,]  2  2.2765760 -0.48547142 -1.4496664
>> [6,]  2  0.5985345 -1.06427035  2.1761071
>>
>> First, I want to get the group mean of each variable, for which I can use
>>> d<-data.frame(d)
>>> aggregate(d,list(d$id),mean)[,-1]
>>   id           y          x           z
>> 1   1 -1.55836060 -0.9755013  0.26255754
>> 2   2  1.03074502 -0.2275410  0.02014565
>> 3   3  0.20700121 -0.7159450  1.35890176
>> 4   4  0.17839650  1.2575891  0.04135165
>> 5   5 -0.20012508  0.4310221  0.55458899
>> 6   6 -0.13084185 -0.2953392  0.28229068
>> 7   7  0.20737288 -0.8863761 -0.50793880
>> 8   8  0.07512612 -0.6591304 -0.21656533
>> 9   9  0.94727796 -0.6108891  0.13529884
>> 10 10 -0.04434875  0.1332086 -0.88229808
>>
>> Then I want the group mean deviation data, like
>>> head(sapply(d[,2:4],function(x) x-ave(x,d$id)))
>>              y          x          z
>> [1,] -0.6141773  1.0518008 -0.6610833
>> [2,]  0.3200568 -1.5211691  0.4340552
>> [3,]  0.2941205  0.4693682  0.2270281
>> [4,] -0.8136205  1.0946597 -0.6861493
>> [5,]  1.2458310 -0.2579304 -1.4698121
>> [6,] -0.4322105 -0.8367293  2.1559614
>>
>> Both of the above are what I want, and I can also get them with the function below. But if n is quite large, say n=1000 and t=3, it requires too much time, so I want to know whether there is a more efficient way to do it.
>>
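>> myfun <- function(x, id)
>> {
>>   x <- as.matrix(x)
>>   id <- as.factor(id)
>>   # group means: one row per level of id, one column per variable
>>   xm <- apply(x, 2, function(y, z) tapply(y, z, mean), z = id)
>>   # deviations: indexing xm by the factor repeats each group's mean row
>>   xdm <- x - xm[id, ]
>>   list(xm = xm, xdm = xdm)
>> }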
>>
>>
>
>-- 
>Brian D. Ripley,                  ripley at stats.ox.ac.uk
>Professor of Applied Statistics,  http://www.stats.ox.ac.uk/~ripley/
>University of Oxford,             Tel:  +44 1865 272861 (self)
>1 South Parks Road,                     +44 1865 272866 (PA)
>Oxford OX1 3TG, UK                Fax:  +44 1865 272595