[R] Getting the groupmean for each person

Prof Brian Ripley ripley at stats.ox.ac.uk
Mon May 10 13:52:59 CEST 2004


On Mon, 10 May 2004, Liaw, Andy wrote:

> Both of you might have missed my question from Friday:  For very long `x'
> (e.g., length=50000), indexing by names can take a long time.  See that
> thread for detail.  (For small data, you can hardly tell the difference.)

That's solved in R-devel as of this morning.  You now need a vector of
about a million elements before the time spent indexing becomes significant.

However, I think that in this case you should be indexing by the codes of
a factor, as tapply is guaranteed to return its results in the order of the
levels of f (after conversion to a factor).  So the default way of indexing
by a factor, which uses its integer codes, is the natural one here.
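
A small example of the point above, on invented data: tapply() returns its
means in the order of levels(f), not in the order in which the groups first
appear, and indexing with a factor uses its integer codes (see ?Extract), so
m[f] lines each observation up with its own group mean.

```r
# Invented data: group "b" appears before "a", so first-appearance order
# differs from the (alphabetical) level order
x <- c(1, 2, 3, 4, 5, 6)
f <- factor(c("b", "a", "b", "c", "a", "c"))

m <- tapply(x, f, mean)          # 1D array named by levels(f): a, b, c
identical(names(m), levels(f))   # TRUE

# A factor index is used via its integer codes, i.e. positions in levels(f),
# which are exactly the positions of the corresponding means in m
identical(as.vector(m[f]), as.vector(m)[as.integer(f)])   # TRUE

centered <- x - as.vector(m[f])  # c(-1, -1.5, 1, -1, 1.5, 1)
```
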

It may come as no surprise then that lda has code like

    group.means <- tapply(x, list(rep(g, p), col(x)), mean)
    X <- x - group.means[g, ]

where g is a factor and p is the number of columns of x.
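
A minimal sketch of that lda idiom on an invented 4 x 2 matrix, assuming
p <- ncol(x) as in lda.  Here c(x) flattens the matrix column by column so
that each value lines up with rep(g, p) and col(x); tapply() then returns a
groups-by-columns matrix whose rows follow levels(g), and group.means[g, ]
picks one row of means per observation through the factor codes.

```r
# Invented data: 4 observations, 2 variables, 2 groups
x <- matrix(c(1, 2, 4, 7,
              10, 20, 40, 70), ncol = 2)
g <- factor(c("b", "a", "a", "b"))
p <- ncol(x)

# Group-by-column means: rows follow levels(g) ("a", "b"), columns 1..p.
# c(x) flattens column-wise so each value matches rep(g, p) and col(x).
group.means <- tapply(c(x), list(rep(g, p), col(x)), mean)

# Row indexing by the factor g uses its integer codes, giving one row
# of group means per observation; subtracting centres each column by group
X <- x - group.means[g, ]
```
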

> Also, I'm trying to write the function in a way that one can pass in more
> than one grouping variable in a list, much like tapply.  The version I
> showed is a simplified version to demonstrate the `problem' I had.  I
> obviously missed the fact that tapply returns a 1D array...
> 
> Best,
> Andy
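
On passing several grouping variables in a list: one possible approach, not
taken from the thread, is to collapse the list into a single factor with
interaction() and then index by its codes as above.  The function name
demean() and the data are hypothetical.  Base R's ave(x, ..., FUN = mean)
already returns each element's group mean for one or more grouping factors,
so it serves as a cross-check.

```r
# Hypothetical generalisation of fun(): `f` may be a single factor or a
# list of factors, as with tapply()'s INDEX argument
demean <- function(x, f) {
  if (!is.list(f)) f <- list(f)
  g <- interaction(f, drop = TRUE)         # one combined factor, empty cells dropped
  m <- tapply(x, g, mean)                  # results ordered by levels(g)
  ans <- x - as.vector(m)[as.integer(g)]   # index by the integer codes
  names(ans) <- names(x)
  ans
}

# Invented data with two grouping variables
x  <- c(1, 2, 3, 4, 5, 6, 7, 8)
f1 <- factor(c("a", "a", "b", "b", "a", "a", "b", "b"))
f2 <- factor(c("u", "v", "u", "v", "u", "v", "u", "v"))

d <- demean(x, list(f1, f2))   # c(-2, -2, -2, -2, 2, 2, 2, 2)

# ave() gives each element's group mean, so this should agree
all.equal(d, x - ave(x, f1, f2))
```
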
> 
> > From: kjetil at acelerate.com 
> > 
> > On 10 May 2004 at 10:09, Christophe Pallier wrote:
> > 
> > > 
> > > 
> > > Liaw, Andy wrote:
> > > 
> > > >Suppose I define the function:
> > > >
> > > >fun <- function(x, f) {
> > > >    m <- tapply(x, f, mean)
> > > >    ans <- x - m[match(f, unique(f))]
> > > >    names(ans) <- names(x)
> > > >    ans
> > > >}
> > > >
> > > >  
> > > >
> > > 
> > > May I ask what the purpose of match(f, unique(f)) is?
> > > 
> > > To remove the group means, I have been using:
> > > 
> > > x-tapply(x,f,mean)[f]
> > > 
> > > for a while (and I am now changing to 
> > > x-tapply(x,f,mean)[as.character(f)] because of the peculiarities of
> > 
> > wouldn't 
> >  sweep(as.array(x), 1, tapply(x,f,mean)[as.character(f)] , "-")
> > 
> > be more natural?
> > 
> > Kjetil Halvorsen
> > 
> > > indexing named vectors with factors).
> > > 
> > > The use of tapply(x,f,mean)[match(f,unique(f))] assumes that the result
> > > of tapply is ordered by first appearance of each group in f, no? It
> > > seems a bit dangerous to me.
> > > 
> > > 
> > > Christophe Pallier
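
To make Christophe's concern concrete, here is a sketch on invented data
where the first group to appear is not the first level.  match(f, unique(f))
numbers the groups by first appearance, which does not match the level order
of tapply()'s result, whereas indexing by the factor codes, by
as.character(f), or via Kjetil's sweep() call gives the correct centred values.

```r
# Invented data: the first group to appear ("b") is not the first level ("a")
x <- c(1, 2, 3, 4, 5, 6)
f <- factor(c("b", "a", "b", "c", "a", "c"))

m <- tapply(x, f, mean)   # ordered by levels: a = 3.5, b = 2, c = 5

# match(f, unique(f)) numbers groups by first appearance (b = 1, a = 2, c = 3),
# which is not the order of m, so observations get the wrong means
wrong <- x - as.vector(m)[match(f, unique(f))]   # c(-2.5, 0, -0.5, -1, 3, 1)

# Indexing by the factor itself (its integer codes) or by its labels is safe
right_codes <- x - as.vector(m[f])               # c(-1, -1.5, 1, -1, 1.5, 1)
right_names <- x - as.vector(m[as.character(f)])

# Kjetil's sweep() version subtracts the same per-observation statistics
right_sweep <- as.vector(sweep(as.array(x), 1, m[as.character(f)], "-"))
```
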

-- 
Brian D. Ripley,                  ripley at stats.ox.ac.uk
Professor of Applied Statistics,  http://www.stats.ox.ac.uk/~ripley/
University of Oxford,             Tel:  +44 1865 272861 (self)
1 South Parks Road,                     +44 1865 272866 (PA)
Oxford OX1 3TG, UK                Fax:  +44 1865 272595



