[R] Re: coding factor replicates

Bill.Venables@CMIS.CSIRO.AU Bill.Venables at CMIS.CSIRO.AU
Thu Jan 24 04:34:07 CET 2002



>  -----Original Message-----
> From: 	Douglas Bates [mailto:bates at stat.wisc.edu] 
> Sent:	Thursday, January 24, 2002 8:55 AM
> To:	Uwe Ligges
> Cc:	Brad Buchsbaum; r-help at stat.math.ethz.ch
> Subject:	Re: [R] Re: coding factor replicates
> 
> Douglas Bates <bates at cs.wisc.edu> writes:
> 
> > Uwe Ligges <ligges at statistik.uni-dortmund.de> writes:
> > 
> > > Brad Buchsbaum wrote:
> > > > 
> > > > Hi All,
> > > > 
> > > > If I have a factor f:
> > > > 
> > > > A B C B C A C B A A B ....
> > > > 
> > > > and I would like to generate a factor to indicate the trial number
> > > > as a function of condition: e.g.
> > > > 
> > > > 1 1 1 2 2 2 3 3 3 4 4 ...
> > > > 
> > > > how might I attack this in R?
> > > 
> > > What about something like
> > >   as.factor(outer(rep(1, 3), 1:4))
> > 
> > I think the point is that the 1's are at the first occurrence of the
> > level, the 2's at the second occurrence, etc.  This seems like the sort
> > of problem that Bill Venables would come up with a devilishly clever
> > way of solving.
> > 
> > I would do it as
> > 
> > > result <- seq(along = f)         # create a vector to hold the response
> > > sp <- split(seq(along = f), f)   # split the factor on levels
> > > result[unlist(sp)] <- unlist(lapply(sp, function(x) seq(along = x)))
> > > result
> >  [1] 1 1 1 2 2 2 3 3 3 4 4
> > 
> > but I'm sure Bill would do it much more elegantly than that.
> 
> Before others point out the obvious simplification (I did it in stages
> and assembled the "swish" result, as Bill would term it - apparently
> swish has a different connotation in Australia than it does in North
> America), the second line could be
> 
> > sp <- split(result, f)   # split the index vector on factor levels

Doug is much too kind (I think).  The tricks with match() I have learned
from him are just amazing.

With this problem you can cheat a bit if you assume that the trials are
contiguous (as I think they must be).  All you need to know then are (1) the
run length of a trial and (2) the number of trials.

> run.length <- which(duplicated(f))[1] - 1
> no.trials <- ceiling(length(f)/run.length)
> trials <- factor(rep(1:no.trials, each = run.length,
			length.out = length(f)))
> trials
 [1] 1 1 1 2 2 2 3 3 3 4 4
Levels:  1 2 3 4 

No more elegant than Doug's, I contend!
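
The shortcut above leans entirely on the contiguity assumption, so it may help to see it wrapped up and tried on a layout where that assumption does not hold. This is only a sketch: the helper name trial_by_run and the second factor g are made up for illustration and are not from the thread, and f is assumed to be the eleven-element factor from Brad's example.

```r
# Hypothetical helper (name is illustrative): trial numbers under the
# assumption that each trial is one contiguous run of distinct levels.
trial_by_run <- function(f) {
  stopifnot(length(f) > 0)
  first.dup <- which(duplicated(f))[1]
  # If no level ever repeats, treat the whole vector as a single trial.
  run.length <- if (is.na(first.dup)) length(f) else first.dup - 1
  no.trials <- ceiling(length(f) / run.length)
  factor(rep(seq_len(no.trials), each = run.length, length.out = length(f)))
}

# Assumed example data: the factor from the original question.
f <- factor(c("A", "B", "C", "B", "C", "A", "C", "B", "A", "A", "B"))
trials <- trial_by_run(f)
print(trials)

# A layout that breaks the assumption: each level repeats immediately,
# so the first duplicate sits at position 2 and run.length comes out as 1.
g <- factor(c("A", "A", "B", "B", "C", "C"))
print(trial_by_run(g))   # every element gets its own trial number
```

For g the shortcut numbers the elements 1 through 6, whereas counting occurrences within each level would give 1 2 1 2 1 2, so the split-based approach is the safer choice when contiguity is in doubt.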

Bill Venables.
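
For anyone wanting to run the general (non-contiguous) method end to end, the split-based approach quoted above can be sketched as follows. The data are assumed to be the factor from Brad's example; the ave() call is a base R alternative that nobody in the thread proposed, shown only because it does the same within-level counting in one step.

```r
# Assumed example data: the factor from the original question.
f <- factor(c("A", "B", "C", "B", "C", "A", "C", "B", "A", "A", "B"))

# Staged version: split the positions by level, number the positions
# within each level, and write the counts back in the original order.
idx <- seq_along(f)
sp <- split(idx, f)
result <- integer(length(f))
result[unlist(sp)] <- unlist(lapply(sp, seq_along))
print(result)

# One-step alternative: ave() applies FUN within each level of f and
# returns the values in the original order.
result2 <- ave(seq_along(f), f, FUN = seq_along)
print(result2)

# The same counting on a layout where levels repeat immediately.
g <- factor(c("A", "A", "B", "B", "C", "C"))
print(ave(seq_along(g), g, FUN = seq_along))
```

Both versions count occurrences per level, so they make no assumption about how the levels are ordered within f.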