[R] the first and last observation for each subject

William Dunlap wdunlap at tibco.com
Mon Jan 5 18:18:31 CET 2009


Arg, the 'sapply(...)' in the function was in the initial
comment,
   gm <- function(x, group){ # medians by group:
sapply(split(x,group),median)
but someone's mailer put a newline before the sapply 
   gm <- function(x, group){ # medians by group:
   sapply(split(x,group),median)
so it got executed, wasting all that time.  (Of course the
same mailer will mess up this message in the same way -
just delete the sapply() call from gm().)

Bill Dunlap
TIBCO Software Inc - Spotfire Division
wdunlap tibco.com  

> -----Original Message-----
> From: hadley wickham [mailto:h.wickham at gmail.com] 
> Sent: Monday, January 05, 2009 9:10 AM
> To: William Dunlap
> Cc: gallon.li at gmail.com; R help
> Subject: Re: [R] the first and last observation for each subject
> 
> > Another application of that technique can be used to quickly compute
> > medians by groups:
> >
> > gm <- function(x, group){ # medians by group:
> > sapply(split(x,group),median)
> >   o<-order(group, x)
> >   group <- group[o]
> >   x <- x[o]
> >   changes <- group[-1] != group[-length(group)]
> >   first <- which(c(TRUE, changes))
> >   last <- which(c(changes, TRUE))
> >   lowerMedian <- x[floor((first+last)/2)]
> >   upperMedian <- x[ceiling((first+last)/2)]
> >   median <- (lowerMedian+upperMedian)/2
> >   names(median) <- group[first]
> >   median
> > }
> >
> > For a 10^5 long x and a somewhat fewer than 3*10^4 distinct groups
> > (in random order) the times are:
> >
> >> group<-sample(1:30000, size=100000, replace=TRUE)
> >> x<-rnorm(length(group))*10 + group
> >> unix.time(z0<-sapply(split(x,group), median))
> >   user  system elapsed
> >   2.72    0.00    3.20
> >> unix.time(z1<-gm(x,group))
> >   user  system elapsed
> >   0.12    0.00    0.16
> >> identical(z1,z0)
> > [1] TRUE
> 
> I get:
> 
> > unix.time(z0<-sapply(split(x,group), median))
>    user  system elapsed
>   2.733   0.017   2.766
> > unix.time(z1<-gm(x,group))
>    user  system elapsed
>   2.897   0.032   2.946
> 
> 
> Hadley
> 
> 
> -- 
> http://had.co.nz/
> 




More information about the R-help mailing list