[R] LOCF - Last Observation Carried Forward

Gabor Grothendieck ggrothendieck at myway.com
Sat Nov 15 04:21:25 CET 2003


From: Tony Plate <tplate at acm.org>:
>
> Here's a function that does the essential computation (written to work in 
> both S-plus and R).
> 
> This looks like one of those tricky problems that do not vectorize 
> easily. It would be simple to write a C-program to compute this very 
> efficiently. But are there any more efficient solutions than ones like the 
> below (that are written without resort to C)?
> 
> most.recent <- function(x) {
> # return a vector of indices of the most recent TRUE value
> if (!is.logical(x))
> stop("x must be logical")
> x[is.na(x)] <- FALSE
> # x is a logical vector
> r <- rle(x)
> ends <- cumsum(r$lengths)
> starts <- ends - r$lengths + 1
> spec <- as.list(as.data.frame(rbind(start=starts, len=r$lengths, 
> value=as.numeric(r$values), prev.end=c(NA, ends[-length(ends)]))))
> names(spec) <- NULL
> unlist(lapply(spec, function(s) if (s[3]) seq(s[1], len=s[2]) else 
> rep(s[4], len=s[2])), use.names=F)
> }
> 
> > x <- c(F,T,T,F,F,F,T,F)
> > most.recent(x)
> [1] NA 2 3 3 3 3 7 7
> 
> And using it to do the fill-forward:
> 
> > x <- c(NA,2,3,NA,4,NA,5,NA,NA,NA,6,7,8,NA)
> > x[most.recent(!is.na(x))]
> [1] NA 2 3 3 4 4 5 5 5 5 6 7 8 8
> >
> 
> Some timings:
> 
> > x <- sample(c(T,F),1e4,rep=T)
> > system.time(most.recent(x))
> [1] 0.33 0.01 0.47 NA NA
> > x <- sample(c(T,F),1e5,rep=T)
> > system.time(most.recent(x))
> [1] 4.27 0.06 6.44 NA NA
> > x <- sample(c(T,F),1e6,rep=T)
> > system.time(most.recent(x))
> [1] 47.27 0.17 47.97 NA NA
> >
> 
> -- Tony Plate
> 
> PS. Actually, I just found a solution that I had lying around that is about 
> 70 times as fast on random test data like the above.

I was waiting for you to post this but didn't see it so I thought 
I would post mine.  This one is 13x as fast and only requires 
a single line of code.  

> set.seed(111)
> x <- sample(c(T,F),10000,rep=T)

> system.time(z1 <- most.recent(x))
[1] 0.92 0.02 1.68   NA   NA

> system.time(z2 <- as.numeric(as.vector(
     cut(seq(x),c(which(x),Inf),lab=which(x),right=F))))
[1] 0.07 0.00 0.12   NA   NA

> all.equal(z1,z2)
[1] TRUE




More information about the R-help mailing list