[R] LOCF - Last Observation Carried Forward

Marc Schwartz MSchwartz at medanalytics.com
Fri Nov 14 18:00:01 CET 2003


karlknoblich at yahoo.de wrote:
> Hi!
>  
> Is there a possibility in R to carry out LOCF (Last Observation Carried
> Forward) analysis or to create a new data frame (array, matrix) with
> LOCF? Or some helpful functions, packages?
>  
> Karl


As I understand the methodology and potential issues regarding the
imputation of data for the missing observations, I have a couple of
thoughts:

1. Missing observations can be imputed using standard R data management
functions. How complex that is will likely depend upon your exact data
structure.

For example, if the missing values are all NA's, you can use
vector/matrix indexing to replace them based upon various conditions. If
the subsetting logic is more complex, you can use the replace()
function, whose index argument can be an arbitrarily complex logical
expression. See ?replace for more information.
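
For instance, here is a minimal sketch of both approaches, using the same
example vector as in the next step. The fill values (0 and 25) and the
position cutoff are arbitrary and chosen only for illustration:

```r
# Example series with missing values coded as NA
x <- c(23, 25, 24, NA, 25, NA, NA)

# Plain logical indexing: overwrite every NA with a constant
y <- x
y[is.na(y)] <- 0
y
# [1] 23 25 24  0 25  0  0

# replace() returns a modified copy; its index argument can be any
# logical vector, so compound conditions are easy. Here only the NA's
# after position 5 are replaced (the arbitrary value 25 is illustrative).
z <- replace(x, is.na(x) & seq_along(x) > 5, 25)
z
# [1] 23 25 24 NA 25 25 25
```

Note that replace() does not modify x in place; you need to assign the
result.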

If your data (x) are a time-ordered vector running left to right, you
can identify the position of the last known observation, for example:

> x <- c(23, 25, 24, NA, 25, NA, NA)
> max(which(!is.na(x)))
[1] 5

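and fill from that position to the end, repeating the last known value.
Note that this fills only the trailing NA's; the interior NA at position
4 is left untouched: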

> LOCF <- max(which(!is.na(x)))
> x[LOCF:length(x)] <- x[LOCF]
> x
[1] 23 25 24 NA 25 25 25
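
If you want every gap filled with the most recent prior value, including
interior ones, a small helper along these lines should work. It is only
a sketch: the function name locf() and the example data frame visits are
hypothetical, and it assumes the observations are already in time order.
For data in long format (one row per subject per visit), ave() applies
it within each subject so that values are never carried across subjects:

```r
# Hypothetical helper: carry the most recent non-NA value forward over
# every NA. Leading NA's (before any observation) stay NA.
locf <- function(x) {
  pos <- seq_along(x)
  pos[is.na(x)] <- 0L      # mark missing positions with 0
  pos <- cummax(pos)       # position of the last observed value so far
  pos[pos == 0L] <- NA     # nothing to carry forward yet
  x[pos]
}

locf(c(23, 25, 24, NA, 25, NA, NA))
# [1] 23 25 24 24 25 25 25

# Hypothetical long-format data: one row per subject per visit
visits <- data.frame(
  id    = c(1, 1, 1, 2, 2, 2),
  visit = c(1, 2, 3, 1, 2, 3),
  score = c(10, NA, NA, 7, 8, NA)
)

# Sort so that "last" means the previous visit of the same subject
visits <- visits[order(visits$id, visits$visit), ]

# Apply locf() separately within each subject
visits$score_locf <- ave(visits$score, visits$id, FUN = locf)
visits$score_locf
# [1] 10 10 10  7  8  8
```

The cummax() step works because the running maximum of the observed
positions is, at each element, the position of the most recent
non-missing value. Contributed packages such as zoo also provide a
ready-made na.locf().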

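A quick search on Google turns up some known issues with the
methodology, depending upon the nature of the missing data and what
assumptions you are willing to make or live with. In particular, LOCF
assumes that a subject's value would have stayed unchanged after the
last observation, which can bias the resulting estimates.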

For more complex imputation, a variety of functions are available for
R, for example in Frank Harrell's Design and Hmisc packages on CRAN.


2. Another alternative to consider, depending upon how much missing data
you are dealing with and its etiology, would be to skip imputation and
fit a mixed effects model to the unbalanced data, using the model
functions in package 'nlme'. I would defer to others here, but it is
something to consider.
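
A minimal sketch of that idea, assuming long-format data with one row
per subject per visit. The Orthodont data set ships with nlme; the
randomly deleted measurements and the random-intercept model are
illustrative choices only, not a recommendation for your data:

```r
library(nlme)  # recommended package distributed with R

# Orthodont: distance measured at ages 8, 10, 12 and 14 for each child.
# Convert to a plain data frame and blank out a few measurements to
# mimic missed visits (illustrative only).
ortho <- as.data.frame(Orthodont)
set.seed(1)
ortho$distance[sample(nrow(ortho), 10)] <- NA

# Random-intercept model fitted to whatever visits were observed;
# na.action = na.omit drops the incomplete rows instead of imputing them
fit <- lme(distance ~ age, random = ~ 1 | Subject,
           data = ortho, na.action = na.omit)
summary(fit)
```

Subjects with missed visits still contribute their observed
measurements, so nothing has to be carried forward; whether that is
appropriate depends on why the data are missing.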

HTH,

Marc Schwartz