[R] finding and describing missing data runs in a time series

(Ted Harding) Ted.Harding at wlandres.net
Mon Feb 13 09:51:26 CET 2012


On 13-Feb-2012 Durant, James T. (ATSDR/DTEM/PRMSB) wrote:
> Hi -
> I am trying to find and describe missing data in a time series.
> For instance, in the library openair, there is a data frame
> called "mydata":
> library(openair)
> head(mydata)
> 
>                  date   ws  wd nox no2 o3 pm10    so2      co pm25
> 1 1998-01-01 00:00:00 0.60 280 285  39  1   29 4.7225  3.3725   NA
> 2 1998-01-01 01:00:00 2.16 230  NA  NA NA   37     NA      NA   NA
> 3 1998-01-01 02:00:00 2.76 190  NA  NA  3   34 6.8300  9.6025   NA
> 4 1998-01-01 03:00:00 2.16 170 493  52  3   35 7.6625 10.2175   NA
> 5 1998-01-01 04:00:00 2.40 180 468  78  2   34 8.0700  8.9125   NA
> 6 1998-01-01 05:00:00 3.00 190 264  42  0   16 5.5050  3.0525   NA
> 
> 
> So, for example, for pm25 I would like to be able to detect
> that there is a run of NAs starting at 1998-01-01 00:00:00
> and lasting for 2887 hourly observations.
> Then I would be able to know that there is an NA at 2910 and
> so on. The key information I am looking for is when the NA's
> start and their length. The closest thing I can use that I
> know about is timePlot in the openair package with
> statistic="frequency" but it only gives monthly summary data,
> and does not tell me if the missing data are clumped together
> or are dispersed.
> 
> VR
> Jim
> 
> James T. Durant, MSPH CIH
> Emergency Response Coordinator
> US Agency for Toxic Substances and Disease Registry
> Atlanta, GA 30341
> 770-378-1695

You might consider an approach based on

  rle(is.na(mydata$pm25))

See ?rle

Example:

  X <- c(1,2,3,NA,NA,NA,4,5,NA,6,7,8,NA,NA,NA,NA,NA)
  X
  # [1]  1  2  3 NA NA NA  4  5 NA  6  7  8 NA NA NA NA NA
  rle(is.na(X))
  # Run Length Encoding
  #   lengths: int [1:6] 3 3 2 1 3 5
  #   values : logi [1:6] FALSE TRUE FALSE TRUE FALSE TRUE
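
The rle() result can be turned into the start positions and lengths of
the NA runs, which is the information asked for. The following is only
a sketch: na_runs() is a hypothetical helper name, not a function from
base R or openair. It uses cumsum() on the run lengths to recover where
each run begins.

```r
# Hypothetical helper (not part of base R or openair):
# return the start index and length of every run of NAs in x.
na_runs <- function(x) {
  r      <- rle(is.na(x))
  ends   <- cumsum(r$lengths)       # index of the last element of each run
  starts <- ends - r$lengths + 1    # index of the first element of each run
  # keep only the runs where is.na() was TRUE
  data.frame(start = starts[r$values], length = r$lengths[r$values])
}

X <- c(1,2,3,NA,NA,NA,4,5,NA,6,7,8,NA,NA,NA,NA,NA)
runs <- na_runs(X)
runs
#   start length
# 1     4      3
# 2     9      1
# 3    13      5
```

For the openair data, something along the lines of
runs <- na_runs(mydata$pm25) followed by mydata$date[runs$start]
should give the date-time at which each NA run begins, assuming the
rows of mydata are in time order with no missing hours.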

Ted.

-------------------------------------------------
E-Mail: (Ted Harding) <Ted.Harding at wlandres.net>
Date: 13-Feb-2012  Time: 08:51:19
This message was sent by XFMail


