[R] Numbering sequences of non-NAs in a vector

Tue Jul 7 23:53:31 CEST 2009

On Jul 7, 2009, at 4:08 PM, Krishna Tateneni wrote:

> Greetings, I have a vector of the form:
> [10,8,1,3,0,8,NA,NA,NA,NA,2,1,6,NA,NA,NA,0,5,1,9...]  That is, a
> combination of sequences of non-missing values and missing values,
> with each sequence possibly of a different length.
>
> I'd like to create another vector which will help me pick out the
> sequences of non-missing values.  For the example above, this would be:
> [1,1,1,1,1,1,NA,NA,NA,NA,2,2,2,NA,NA,NA,3,3,3,3...].  The goal
> ultimately is to calculate means separately for each sequence.
>
> Your help is appreciated.  If I'm making this more complicated than
> necessary, I'd appreciate knowing that as well!
>
> Many thanks.

Here is one possibility:

Vec <- c(10,8,1,3,0,8,NA,NA,NA,NA,2,1,6,NA,NA,NA,0,5,1,9)

 > Vec
  [1] 10  8  1  3  0  8 NA NA NA NA  2  1  6 NA NA NA  0  5  1  9

Use rle() on is.na(Vec) to get the runs of NA and non-NA values. See ?rle

Runs <- rle(is.na(Vec))

 > Runs
Run Length Encoding
   lengths: int [1:5] 6 4 3 3 4
   values : logi [1:5] FALSE TRUE FALSE TRUE FALSE
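
As a side note, the run lengths can also be turned into start and end positions for each run, which may help when checking the result by eye. This is only a sketch; Starts and Ends are names chosen for illustration and are not part of the original reply.

```r
Vec  <- c(10,8,1,3,0,8,NA,NA,NA,NA,2,1,6,NA,NA,NA,0,5,1,9)
Runs <- rle(is.na(Vec))

# Position of the last and of the first element of each run
# (Ends and Starts are illustrative names)
Ends   <- cumsum(Runs$lengths)
Starts <- Ends - Runs$lengths + 1

# One row per run; IsNA is TRUE for the runs of missing values
data.frame(Start = Starts, End = Ends, IsNA = Runs$values)
```

With the example vector, the first run covers positions 1 to 6 and the last one positions 17 to 20.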

Create a grouping value for each run (the NA runs get a number too):

Grps <- rep(seq_along(Runs$lengths), Runs$lengths)

 > Grps
  [1] 1 1 1 1 1 1 2 2 2 2 3 3 3 4 4 4 5 5 5 5
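
If the index vector is wanted exactly as in the question (1, 2, 3 for the non-NA runs and NA elsewhere), one possible variation is to number only the non-NA runs before expanding them. This is a sketch under that assumption; RunId and Idx are illustrative names.

```r
Vec  <- c(10,8,1,3,0,8,NA,NA,NA,NA,2,1,6,NA,NA,NA,0,5,1,9)
Runs <- rle(is.na(Vec))

# Count the non-NA runs consecutively; NA runs get NA as their id
RunId <- ifelse(Runs$values, NA, cumsum(!Runs$values))

# Expand the per-run ids back to the length of Vec
Idx <- rep(RunId, Runs$lengths)
Idx

# tapply() ignores the elements whose group is NA
tapply(Vec, Idx, mean)
```

Here Idx has NA wherever Vec does, so tapply() returns one mean per non-missing sequence without a separate filtering step.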

Now get the means for each run, split by Grps. See ?aggregate

 > aggregate(Vec, list(Grps = Grps), mean)
   Grps    x
1    1 5.00
2    2   NA
3    3 3.00
4    4   NA
5    5 3.75

If you don't want the NA runs included in the result, you could use subset():

 > subset(aggregate(Vec, list(Grps = Grps), mean), !is.na(x))
   Grps    x
1    1 5.00
3    3 3.00
5    5 3.75
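
Another way to leave out the NA runs, sketched here as an alternative rather than a correction, is to drop the missing elements before aggregating. Keep and Res are illustrative names.

```r
Vec  <- c(10,8,1,3,0,8,NA,NA,NA,NA,2,1,6,NA,NA,NA,0,5,1,9)
Runs <- rle(is.na(Vec))
Grps <- rep(seq_along(Runs$lengths), Runs$lengths)

# Keep only the non-missing elements and their group numbers
Keep <- !is.na(Vec)

# Aggregate over the remaining groups (1, 3 and 5 in this example)
Res <- aggregate(Vec[Keep], list(Grps = Grps[Keep]), mean)
Res
```

The group numbers still refer to the original runs, so the gaps (2 and 4) mark where the NA runs were.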

HTH,

Marc Schwartz