[R] numbering consecutive rows based on length criteria

Morway, Eric emorway at usgs.gov
Mon Mar 2 18:43:55 CET 2015


Using this dataset:

dat <- read.table(textConnection("day    noRes.Q    wRes.Q
 1  237074.41 215409.41
 2 2336240.20 164835.16
 3   84855.42 357062.72
 4   76993.48 386326.78
 5   73489.47 307144.09
 6   70246.96  75885.75
 7   69630.09  74054.33
 8   66714.78  70071.80
 9  122296.90  66579.08
10   63502.71  65811.37
11   63401.84  64795.12
12   63387.84  64401.14
13   63186.10  64163.95
14   63160.74  63468.25
15   60471.15  60719.15
16   58235.63  57655.14
17   58089.73  58061.34
18   57846.39  57357.89
19   57839.42  56495.69
20   57740.06  56219.97
21   58068.57  55810.91
22   58358.34  56437.81
23   76284.90  73722.92
24  105138.31 100729.00
25  147203.03 178079.38
26  109996.02 111113.95
27   91424.20  87391.56
28   89065.91  87196.69
29   86628.74  84809.07
30   79357.60  77555.62"), header=TRUE)

I'm attempting to generate a column that continuously numbers consecutive
rows where wRes.Q is greater than noRes.Q.  To that end, I've come up with
the following:

dat$flg <- dat$wRes.Q>dat$noRes.Q
dat$cnt <- with(dat, ave(integer(length(flg)), flg, FUN=seq_along))

The problem with dat$cnt is that it doesn't start over with 1 when a 'new'
group of either true or false is encountered.  Thus, row 9's cnt value
should start over at 1, as should dat$cnt[10], and dat$cnt[11]==2, etc.
(the desired result is shown below)
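
One possible way to get the counter to restart, sketched here on a short
made-up logical vector rather than on dat$flg, is to group by run instead
of by value. The "run" helper vector is an assumption introduced only for
illustration; it increases by one each time the flag changes, and the same
ave() call is then applied per run:

```r
# Hypothetical stand-in for dat$flg (not the data above)
flg <- c(FALSE, FALSE, TRUE, TRUE, TRUE, FALSE, TRUE)

# Run id: starts at 1 and goes up by one wherever flg differs from the previous row
run <- cumsum(c(TRUE, flg[-1] != flg[-length(flg)]))

# Same ave() idea as before, but grouped by run, so the count restarts per block
cnt <- ave(integer(length(flg)), run, FUN = seq_along)
cnt
```

Grouping on flg itself pools every TRUE row (and every FALSE row) into one
group, which is why the original count never resets.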

In the larger dataset I'm working with (>6,000 rows), there are blocks of
rows where the number of consecutive rows with dat$flg==TRUE exceeds 100.
My goal is to plot these blocks of rows as polygons in a time series plot.
If, for the small example provided, the number of consecutive rows with
dat$flg==TRUE is greater than or equal to 5 (the 2 blocks of rows
satisfying this criterion in this small example are rows 3-8 and 10-15), is
there a way to add a column that uniquely numbers these blocks of rows? I'd
like to end up with the following, which shows the correct "cnt" column and
a column called "plygn" that is my ultimate goal:

dat
# day    noRes.Q    wRes.Q   flg cnt  plygn
#   1  237074.41 215409.41 FALSE   1     NA
#   2 2336240.20 164835.16 FALSE   2     NA
#   3   84855.42 357062.72  TRUE   1      1
#   4   76993.48 386326.78  TRUE   2      1
#   5   73489.47 307144.09  TRUE   3      1
#   6   70246.96  75885.75  TRUE   4      1
#   7   69630.09  74054.33  TRUE   5      1
#   8   66714.78  70071.80  TRUE   6      1
#   9  122296.90  66579.08 FALSE   1     NA
#  10   63502.71  65811.37  TRUE   1      2
#  11   63401.84  64795.12  TRUE   2      2
#  12   63387.84  64401.14  TRUE   3      2
#  13   63186.10  64163.95  TRUE   4      2
#  14   63160.74  63468.25  TRUE   5      2
#  15   60471.15  60719.15  TRUE   6      2
#  16   58235.63  57655.14 FALSE   1     NA
#  17   58089.73  58061.34 FALSE   2     NA
#  18   57846.39  57357.89 FALSE   3     NA
#  19   57839.42  56495.69 FALSE   4     NA
#  20   57740.06  56219.97 FALSE   5     NA
#  21   58068.57  55810.91 FALSE   6     NA
#  22   58358.34  56437.81 FALSE   7     NA
#  23   76284.90  73722.92 FALSE   8     NA
#  24  105138.31 100729.00 FALSE   9     NA
#  25  147203.03 178079.38  TRUE   1     NA
#  26  109996.02 111113.95  TRUE   2     NA
#  27   91424.20  87391.56 FALSE   1     NA
#  28   89065.91  87196.69 FALSE   2     NA
#  29   86628.74  84809.07 FALSE   3     NA
#  30   79357.60  77555.62 FALSE   4     NA
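
A rough sketch of how both columns in the table above might be built with
base R's rle(), offered as one possible approach rather than a definitive
one. To keep it self-contained, flg is rebuilt from the run pattern visible
in the table (2 FALSE, 6 TRUE, 1 FALSE, 6 TRUE, 9 FALSE, 2 TRUE, 4 FALSE)
instead of from dat; the names "min_len", "keep" and "id" are illustrative
assumptions:

```r
# flg as it appears in the table above, written out as runs
flg <- rep(c(FALSE, TRUE, FALSE, TRUE, FALSE, TRUE, FALSE),
           c(2, 6, 1, 6, 9, 2, 4))

r <- rle(flg)                    # run values and run lengths
cnt <- sequence(r$lengths)       # 1..n within each run, restarting at every run

min_len <- 5                     # assumed threshold: at least 5 consecutive TRUE rows
keep <- r$values & r$lengths >= min_len

# Number the qualifying runs 1, 2, ...; all other runs get NA
id <- ifelse(keep, cumsum(keep), NA_integer_)

# Expand the per-run ids back to one value per row
plygn <- rep(id, r$lengths)

data.frame(flg, cnt, plygn)
```

With the real data, flg would be dat$flg, and the rows belonging to each
block could then be pulled out with something like split(dat, dat$plygn)
before drawing each polygon.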

Thanks, Eric
