[R] numbering consecutive rows based on length criteria

Thierry Onkelinx thierry.onkelinx at inbo.be
Mon Mar 2 19:02:07 CET 2015


Dear Eric,

Here is a solution using the plyr package.

library(plyr)
dat$flg <- dat$wRes.Q>dat$noRes.Q
dat$group <- cumsum(c(0, abs(diff(dat$flg))))
ddply(dat, "group", function(x){
  if(x$flg[1] && nrow(x) >= 5){
    x$plygn <- seq_along(x$group)
  } else {
    x$plygn <- NA
  }
  x
})

Best regards,

ir. Thierry Onkelinx
Instituut voor natuur- en bosonderzoek / Research Institute for Nature and
Forest
team Biometrie & Kwaliteitszorg / team Biometrics & Quality Assurance
Kliniekstraat 25
1070 Anderlecht
Belgium

To call in the statistician after the experiment is done may be no more
than asking him to perform a post-mortem examination: he may be able to say
what the experiment died of. ~ Sir Ronald Aylmer Fisher
The plural of anecdote is not data. ~ Roger Brinner
The combination of some data and an aching desire for an answer does not
ensure that a reasonable answer can be extracted from a given body of data.
~ John Tukey

2015-03-02 18:43 GMT+01:00 Morway, Eric <emorway op usgs.gov>:

> Using this dataset:
>
> dat <- read.table(textConnection("day    noRes.Q    wRes.Q
>  1  237074.41 215409.41
>  2 2336240.20 164835.16
>  3   84855.42 357062.72
>  4   76993.48 386326.78
>  5   73489.47 307144.09
>  6   70246.96  75885.75
>  7   69630.09  74054.33
>  8   66714.78  70071.80
>  9  122296.90  66579.08
> 10   63502.71  65811.37
> 11   63401.84  64795.12
> 12   63387.84  64401.14
> 13   63186.10  64163.95
> 14   63160.74  63468.25
> 15   60471.15  60719.15
> 16   58235.63  57655.14
> 17   58089.73  58061.34
> 18   57846.39  57357.89
> 19   57839.42  56495.69
> 20   57740.06  56219.97
> 21   58068.57  55810.91
> 22   58358.34  56437.81
> 23   76284.90  73722.92
> 24  105138.31 100729.00
> 25  147203.03 178079.38
> 26  109996.02 111113.95
> 27   91424.20  87391.56
> 28   89065.91  87196.69
> 29   86628.74  84809.07
> 30   79357.60  77555.62"),header=T)
>
> I'm attempting to generate a column that continuously numbers consecutive
> rows where wRes.Q is greater than noRes.Q.  To that end, I've come up with
> the following:
>
> dat$flg <- dat$wRes.Q>dat$noRes.Q
> dat$cnt <- with(dat, ave(integer(length(flg)), flg, FUN=seq_along))
>
> The problem with dat$cnt is that it doesn't start over with 1 when a 'new'
> group of either true or false is encountered.  Thus, row 9's cnt value
> should start over at 1, as should dat$cnt[10], and dat$cnt[11]==2, etc.
> (the desired result is shown below)
>
> In the larger dataset I'm working with (>6,000 rows), there are blocks of
> rows where the number of consecutive rows with dat$cnt==TRUE exceeds 100.
> My goal is to plot these blocks of rows as polygons in a time series plot.
> If, for the small example provided, the number of consecutive rows with
> dat$cnt==TRUE is greater than or equal to 5 (the 2 blocks of rows
> satisfying this criteria in this small example are rows 3-8 and 10-15), is
> there a way to add a column that uniquely numbers these blocks of rows? I'd
> like to end up with the following, which shows the correct "cnt" column and
> a column called "plygn" that is my ultimate goal:
>
> dat
> # day    noRes.Q    wRes.Q   flg cnt  plygn
> #   1  237074.41 215409.41 FALSE   1     NA
> #   2 2336240.20 164835.16 FALSE   2     NA
> #   3   84855.42 357062.72  TRUE   1      1
> #   4   76993.48 386326.78  TRUE   2      1
> #   5   73489.47 307144.09  TRUE   3      1
> #   6   70246.96  75885.75  TRUE   4      1
> #   7   69630.09  74054.33  TRUE   5      1
> #   8   66714.78  70071.80  TRUE   6      1
> #   9  122296.90  66579.08 FALSE   1     NA
> #  10   63502.71  65811.37  TRUE   1      2
> #  11   63401.84  64795.12  TRUE   2      2
> #  12   63387.84  64401.14  TRUE   3      2
> #  13   63186.10  64163.95  TRUE   4      2
> #  14   63160.74  63468.25  TRUE   5      2
> #  15   60471.15  60719.15  TRUE   6      2
> #  16   58235.63  57655.14 FALSE   1     NA
> #  17   58089.73  58061.34 FALSE   2     NA
> #  18   57846.39  57357.89 FALSE   3     NA
> #  19   57839.42  56495.69 FALSE   4     NA
> #  20   57740.06  56219.97 FALSE   5     NA
> #  21   58068.57  55810.91 FALSE   6     NA
> #  22   58358.34  56437.81 FALSE   7     NA
> #  23   76284.90  73722.92 FALSE   8     NA
> #  24  105138.31 100729.00 FALSE   9     NA
> #  25  147203.03 178079.38  TRUE   1     NA
> #  26  109996.02 111113.95  TRUE   2     NA
> #  27   91424.20  87391.56 FALSE   1     NA
> #  28   89065.91  87196.69 FALSE   2     NA
> #  29   86628.74  84809.07 FALSE   3     NA
> #  30   79357.60  77555.62 FALSE   4     NA
>
> Thanks, Eric
>
>         [[alternative HTML version deleted]]
>
> ______________________________________________
> R-help op r-project.org mailing list -- To UNSUBSCRIBE and more, see
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide
> http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>

	[[alternative HTML version deleted]]



More information about the R-help mailing list