[R] Efficient way to find consecutive integers in vector?

Sat Dec 22 02:17:18 CET 2007

Martin Maechler wrote:
>>>>>> "MS" == Marc Schwartz <marc_schwartz at comcast.net>
>>>>>>     on Thu, 20 Dec 2007 16:33:54 -0600 writes:
> 
>     MS> On Thu, 2007-12-20 at 22:43 +0100, Johannes Graumann wrote:
>     >> Hi all,
>     >> 
>     >> Does anybody have a magic trick handy to isolate directly consecutive
>     >> integers from something like this:
>     >> c(1,2,3,4,7,8,9,10,12,13)
>     >> 
>     >> The result should be that the groups 1-4, 7-10 and 12-13 are
>     >> identified as runs of consecutive integers ...
>     >> 
>     >> Thanks for any hints, Joh
> 
>     MS> Not fully tested, but here is one possible approach:
> 
>     >> Vec
>     MS> [1]  1  2  3  4  7  8  9 10 12 13
> 
>     MS> Breaks <- c(0, which(diff(Vec) != 1), length(Vec))
> 
>     >> Breaks
>     MS> [1]  0  4  8 10
> 
>     >> sapply(seq(length(Breaks) - 1), 
>     MS> function(i) Vec[(Breaks[i] + 1):Breaks[i+1]])
>     MS> [[1]]
>     MS> [1] 1 2 3 4
> 
>     MS> [[2]]
>     MS> [1]  7  8  9 10
> 
>     MS> [[3]]
>     MS> [1] 12 13
> 
> 
> 
>     MS> For a quick test, I tried it on another vector:
> 
> 
>     MS> set.seed(1)
>     MS> Vec <- sort(sample(20, 15))
> 
>     >> Vec
>     MS> [1]  1  2  3  4  5  6  8  9 10 11 14 15 16 19 20
> 
>     MS> Breaks <- c(0, which(diff(Vec) != 1), length(Vec))
> 
>     >> Breaks
>     MS> [1]  0  6 10 13 15
> 
>     >> sapply(seq(length(Breaks) - 1), 
>     MS> function(i) Vec[(Breaks[i] + 1):Breaks[i+1]])
>     MS> [[1]]
>     MS> [1] 1 2 3 4 5 6
> 
>     MS> [[2]]
>     MS> [1]  8  9 10 11
> 
>     MS> [[3]]
>     MS> [1] 14 15 16
> 
>     MS> [[4]]
>     MS> [1] 19 20
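
The break-point idea above can also be written without explicit index arithmetic. The following is only a sketch of the same technique using split() and cumsum(); the helper name split_runs() is made up for illustration and is not part of the original post. Like the code above, it assumes an increasing vector.

```r
## Hypothetical helper (not from the original post): split an increasing
## integer vector into runs of directly consecutive values.
split_runs <- function(x) {
  if (length(x) == 0) return(list())
  ## A new run starts at the first element and wherever the step is not 1;
  ## the cumulative sum of those starts gives each element a run number.
  run_id <- cumsum(c(TRUE, diff(x) != 1))
  unname(split(x, run_id))
}

Vec <- c(1, 2, 3, 4, 7, 8, 9, 10, 12, 13)
runs <- split_runs(Vec)
runs
```

For the first test vector this gives the three groups 1-4, 7-10 and 12-13 as a plain list, one element per run.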
> 
> Seems ok, but ``only works for increasing sequences''.
> More than 12 years ago, I had encountered the same problem and
> solved it like this:
> 
> Package 'sfsmisc' contains the function  inv.seq(),
> named for "inversion of seq()",
> which does this too, currently returning an expression,
> but returning a call in the development version of sfsmisc:
> 
> Its definition is currently
> 
> inv.seq <- function(i) {
>   ## Purpose: 'Inverse seq': Return a short expression for the 'index'  `i'
>   ## --------------------------------------------------------------------
>   ## Arguments: i: vector of (usually increasing) integers.
>   ## --------------------------------------------------------------------
>   ## Author: Martin Maechler, Date:  3 Oct 95, 18:08
>   ## --------------------------------------------------------------------
>   ## EXAMPLES: cat(rr <- inv.seq(c(3:12, 20:24, 27, 30:33)),"\n"); eval(rr)
>   ##           r2 <- inv.seq(c(20:13, 3:12, -1:-4, 27, 30:31)); eval(r2); r2
>   li <- length(i <- as.integer(i))
>   if(li == 0) return(expression(NULL))
>   else if(li == 1) return(as.expression(i))
>   ##-- now have: length(i) >= 2
>   di1 <- abs(diff(i)) == 1  #-- those are just simple sequences  n1:n2 !
>   s1 <- i[!c(FALSE,di1)] # beginnings
>   s2 <- i[!c(di1,FALSE)] # endings
> 
>   ## using text & parse {cheap and dirty} :
>   mkseq <- function(i,j) if(i == j) i else paste(i,":",j, sep="")
>   parse(text =
>         paste("c(", paste(mapply(mkseq, s1,s2), collapse = ","), ")", sep = ""),
>         srcfile = NULL)[[1]]
> }
> 
> with example code
> 
>  > v <- c(1:10,11,6,5,4,0,1)
>  > (iv <- inv.seq(v))
>  c(1:11, 6:4, 0:1)
>  > stopifnot(identical(eval(iv), as.integer(v)))
>  > iv[[2]]
>  1:11
>  > str(iv)
>   language c(1:11, 6:4, 0:1)
>  > str(iv[[2]])
>   language 1:11
>  > 
> 
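
To make the 'beginnings' and 'endings' step in inv.seq() concrete, here is a small sketch that runs just those three lines on the first example vector from the function's comments. The second part uses a vector of my own choosing (it does not appear in the post) to show what abs() does when a run reverses direction.

```r
## The run-detection lines from inv.seq(), applied on their own.
i <- as.integer(c(3:12, 20:24, 27, 30:33))
di1 <- abs(diff(i)) == 1          # TRUE where neighbours differ by exactly 1
s1 <- i[!c(FALSE, di1)]           # beginnings: no +-1 step leading in
s2 <- i[!c(di1, FALSE)]           # endings:    no +-1 step leading out
rbind(start = s1, end = s2)

## Assumed extra example (not in the original post): because of abs(),
## an up-then-down sequence counts as a single run, so start and end
## coincide and the reversal is not represented.
j <- c(1L, 2L, 3L, 2L, 1L)
dj1 <- abs(diff(j)) == 1
j_start <- j[!c(FALSE, dj1)]
j_end   <- j[!c(dj1, FALSE)]
```

The starts come out as 3, 20, 27, 30 and the ends as 12, 24, 27, 33, i.e. the pieces 3:12, 20:24, 27 and 30:33. For the reversing vector both start and end are 1, so input of that shape would need separate handling.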
> 
> Now, given that this stems from  1995,  I should be excused for
> using   parse(text = *)  [see  fortune(106) if you don't understand].
> 
> However, doing this differently by constructing the resulting
> language object directly {using substitute(), as.symbol(),
>  as.expression() ... etc}
> seems not quite trivial.
> 
> So here's the Friday afternoon / Christmas break quiz:
> 
>   What's the most elegant way
>   to replace the last statements in  inv.seq()
>   ------------------------------------------------------------------------
>   ## using text & parse {cheap and dirty} :
>   mkseq <- function(i,j) if(i == j) i else paste(i,":",j, sep="")
>   parse(text =
>         paste("c(", paste(mapply(mkseq, s1,s2), collapse = ","), ")", sep = ""),
>         srcfile = NULL)[[1]]
>   ------------------------------------------------------------------------
> 
>   by code that does not use parse (or source() or similar) ???
> 
> I don't have an answer yet, at least not an elegant one.
> And maybe the solution to the quiz is that there is no elegant
> solution.

How about this?

 > i <- c(1, 10, 12)
 > j <- c(5, 10, 14)
 > mkseq <- function(i, j) if (i==j) i else call(':', i, j)
 > as.call(c(list(as.name('c')), mapply(i, j, FUN=mkseq)))
c(1:5, 10, 12:14)
 > eval(.Last.value)
[1]  1  2  3  4  5 10 12 13 14
 >
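
Dropped into the function body, the idea might look like the sketch below. The name inv_seq2() is made up here to avoid clashing with the sfsmisc function, and SIMPLIFY = FALSE is a defensive addition of mine so that mapply() always returns a list; everything else is taken from the definition quoted above.

```r
## Sketch only: inv.seq() with the parse(text = *) step replaced by
## direct construction of the call. 'inv_seq2' is a hypothetical name.
inv_seq2 <- function(i) {
  li <- length(i <- as.integer(i))
  if (li == 0) return(expression(NULL))
  else if (li == 1) return(as.expression(i))
  ##-- now have: length(i) >= 2
  di1 <- abs(diff(i)) == 1
  s1 <- i[!c(FALSE, di1)] # beginnings
  s2 <- i[!c(di1, FALSE)] # endings

  ## build  n1:n2  as a call, or keep a single number as is
  mkseq <- function(i, j) if (i == j) i else call(":", i, j)
  ## assemble  c(<piece>, <piece>, ...)  as an unevaluated call
  as.call(c(list(as.name("c")), mapply(mkseq, s1, s2, SIMPLIFY = FALSE)))
}

v <- c(1:10, 11, 6, 5, 4, 0, 1)
iv <- inv_seq2(v)
stopifnot(is.call(iv), identical(eval(iv), as.integer(v)))
```

The result is a call, so iv[[2]] still picks out the first piece, as in the example code above.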

-- Tony Plate

> 
> Martin
> 
> 
>     MS> HTH,
> 
>     MS> Marc Schwartz
> 