[R] algorithm help

William Dunlap wdunlap at tibco.com
Fri Jan 7 00:52:47 CET 2011


> -----Original Message-----
> From: r-help-bounces at r-project.org 
> [mailto:r-help-bounces at r-project.org] On Behalf Of array chip
> Sent: Thursday, January 06, 2011 3:29 PM
> To: ted.harding at wlandres.net
> Cc: r-help at stat.math.ethz.ch
> Subject: Re: [R] algorithm help
> 
> Thanks very much, Ted. Yes, it does what I need!
> 
> I made a routine to do this:
> 
> f.fragment<-function(a,b) {
>     dat<-as.data.frame(cbind(a,b))
> 
>     L <- rle(dat$a)$lengths
>     V <- rle(dat$a)$values
>     pos <- c(1,cumsum(L))
>     V1 <- c(-1,V)
>     start<-1+pos[V1==0]
>     end<-pos[V1==1]
>  
>     cbind(stretch=1:length(start),start=dat$b[start]
>      ,end=dat$b[end],no.of.1s=L[V==1])
> 
> }
> 
> f.fragment(dat$a,dat$b)
> 
>      stretch start end no.of.1s
> [1,]       1    13  20        4
> [2,]       2    34  46        2
> [3,]       3    49  77        4
> [4,]       4    97  97        1

You need to be more careful about the first
and last rows in the dataset.  I think yours
only works when a starts with 0 and ends with 1.

  > f.fragment(c(1,1,0,0), c(11,12,13,14))
       stretch start end no.of.1s
  [1,]       1    NA  12        2
  > f.fragment(c(1,1,0,1), c(11,12,13,14))
       stretch start end no.of.1s
  [1,]       1    14  12        2
  [2,]       1    14  14        1
  > f.fragment(c(0,1,0,1), c(11,12,13,14))
       stretch start end no.of.1s
  [1,]       1    12  12        1
  [2,]       2    14  14        1
  > f.fragment(c(0,1,0,0), c(11,12,13,14))
       stretch start end no.of.1s
  [1,]       1    12  12        1
  [2,]       2    NA  12        1
  > f.fragment(c(1,1,1,1), c(11,12,13,14))
       stretch end no.of.1s
  [1,]       1  14        4
  [2,]       0  14        4
  > f.fragment(c(0,0,0,0), c(11,12,13,14))
       stretch start
  [1,]       1    NA

The following does better.  It keeps things as
logical vectors as long as possible, which tends
to work better when dealing with runs.
  f <- function(a, b) {
       isFirstIn1Run <- c(TRUE, a[-1] != a[-length(a)]) & a==1
       isLastIn1Run <- c(a[-1] != a[-length(a)], TRUE) & a==1
       data.frame(stretch=seq_len(sum(isFirstIn1Run)),
                  start = b[isFirstIn1Run],
                  end = b[isLastIn1Run],
                  no.of.1s = which(isLastIn1Run) - which(isFirstIn1Run)
+ 1)
  }
  > f(c(1,1,0,0), c(11,12,13,14))
    stretch start end no.of.1s
  1       1    11  12        2
  > f(c(1,1,0,1), c(11,12,13,14))
    stretch start end no.of.1s
  1       1    11  12        2
  2       2    14  14        1
  > f(c(0,1,0,1), c(11,12,13,14))
    stretch start end no.of.1s
  1       1    12  12        1
  2       2    14  14        1
  > f(c(0,1,0,0), c(11,12,13,14))
    stretch start end no.of.1s
  1       1    12  12        1
  > f(c(1,1,1,1), c(11,12,13,14))
    stretch start end no.of.1s
  1       1    11  14        4
  > f(c(0,0,0,0), c(11,12,13,14))
  [1] stretch  start    end      no.of.1s
  <0 rows> (or 0-length row.names)

Bill Dunlap
Spotfire, TIBCO Software
wdunlap tibco.com 


> 
> John
> 
> 
> 
> 
> ________________________________
> From: "ted.harding at wlandres.net" <ted.harding at wlandres.net>
> 
> Cc: r-help at stat.math.ethz.ch
> Sent: Thu, January 6, 2011 2:57:47 PM
> Subject: RE: [R] algorithm help
> 
> On 06-Jan-11 22:16:38, array chip wrote:
> > Hi, I am seeking help on designing an algorithm to identify the
> > locations of stretches of 1s in a vector of 0s and 1s. Below is
> > an simple example:
> > 
> >> 
> dat<-as.data.frame(cbind(a=c(F,F,T,T,T,T,F,F,T,T,F,T,T,T,T,F,F,F,F,T)
> >   ,b=c(4,12,13,16,18,20,28,30,34,46,47,49,61,73,77,84,87,90,95,97)))
> > 
> >> dat
> >    a  b
> > 1  0  4
> > 2  0 12
> > 3  1 13
> > 4  1 16
> > 5  1 18
> > 6  1 20
> > 7  0 28
> > 8  0 30
> > 9  1 34
> > 10 1 46
> > 11 0 47
> > 12 1 49
> > 13 1 61
> > 14 1 73
> > 15 1 77
> > 16 0 84
> > 17 0 87
> > 18 0 90
> > 19 0 95
> > 20 1 97
> > 
> > In this dataset, "b" is sorted and denotes the location for each
> > number in "a". 
> > So I would like to find the starting & ending locations for each
> > stretch of 1s within "a", also counting the number of 1s in each
> > stretch as well.
> > Hope the results from the algorithm would be:
> > 
> > stretch   start   end   No.of.1s
> > 1         13      20    4
> > 2         34      46    2
> > 3         49      77    4
> > 4         97      97    1
> > 
> > I can imagine using for loops can do the job, but I feel it's not a
> > clever way to do this. Is there an efficient algorithm that can do
> > this fast?
> > 
> > Thanks for any suggestions.
> > John
> 
> The basic information you need can be got using rle() ("run length
> encoding"). See '?rle'. In your example:
> 
>   rle(dat$a)
>   # Run Length Encoding
>   #   lengths: int [1:8] 2 4 2 2 1 4 4 1
>   #   values : num [1:8] 0 1 0 1 0 1 0 1
>   ## Note: F -> 0, T -> 1
> 
> The following has a somewhat twisted logic at the end, and may
> [[elided Yahoo spam]]
> 
>   L <- rle(dat$a)$lengths
>   V <- rle(dat$a)$values
>   pos <- c(1,cumsum(L))
>   V1 <- c(-1,V)
>   1+pos[V1==0]
>   # [1]  3  9 12 20
>   ## Positions in the series dat$a where each run of "T" (i.e. 1)
>   ##   starts
> 
> Hoping this helps,
> Ted.
> 
> --------------------------------------------------------------------
> E-Mail: (Ted Harding) <ted.harding at wlandres.net>
> Fax-to-email: +44 (0)870 094 0861
> Date: 06-Jan-11                                       Time: 22:57:44
> ------------------------------ XFMail ------------------------------
> 
> 
> 
>       
> 	[[alternative HTML version deleted]]
> 
> ______________________________________________
> R-help at r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide 
> http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
> 



More information about the R-help mailing list