[R] 'Record' row values every time the binary value in a column changes

Phil Spector spector at stat.berkeley.edu
Wed Apr 20 19:44:53 CEST 2011


Here's one way to do part 1:

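> rr = rle(Table[,'binary'])            # runs of identical consecutive values
> cc = cumsum(rr$lengths)+1             # row index just past the end of each run
> thestarts = c(1,cc[cc<=nrow(Table)])  # first row of each run
> theends = cc-1                        # last row of each run
> answer = cbind(Table[thestarts,'Chromosome'],Table[thestarts,'start'],
+                Table[theends,'start'],rr$values)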
> answer
      [,1] [,2] [,3] [,4]
[1,]    1   12   18    1
[2,]    1   20   36    0
[3,]    2   12   16    1
[4,]    2   17   19    0
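
One caveat: rle() above looks only at the binary column, so if one chromosome ended and the next began with the same binary value, the two would be merged into a single block. The example data does not contain that case. A minimal sketch of one workaround, assuming the rows are sorted by chromosome and then by position, is to compute the runs on a combined chromosome/binary key. The helper name block_table and the column names (taken from the requested output) are for illustration only:

```r
# Hypothetical helper (not from the original post): one row per run of
# identical (Chromosome, binary) pairs. Assumes rows are sorted by
# chromosome and then by position.
block_table <- function(Table) {
  key    <- paste(Table[, "Chromosome"], Table[, "binary"])
  rr     <- rle(key)
  ends   <- cumsum(rr$lengths)       # last row of each run
  starts <- ends - rr$lengths + 1    # first row of each run
  cbind(Chromosome     = Table[starts, "Chromosome"],
        position_start = Table[starts, "start"],
        position_end   = Table[ends,   "start"],
        binary         = Table[starts, "binary"])
}

# Example data from the original question
binary     <- c(1,1,1,0,0,0,1,1,1,0,0)
Chromosome <- c(1,1,1,1,1,1,2,2,2,2,2)
start      <- c(12,17,18,20,25,36,12,15,16,17,19)
Table      <- cbind(Chromosome, start, binary)

blocks <- block_table(Table)
```

On the example data this should give the same four blocks as answer above, only with column names attached.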

If I understand you correctly, here's a way to do part 2:

> Next = matrix(c(rep(1,12),rep(2,8),c(12,13,14,15,16,18,20,21,22,23,25,35,12,13,14,15,16,17,18,19)),ncol=2)
> apply(Next,1,function(x)answer[answer[,1]==x[1] & x[2] >= answer[,2] & x[2] <= answer[,3],4])
  [1] 1 1 1 1 1 1 0 0 0 0 0 0 1 1 1 1 1 0 0 0
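
With 2*10^6 rows, calling a function once per row through apply() may be slow, and a position that falls in no block yields a zero-length result, in which case apply() returns a list rather than a vector. A possible alternative, sketched here under the assumption that the blocks within a chromosome are sorted by start and do not overlap, is to look up each position with findInterval(), one chromosome at a time. The helper name assign_binary is hypothetical, and the block matrix is assumed to have the same column order as answer (chromosome, block start, block end, value):

```r
# Hypothetical helper (not from the original post): for each query position,
# find the block on the same chromosome whose [start, end] range contains it.
# Positions that fall in no block get NA.
assign_binary <- function(blocks, chrom, pos) {
  out <- rep(NA_real_, length(pos))
  for (ch in unique(chrom)) {
    b <- blocks[blocks[, 1] == ch, , drop = FALSE]
    if (nrow(b) == 0) next
    q   <- which(chrom == ch)
    idx <- findInterval(pos[q], b[, 2])   # last block starting at or before pos
    hit <- idx > 0                        # FALSE when pos precedes every block
    inside <- rep(FALSE, length(q))
    inside[hit] <- pos[q][hit] <= b[idx[hit], 3]   # and not past that block's end
    out[q[inside]] <- b[idx[inside], 4]
  }
  out
}

# Blocks from part 1 and the query positions used in Next above
answer <- cbind(c(1,1,2,2), c(12,20,12,17), c(18,36,16,19), c(1,0,1,0))
chrom  <- c(rep(1,12), rep(2,8))
pos    <- c(12,13,14,15,16,18,20,21,22,23,25,35,12,13,14,15,16,17,18,19)

binary3 <- assign_binary(answer, chrom, pos)
```

For these inputs binary3 should match the apply() result above. A position such as 19 on chromosome 1, which lies between two blocks, comes back as NA.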

 					- Phil Spector
 					 Statistical Computing Facility
 					 Department of Statistics
 					 UC Berkeley
 					 spector at stat.berkeley.edu

> On Wed, Apr 20, 2011 at 5:01 AM, baboon2010 <nielsvanderaa at live.be> wrote:
>> My question is twofold.
>>
>> Part 1:
>> My data looks like this:
>>
>> (example set, real data has 2*10^6 rows)
>> binary<-c(1,1,1,0,0,0,1,1,1,0,0)
>> Chromosome<-c(1,1,1,1,1,1,2,2,2,2,2)
>> start<-c(12,17,18,20,25,36,12,15,16,17,19)
>> Table<-cbind(Chromosome,start,binary)
>>      Chromosome start binary
>>  [1,]          1    12      1
>>  [2,]          1    17      1
>>  [3,]          1    18      1
>>  [4,]          1    20      0
>>  [5,]          1    25      0
>>  [6,]          1    36      0
>>  [7,]          2    12      1
>>  [8,]          2    15      1
>>  [9,]          2    16      1
>> [10,]          2    17      0
>> [11,]          2    19      0
>>
>> As output I need a shortlist with one row per binary block, giving the
>> starting and ending position of each block.
>> For this example it would look like this:
>>     Chromosome2 position_start position_end binary2
>> [1,]           1             12           18       1
>> [2,]           1             20           36       0
>> [3,]           2             12           16       1
>> [4,]           2             17           19       0
>>
>> Part 2:
>> Based on the output of part 1, I need to assign the binary value to rows of
>> another data set. If the position value in this second data set falls in one
>> of the blocks defined in the shortlist made in part 1, the binary value of the
>> shortlist should be assigned to an extra column for this row.  This would
>> look something like this:
>>     Chromosome3 position Value binary3
>>  [1,] "1"         "12"     "a"   "1"
>>  [2,] "1"         "13"     "b"   "1"
>>  [3,] "1"         "14"     "c"   "1"
>>  [4,] "1"         "15"     "d"   "1"
>>  [5,] "1"         "16"     "e"   "1"
>>  [6,] "1"         "18"     "f"   "1"
>>  [7,] "1"         "20"     "g"   "0"
>>  [8,] "1"         "21"     "h"   "0"
>>  [9,] "1"         "22"     "i"   "0"
>> [10,] "1"         "23"     "j"   "0"
>> [11,] "1"         "25"     "k"   "0"
>> [12,] "1"         "35"     "l"   "0"
>> [13,] "2"         "12"     "m"   "1"
>> [14,] "2"         "13"     "n"   "1"
>> [15,] "2"         "14"     "o"   "1"
>> [16,] "2"         "15"     "p"   "1"
>> [17,] "2"         "16"     "q"   "1"
>> [18,] "2"         "17"     "s"   "0"
>> [19,] "2"         "18"     "d"   "0"
>> [20,] "2"         "19"     "f"   "0"
>>
>>
>> Many thanks in advance,
>>
>> Niels
>>
>> --
>> View this message in context: http://r.789695.n4.nabble.com/Record-row-values-every-time-the-binary-value-in-a-collumn-changes-tp3462496p3462496.html
>> Sent from the R help mailing list archive at Nabble.com.
>>
>> ______________________________________________
>> R-help at r-project.org mailing list
>> https://stat.ethz.ch/mailman/listinfo/r-help
>> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
>> and provide commented, minimal, self-contained, reproducible code.
>>