[R] 'Record' row values every time the binary value in a column changes

jim holtman jholtman at gmail.com
Wed Apr 20 18:59:27 CEST 2011


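Here is an answer to part 1. It groups the rows by Chromosome and binary,
so it assumes each binary value forms a single contiguous block within a
chromosome, as in your example. The rows come out ordered by group label
rather than by position: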

> binary<-c(1,1,1,0,0,0,1,1,1,0,0)
> Chromosome<-c(1,1,1,1,1,1,2,2,2,2,2)
> start<-c(12,17,18,20,25,36,12,15,16,17,19)
> Table<-cbind(Chromosome,start,binary)
> # determine where the start/end of each group is
> # use indices since the size is large
> startEnd <- lapply(split(seq(nrow(Table))
+                      , list(Table[, "Chromosome"], Table[, 'binary'])
+                      , drop = TRUE
+                      )
+                   , function(.indx){
+     se <- range(.indx)
+     c(Chromosome2 = unname(Table[se[1L], "Chromosome"])
+       , position_start = unname(Table[se[1L], 'start'])
+       , position_end = unname(Table[se[2L], 'start'])
+       , binary2 = unname(Table[se[1L], 'binary'])
+       )
+ })
> do.call(rbind, startEnd)
    Chromosome2 position_start position_end binary2
1.0           1             20           36       0
2.0           2             17           19       0
1.1           1             12           18       1
2.1           2             12           16       1
>
>
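
If the binary value can switch more than once within a chromosome (which
the subject line suggests), grouping by Chromosome and binary would merge
separate blocks. A minimal sketch of an alternative, assuming the rows are
already sorted by Chromosome and start, is to mark every row where either
column differs from the previous row. The names newBlock, first, last and
blocks are only illustrative:

```r
binary     <- c(1,1,1,0,0,0,1,1,1,0,0)
Chromosome <- c(1,1,1,1,1,1,2,2,2,2,2)
start      <- c(12,17,18,20,25,36,12,15,16,17,19)
Table      <- cbind(Chromosome, start, binary)

n <- nrow(Table)

# A block begins on row 1 and on every row where Chromosome or binary
# differs from the row before it (assumes rows are sorted by position).
newBlock <- c(TRUE,
              Table[-1L, "Chromosome"] != Table[-n, "Chromosome"] |
              Table[-1L, "binary"]     != Table[-n, "binary"])

first <- which(newBlock)          # first row of each block
last  <- c(first[-1L] - 1L, n)    # last row of each block

blocks <- cbind(Chromosome2    = Table[first, "Chromosome"],
                position_start = Table[first, "start"],
                position_end   = Table[last,  "start"],
                binary2        = Table[first, "binary"])
blocks
```

For the example data this gives the four blocks in the same order as the
output requested in the question. It only uses vectorized comparisons, so
it should scale to a couple of million rows.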


On Wed, Apr 20, 2011 at 5:01 AM, baboon2010 <nielsvanderaa at live.be> wrote:
> My question is twofold.
>
> Part 1:
> My data looks like this:
>
> (example set, real data has 2*10^6 rows)
> binary<-c(1,1,1,0,0,0,1,1,1,0,0)
> Chromosome<-c(1,1,1,1,1,1,2,2,2,2,2)
> start<-c(12,17,18,20,25,36,12,15,16,17,19)
> Table<-cbind(Chromosome,start,binary)
>      Chromosome start binary
>  [1,]          1    12      1
>  [2,]          1    17      1
>  [3,]          1    18      1
>  [4,]          1    20      0
>  [5,]          1    25      0
>  [6,]          1    36      0
>  [7,]          2    12      1
>  [8,]          2    15      1
>  [9,]          2    16      1
> [10,]          2    17      0
> [11,]          2    19      0
>
> As output I need a short list with one row per binary block, giving the
> start and end position of each block.
> For this example it would look like this:
>     Chromosome2 position_start position_end binary2
> [1,]           1             12           18       1
> [2,]           1             20           36       0
> [3,]           2             12           16       1
> [4,]           2             17           19       0
>
> Part 2:
> Based on the output of part 1, I need to assign the binary value to the rows
> of another data set. If the position value in this second data set falls in
> one of the blocks defined in the short list from part 1, the binary value of
> that block should be added as an extra column for this row. This would
> look something like this:
>     Chromosome3 position Value binary3
>  [1,] "1"         "12"     "a"   "1"
>  [2,] "1"         "13"     "b"   "1"
>  [3,] "1"         "14"     "c"   "1"
>  [4,] "1"         "15"     "d"   "1"
>  [5,] "1"         "16"     "e"   "1"
>  [6,] "1"         "18"     "f"   "1"
>  [7,] "1"         "20"     "g"   "0"
>  [8,] "1"         "21"     "h"   "0"
>  [9,] "1"         "22"     "i"   "0"
> [10,] "1"         "23"     "j"   "0"
> [11,] "1"         "25"     "k"   "0"
> [12,] "1"         "35"     "l"   "0"
> [13,] "2"         "12"     "m"   "1"
> [14,] "2"         "13"     "n"   "1"
> [15,] "2"         "14"     "o"   "1"
> [16,] "2"         "15"     "p"   "1"
> [17,] "2"         "16"     "q"   "1"
> [18,] "2"         "17"     "s"   "0"
> [19,] "2"         "18"     "d"   "0"
> [20,] "2"         "19"     "f"   "0"
>
>
> Many thanks in advance,
>
> Niels
>
>
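
For part 2, one possible approach (a sketch, not tested on the real data)
is to look up each position with findInterval() against the block starts
of the same chromosome and then check it against the block end. The block
list below is typed in from the expected output in the question. The
data frame `second` is a hypothetical stand-in for the second data set; it
reuses a few rows from the question plus one made-up position (19 on
chromosome 1) that falls between two blocks and therefore gets NA:

```r
# Short list of blocks, copied from the expected output of part 1
blocks <- cbind(Chromosome2    = c(1, 1, 2, 2),
                position_start = c(12, 20, 12, 17),
                position_end   = c(18, 36, 16, 19),
                binary2        = c(1, 0, 1, 0))

# Hypothetical second data set (position 19 on chromosome 1 is made up)
second <- data.frame(Chromosome3 = c(1, 1, 1, 1, 1, 2, 2, 2, 2),
                     position    = c(12, 13, 18, 19, 35, 12, 16, 17, 19),
                     Value       = c("a", "b", "f", "x", "l",
                                     "m", "q", "s", "f"))

second$binary3 <- NA_real_
for (chr in unique(second$Chromosome3)) {
    # blocks of this chromosome, sorted by start as findInterval() requires
    b    <- blocks[blocks[, "Chromosome2"] == chr, , drop = FALSE]
    b    <- b[order(b[, "position_start"]), , drop = FALSE]
    rows <- which(second$Chromosome3 == chr)
    pos  <- second$position[rows]

    # index of the last block starting at or before each position (0 = none)
    idx    <- findInterval(pos, b[, "position_start"])
    inside <- idx > 0L
    # keep only positions that do not run past the end of that block
    inside[inside] <- pos[inside] <= b[idx[inside], "position_end"]

    second$binary3[rows[inside]] <- b[idx[inside], "binary2"]
}
second
```

This assumes the blocks within a chromosome do not overlap. The loop runs
once per chromosome, and the lookup inside it is vectorized.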



-- 
Jim Holtman
Data Munger Guru

What is the problem that you are trying to solve?


