[R] rle with data.table - is it possible?

Kate Ignatius kate.ignatius at gmail.com
Thu Jan 1 07:58:51 CET 2015


Corrected code:

childseg<-0
x <- sumchild > 0  # assumed intent: flag rows with a nonzero count
span<-rle(x)$lengths[rle(x)$values==TRUE]
childseg[x]<-rep(seq_along(span), times = span)
childseg[childseg == 0]<-''
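
The run numbering above can be wrapped in a small helper and applied per group. This is only a minimal sketch under stated assumptions, not a tested solution for the full data: label_runs is a hypothetical helper name, the Group and sumchild values are copied from the example table quoted below, and sumchild > 0 is an assumed definition of a segment. Rows outside a run get 0 here instead of ''.

```r
# Hypothetical helper: number each run of TRUE values 1, 2, 3, ...
# and give 0 to every position outside a run.
label_runs <- function(flag) {
  r <- rle(as.logical(flag))
  is_run <- r$values %in% TRUE          # NA is treated as "not in a run"
  ids <- integer(length(r$values))      # starts as 0 for every run
  ids[is_run] <- seq_len(sum(is_run))   # consecutive ids for the TRUE runs
  rep(ids, times = r$lengths)           # expand back to one value per row
}

# Columns copied from the example table in this thread.
Group    <- c("A", "A", "B", "B", "B", "B", "B", "C", "C", "C")
sumchild <- c(0L, 1L, 5L, 5L, 5L, 5L, 5L, 0L, 0L, 0L)

# Base R: the numbering restarts within each group.
childseg <- ave(as.integer(sumchild > 0), Group, FUN = label_runs)

# The same step in data.table (skipped if the package is not installed).
if (requireNamespace("data.table", quietly = TRUE)) {
  library(data.table)
  DT <- data.table(Group = Group, sumchild = sumchild)
  DT[, childseg := label_runs(sumchild > 0), by = Group]
}
```

On the real data the grouping column would presumably be SNPEFF_GENE_NAME rather than Group, and the same call could be repeated for the mum and dad columns.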

On Thu, Jan 1, 2015 at 1:56 AM, Kate Ignatius <kate.ignatius at gmail.com> wrote:
> Is it possible to add the following code or similar in data.table:
>
> childseg<-0
> x:=sumchild <-0
> span<-rle(x)$lengths[rle(x)$values==TRUE]
> childseg[x]<-rep(seq_along(span), times = spanLOH)
> childseg[childseg == 0]<-''
>
> I was hoping to run this code by SNPEFF_GENE_NAME for mum, dad and
> child.  The problem I'm having is with the
> span<-rle(x)$lengths[rle(x)$values==TRUE] line, which I'm not sure can
> be used inside data.table.
>
>
> On Wed, Dec 31, 2014 at 3:45 AM, Jeff Newmiller
> <jdnewmil at dcn.davis.ca.us> wrote:
>> I do not understand the value of using the rle function in your description,
>> but the code below appears to produce the table you want.
>>
>> Note that better support for the data.table package might be found at
>> stackexchange as the documentation specifies.
>>
>> x <- read.table( text=
>> "Dad Mum Child Group
>> AA RR RA A
>> AA RR RR A
>> AA AA AA B
>> AA AA AA B
>> RA AA RR B
>> RR AA RR B
>> AA AA AA B
>> AA AA RA C
>> AA AA RA C
>> AA RR RA C
>> ", header=TRUE, stringsAsFactors=FALSE )
>>
>> library(data.table)
>> DT <- data.table( x )
>> DT[ , cdad := as.integer( Dad %in% c( "AA", "RR" ) ) ]
>> DT[ , sumdad := 0L ]
>> DT[ 1==DT$cdad, sumdad := sum( cdad ), by=Group ]
>> DT[ , cdad := NULL ]
>> DT[ , cmum := as.integer( Mum %in% c( "AA", "RR" ) ) ]
>> DT[ , summum := 0L ]
>> DT[ 1==DT$cmum, summum := sum( cmum ), by=Group ]
>> DT[ , cmum := NULL ]
>> DT[ , cchild := as.integer( Child %in% c( "AA", "RR" ) ) ]
>> DT[ , sumchild := 0L ]
>> DT[ 1==DT$cchild, sumchild := sum( cchild ), by=Group ]
>> DT[ , cchild := NULL ]
>>
>> > DT
>>
>>     Dad Mum Child Group sumdad summum sumchild
>>  1:  AA  RR    RA     A      2      2        0
>>  2:  AA  RR    RR     A      2      2        1
>>  3:  AA  AA    AA     B      4      5        5
>>  4:  AA  AA    AA     B      4      5        5
>>  5:  RA  AA    RR     B      0      5        5
>>  6:  RR  AA    RR     B      4      5        5
>>  7:  AA  AA    AA     B      4      5        5
>>  8:  AA  AA    RA     C      3      3        0
>>  9:  AA  AA    RA     C      3      3        0
>> 10:  AA  RR    RA     C      3      3        0
>>
>>
>> On Tue, 30 Dec 2014, Kate Ignatius wrote:
>>
>>> I'm trying to use rle together with data.table and wondering whether
>>> that is possible...
>>>
>>> To make this simple, my ultimate goal is to determine long stretches of
>>> 1s, but I want to do this within groups (hence using data.table, as
>>> I use the "set key" option).  However, I'm not having much luck
>>> making this possible.
>>>
>>> For example, for simplicity's sake, I have the following data:
>>>
>>> Dad Mum Child Group
>>> AA RR RA A
>>> AA RR RR A
>>> AA AA AA B
>>> AA AA AA B
>>> RA AA RR B
>>> RR AA RR B
>>> AA AA AA B
>>> AA AA RA C
>>> AA AA RA C
>>> AA RR RA C
>>>
>>> And the following code, which I know works:
>>>
>>> hetdad <- as.numeric(x[c(1)]=="AA" | x[c(1)]=="RR")
>>> sumdad <- rle(hetdad)$lengths[rle(hetdad)$values==1]
>>>
>>> hetmum <- as.numeric(x[c(2)]=="AA" | x[c(2)]=="RR")
>>> summum <- rle(hetmum)$lengths[rle(hetmum)$values==1]
>>>
>>> hetchild <- as.numeric(x[c(3)]=="AA" | x[c(3)]=="RR")
>>> sumchild <- rle(hetchild)$lengths[rle(hetchild)$values==1]
>>>
>>> However, I wish to run the above code by Group (the real file is
>>> millions of rows long and its groups are larger; I just wanted to
>>> simplify the example).
>>>
>>> I did something like this but of course I got an error:
>>>
>>> LOH[,hetdad:=as.numeric(x[c(1)]=="AA" | x[c(1)]=="RR")]
>>> LOH[,sumdad:=rle(hetdad)$lengths[rle(hetdad)$values==1],by=Group]
>>> LOH[,hetmum:=as.numeric(x[c(2)]=="AA" | x[c(2)]=="RR")]
>>> LOH[,summum:=rle(hetmum)$lengths[rle(hetmum)$values==1],by=Group]
>>> LOH[,hetchild:=as.numeric(x[c(3)]=="AA" | x[c(3)]=="RR")]
>>> LOH[,sumchild:=rle(hetchild)$lengths[rle(hetchild)$values==1],by=Group]
>>>
>>> The reason is that I eventually want to have something like this:
>>>
>>> Dad Mum Child Group sumdad summum sumchild
>>> AA RR RA A 2 2 0
>>> AA RR RR A 2 2 1
>>> AA AA AA B 4 5 5
>>> AA AA AA B 4 5 5
>>> RA AA RR B 0 5 5
>>> RR AA RR B 4 5 5
>>> AA AA AA B 4 5 5
>>> AA AA RA C 3 3 0
>>> AA AA RA C 3 3 0
>>> AA RR RA C 3 3 0
>>>
>>> That is, I would like to have each count next to the rows it
>>> counts, per group.  So in Group A there are two AAs for dad and two
>>> RRs for mum, but only one AA or RR for the child, and that is the RR
>>> (so the 1 is next to the RR and not the RA).
>>>
>>> Can this be done?
>>>
>>> K.
>>>
>>> ______________________________________________
>>> R-help at r-project.org mailing list -- To UNSUBSCRIBE and more, see
>>> https://stat.ethz.ch/mailman/listinfo/r-help
>>> PLEASE do read the posting guide
>>> http://www.R-project.org/posting-guide.html
>>> and provide commented, minimal, self-contained, reproducible code.
>>>
>>
>> ---------------------------------------------------------------------------
>> Jeff Newmiller                        The     .....       .....  Go Live...
>> DCN:<jdnewmil at dcn.davis.ca.us>        Basics: ##.#.       ##.#.  Live Go...
>>                                       Live:   OO#.. Dead: OO#..  Playing
>> Research Engineer (Solar/Batteries            O.O#.       #.O#.  with
>> /Software/Embedded Controllers)               .OO#.       .OO#.  rocks...1k
>> ---------------------------------------------------------------------------


