[R] rle with data.table - is it possible?

Kate Ignatius kate.ignatius at gmail.com
Fri Jan 2 02:07:00 CET 2015


Apologies, I mixed up syntax all over the place, a habit of mine.  The
last line was in there because of earlier code, so it really doesn't
need to be there.  Here is the proper code, I hope:

x <- sumchild == 0
childseg <- integer(length(x))   # one 0 per row, so non-run rows are 0, not NA
span <- rle(x)$lengths[rle(x)$values == TRUE]
childseg[x] <- rep(seq_along(span), times = span)
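On whether the `rle()` line can go inside data.table: `j` can be a function call or a multi-statement `{ }` block, and with `by = Group` it is evaluated once per group, so these lines can be applied group-wise. A minimal sketch, assuming a `DT` with `Group` and `sumchild` columns like Jeff's output table; `label_runs`, `childseg` and `childseg2` are hypothetical names, not data.table API. Each run of `sumchild == 0` rows gets a segment number 1, 2, ...; other rows get 0.

```r
library(data.table)

# Hypothetical helper (not part of data.table): number each run of TRUE
# values 1, 2, 3, ... and put 0 on every FALSE position.
label_runs <- function(x) {
  seg  <- integer(length(x))              # one slot per row, 0 by default
  r    <- rle(x)
  span <- r$lengths[r$values]             # lengths of the TRUE runs only
  seg[x] <- rep(seq_along(span), times = span)
  seg
}

# Group and sumchild as in Jeff's output table
DT <- data.table(
  Group    = c("A", "A", "B", "B", "B", "B", "B", "C", "C", "C"),
  sumchild = c(0L, 1L, 5L, 5L, 5L, 5L, 5L, 0L, 0L, 0L)
)

# by = Group calls label_runs() once per group, so numbering restarts in each
DT[, childseg := label_runs(sumchild == 0L), by = Group]

# The same logic written inline as a { } block in j; .N is the group size
DT[, childseg2 := {
  x    <- sumchild == 0L
  seg  <- integer(.N)
  span <- rle(x)$lengths[rle(x)$values]
  seg[x] <- rep(seq_along(span), times = span)
  seg
}, by = Group]
```

With these values `childseg` should come out as 1, 0 for Group A, all 0 for Group B and 1, 1, 1 for Group C. Because numbering restarts per group, Group C's run is segment 1, whereas an ungrouped `rle()` over the whole column would label it 2.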


On Thu, Jan 1, 2015 at 12:13 PM, Jeff Newmiller
<jdnewmil at dcn.davis.ca.us> wrote:
> Thank you for attempting to encode what you want using R syntax, but you are not really succeeding yet (too many errors). Perhaps another hand-generated result would help? A new input data frame might or might not be needed to illustrate the desired results.
>
> Your second and third lines are syntactically incorrect, and I don't understand what you hope to accomplish by assigning an empty string to a numeric in your last line.
> ---------------------------------------------------------------------------
> Jeff Newmiller                        The     .....       .....  Go Live...
> DCN:<jdnewmil at dcn.davis.ca.us>        Basics: ##.#.       ##.#.  Live Go...
>                                       Live:   OO#.. Dead: OO#..  Playing
> Research Engineer (Solar/Batteries            O.O#.       #.O#.  with
> /Software/Embedded Controllers)               .OO#.       .OO#.  rocks...1k
> ---------------------------------------------------------------------------
> Sent from my phone. Please excuse my brevity.
>
> On January 1, 2015 4:16:52 AM PST, Kate Ignatius <kate.ignatius at gmail.com> wrote:
>>Is it possible to add the following code or similar in data.table:
>>
>>childseg<-0
>>x:=sumchild <-0
>>span<-rle(x)$lengths[rle(x)$values==TRUE
>>childseg[x]<-rep(seq_along(span), times = span)
>>childseg[childseg == 0]<-''
>>
>>I was hoping to run this code by Group for mum, dad and
>>child.  The problem I'm having is with the
>>span<-rle(x)$lengths[rle(x)$values==TRUE line, which I'm not sure can
>>be added to data.table.
>>
>>[Previous email had incorrect code]
>>
>>On Wed, Dec 31, 2014 at 3:45 AM, Jeff Newmiller
>><jdnewmil at dcn.davis.ca.us> wrote:
>>> I do not understand the value of using the rle function in your
>>> description, but the code below appears to produce the table you want.
>>>
>>> Note that better support for the data.table package might be found at
>>> stackexchange as the documentation specifies.
>>>
>>> x <- read.table( text=
>>> "Dad Mum Child Group
>>> AA RR RA A
>>> AA RR RR A
>>> AA AA AA B
>>> AA AA AA B
>>> RA AA RR B
>>> RR AA RR B
>>> AA AA AA B
>>> AA AA RA C
>>> AA AA RA C
>>> AA RR RA C
>>> ", header=TRUE, stringsAsFactors=FALSE )
>>>
>>> library(data.table)
>>> DT <- data.table( x )
>>> DT[ , cdad := as.integer( Dad %in% c( "AA", "RR" ) ) ]
>>> DT[ , sumdad := 0L ]
>>> DT[ 1==DT$cdad, sumdad := sum( cdad ), by=Group ]
>>> DT[ , cdad := NULL ]
>>> DT[ , cmum := as.integer( Mum %in% c( "AA", "RR" ) ) ]
>>> DT[ , summum := 0L ]
>>> DT[ 1==DT$cmum, summum := sum( cmum ), by=Group ]
>>> DT[ , cmum := NULL ]
>>> DT[ , cchild := as.integer( Child %in% c( "AA", "RR" ) ) ]
>>> DT[ , sumchild := 0L ]
>>> DT[ 1==DT$cchild, sumchild := sum( cchild ), by=Group ]
>>> DT[ , cchild := NULL ]
>>>
>>>> DT
>>>
>>>     Dad Mum Child Group sumdad summum sumchild
>>>  1:  AA  RR    RA     A      2      2        0
>>>  2:  AA  RR    RR     A      2      2        1
>>>  3:  AA  AA    AA     B      4      5        5
>>>  4:  AA  AA    AA     B      4      5        5
>>>  5:  RA  AA    RR     B      0      5        5
>>>  6:  RR  AA    RR     B      4      5        5
>>>  7:  AA  AA    AA     B      4      5        5
>>>  8:  AA  AA    RA     C      3      3        0
>>>  9:  AA  AA    RA     C      3      3        0
>>> 10:  AA  RR    RA     C      3      3        0
>>>
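Jeff's three stanzas differ only in the column they read and write, so they can be folded into one loop. A minimal sketch, assuming the same example data (typed in directly instead of via `read.table`): for each of Dad, Mum and Child it counts the AA/RR rows per `Group` with `sum()` and writes that count on the AA/RR rows and 0 elsewhere, which should reproduce the `sumdad`, `summum` and `sumchild` columns in the table above. The loop variables `col` and `out` are illustrative; `(out) :=` assigns to the column whose name is stored in `out`, and `get(col)` fetches a column by name inside `j`.

```r
library(data.table)

DT <- data.table(
  Dad   = c("AA", "AA", "AA", "AA", "RA", "RR", "AA", "AA", "AA", "AA"),
  Mum   = c("RR", "RR", "AA", "AA", "AA", "AA", "AA", "AA", "AA", "RR"),
  Child = c("RA", "RR", "AA", "AA", "RR", "RR", "AA", "RA", "RA", "RA"),
  Group = c("A", "A", "B", "B", "B", "B", "B", "C", "C", "C")
)

for (col in c("Dad", "Mum", "Child")) {
  out <- paste0("sum", tolower(col))      # sumdad, summum, sumchild
  DT[, (out) := {
    hom <- get(col) %in% c("AA", "RR")    # AA/RR rows within this group
    as.integer(hom) * sum(hom)            # group count on AA/RR rows, 0 elsewhere
  }, by = Group]
}
```

This avoids the temporary `cdad`/`cmum`/`cchild` columns and the `1==DT$cdad` row subset, at the cost of a slightly denser `j` expression.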
>>>
>>> On Tue, 30 Dec 2014, Kate Ignatius wrote:
>>>
>>>> I'm trying to use both these packages and wondering whether this is
>>>> possible...
>>>>
>>>> To make this simple, my ultimate goal is to determine long stretches
>>>> of 1s, but I want to do this within groups (hence using data.table,
>>>> as I use the "setkey" option).  However, I'm not having much luck
>>>> making this possible.
>>>>
>>>> For example, for simplistic sake, I have the following data:
>>>>
>>>> Dad Mum Child Group
>>>> AA RR RA A
>>>> AA RR RR A
>>>> AA AA AA B
>>>> AA AA AA B
>>>> RA AA RR B
>>>> RR AA RR B
>>>> AA AA AA B
>>>> AA AA RA C
>>>> AA AA RA C
>>>> AA RR RA C
>>>>
>>>> And the following code which I know works
>>>>
>>>> hetdad <- as.numeric(x[c(1)]=="AA" | x[c(1)]=="RR")
>>>> sumdad <- rle(hetdad)$lengths[rle(hetdad)$values==1]
>>>>
>>>> hetmum <- as.numeric(x[c(2)]=="AA" | x[c(2)]=="RR")
>>>> summum <- rle(hetmum)$lengths[rle(hetmum)$values==1]
>>>>
>>>> hetchild <- as.numeric(x[c(3)]=="AA" | x[c(3)]=="RR")
>>>> sumchild <- rle(hetchild)$lengths[rle(hetchild)$values==1]
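To see why this base-R version cannot simply become a column, it helps to run it on the Dad column alone. A minimal sketch, with the example's Dad values typed in as a plain vector (the name `dad` is illustrative): `rle()` returns one length per run, so `sumdad` holds one number per run of 1s rather than one per row, and the first run of 4 crosses the boundary between Groups A and B.

```r
# Dad column of the example data, all groups concatenated
dad <- c("AA", "AA", "AA", "AA", "RA", "RR", "AA", "AA", "AA", "AA")

hetdad <- as.numeric(dad == "AA" | dad == "RR")   # 1 1 1 1 0 1 1 1 1 1
r <- rle(hetdad)
r$lengths   # 4 1 5
r$values    # 1 0 1

sumdad <- r$lengths[r$values == 1]
sumdad      # 4 5  (two numbers for ten rows)
```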
>>>>
>>>> However, I wish to run the above code by Group (this file is
>>>> millions of rows long and the groups will be larger, but I just
>>>> wanted to simplify the example).
>>>>
>>>> I did something like this but of course I got an error:
>>>>
>>>> LOH[,hetdad:=as.numeric(x[c(1)]=="AA" | x[c(1)]=="RR")]
>>>> LOH[,sumdad:=rle(hetdad)$lengths[rle(hetdad)$values==1],by=Group]
>>>> LOH[,hetmum:=as.numeric(x[c(2)]=="AA" | x[c(2)]=="RR")]
>>>> LOH[,summum:=rle(hetmum)$lengths[rle(hetmum)$values==1],by=Group]
>>>> LOH[,hetchild:=as.numeric(x[c(3)]=="AA" | x[c(3)]=="RR")]
>>>> LOH[,sumchild:=rle(hetchild)$lengths[rle(hetchild)$values==1],by=Group]
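A likely cause of the error: with `:=` and `by`, data.table needs the right-hand side for each group to be either a single value or one value per row of that group, and `rle(hetdad)$lengths[...]` usually has neither length. Also, `x[c(1)]` inside `LOH[...]` does not refer to the current group's Dad column; naming the column directly is clearer. A minimal sketch, assuming a `LOH` table with `Dad` and `Group` columns; `rundad` is a hypothetical column name. It expands the run lengths back to rows with `rep()`, so each row in a run of 1s gets that run's length and rows with 0 get 0.

```r
library(data.table)

# Dad and Group columns from the example data
LOH <- data.table(
  Dad   = c("AA", "AA", "AA", "AA", "RA", "RR", "AA", "AA", "AA", "AA"),
  Group = c("A", "A", "B", "B", "B", "B", "B", "C", "C", "C")
)

# Refer to the column by name instead of x[c(1)]
LOH[, hetdad := as.numeric(Dad %in% c("AA", "RR"))]

# Per group: find the runs, then repeat each run's value once per row.
# r$lengths * r$values is the run length for runs of 1s and 0 for runs of 0s.
LOH[, rundad := {
  r <- rle(hetdad)
  rep(r$lengths * r$values, times = r$lengths)
}, by = Group]
```

Note that this yields run lengths, so Group B comes out as 2, 2, 0, 2, 2 rather than the 4s in the desired table: the RA row splits the AA/RR rows into two runs. The desired numbers are per-group counts, which is what Jeff's reply computes.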
>>>>
>>>> The reason is that I eventually want to have something like this:
>>>>
>>>> Dad Mum Child Group sumdad summum sumchild
>>>> AA RR RA A 2 2 0
>>>> AA RR RR A 2 2 1
>>>> AA AA AA B 4 5 5
>>>> AA AA AA B 4 5 5
>>>> RA AA RR B 0 5 5
>>>> RR AA RR B 4 5 5
>>>> AA AA AA B 4 5 5
>>>> AA AA RA C 3 3 0
>>>> AA AA RA C 3 3 0
>>>> AA RR RA C 3 3 0
>>>>
>>>> That is, I would like to have the specific counts next to what I'm
>>>> consecutively counting per group.  So for Group A, dad has 2 AAs and
>>>> mum has two RRs, but the child has only 1 AA or RR, and that is the RR
>>>> (so the 1 is next to the RR and not the RA).
>>>>
>>>> Can this be done?
>>>>
>>>> K.
>>>>
>>>> ______________________________________________
>>>> R-help at r-project.org mailing list -- To UNSUBSCRIBE and more, see
>>>> https://stat.ethz.ch/mailman/listinfo/r-help
>>>> PLEASE do read the posting guide
>>>> http://www.R-project.org/posting-guide.html
>>>> and provide commented, minimal, self-contained, reproducible code.
>>>>
>>>
>>>
>


