[R] rle with data.table - is it possible?

Kate Ignatius kate.ignatius at gmail.com
Tue Dec 30 15:27:37 CET 2014


I'm trying to use rle() together with data.table and wondering whether that is possible...

To keep this simple: my ultimate goal is to find long stretches of
1s, but I want to do this within groups (hence data.table, since
I use its "set key" option). However, I'm not having much luck
making this work.

For example, for simplicity's sake, I have the following data:

Dad Mum Child Group
AA  RR  RA    A
AA  RR  RR    A
AA  AA  AA    B
AA  AA  AA    B
RA  AA  RR    B
RR  AA  RR    B
AA  AA  AA    B
AA  AA  RA    C
AA  AA  RA    C
AA  RR  RA    C

And the following code, which I know works:

# het* is 1 where the genotype is AA or RR, 0 otherwise;
# sum* holds the lengths of the runs of 1s
hetdad <- as.numeric(x[[1]] %in% c("AA", "RR"))
sumdad <- with(rle(hetdad), lengths[values == 1])

hetmum <- as.numeric(x[[2]] %in% c("AA", "RR"))
summum <- with(rle(hetmum), lengths[values == 1])

hetchild <- as.numeric(x[[3]] %in% c("AA", "RR"))
sumchild <- with(rle(hetchild), lengths[values == 1])
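To see what the ungrouped code gives on the example, here is a small sketch with the Dad column typed in as a plain vector (dad is a hypothetical name, not from the post). The first run of 4 spans Groups A and B, and the run of 5 spans Groups B and C, which is exactly why the grouping matters.

```r
# Dad genotypes from the example table, in row order (groups ignored)
dad <- c("AA", "AA", "AA", "AA", "RA", "RR", "AA", "AA", "AA", "AA")

# 1 where Dad is homozygous (AA or RR), 0 otherwise
hetdad <- as.numeric(dad %in% c("AA", "RR"))

r <- rle(hetdad)
r$lengths   # 4 1 5
r$values    # 1 0 1

# keep only the lengths of the runs of 1s
sumdad <- r$lengths[r$values == 1]
sumdad      # 4 5
```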

However, I want to run the code above by Group (the real file is
millions of rows long and the groups are larger, but I wanted to
simplify the example).
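If one vector of run lengths per group is enough (rather than a column aligned with the rows), a base-R sketch can apply the same rle() code group by group with split() and lapply(). The vectors dad and group are hypothetical stand-ins for the Dad and Group columns.

```r
# Dad genotypes and group labels from the example table
dad   <- c("AA", "AA", "AA", "AA", "RA", "RR", "AA", "AA", "AA", "AA")
group <- c("A", "A", "B", "B", "B", "B", "B", "C", "C", "C")

# 1 where Dad is homozygous (AA or RR), 0 otherwise
hetdad <- as.numeric(dad %in% c("AA", "RR"))

# Run the same rle() code separately within each group
sumdad_by_group <- lapply(split(hetdad, group),
                          function(v) with(rle(v), lengths[values == 1]))
sumdad_by_group   # A: 2, B: 2 2, C: 3
```

The result has one entry per run, not per row, so it cannot be stored as a column directly. In data.table the analogous grouped call would use j without :=, e.g. LOH[, .(sumdad = with(rle(hetdad), lengths[values == 1])), by = Group], which may return a different number of rows per group.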

I did something like this but of course I got an error:

LOH[,hetdad:=as.numeric(x[c(1)]=="AA" | x[c(1)]=="RR")]
LOH[,sumdad:=rle(hetdad)$lengths[rle(hetdad)$values==1],by=Group]
LOH[,hetmum:=as.numeric(x[c(2)]=="AA" | x[c(2)]=="RR")]
LOH[,summum:=rle(hetmum)$lengths[rle(hetmum)$values==1],by=Group]
LOH[,hetchild:=as.numeric(x[c(3)]=="AA" | x[c(3)]=="RR")]
LOH[,sumchild:=rle(hetchild)$lengths[rle(hetchild)$values==1],by=Group]
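A likely cause of the error: within a group, rle(hetdad)$lengths[rle(hetdad)$values==1] returns one number per run of 1s, not one per row, and recent data.table versions require the right-hand side of := to have length 1 or exactly the group's row count (Group B has 5 rows but only 2 runs for Dad; Group C has no run at all for the child). Separately, unless LOH has a column named x, x[c(1)] inside LOH[...] refers to an outside object x rather than LOH's Dad column. The sketch below, on Group B's Dad values typed in by hand (hom_b is a hypothetical name), shows one way round the length problem: repeat each run's length over the rows of that run, writing 0 for runs of 0s.

```r
# Dad's 0/1 homozygosity indicator for the five rows of Group B
hom_b <- c(1, 1, 0, 1, 1)
r <- rle(hom_b)

# One value per run of 1s: only 2 values for 5 rows
per_run <- r$lengths[r$values == 1]
per_run   # 2 2

# One value per row: each run's length repeated over that run,
# with 0 for rows that are not homozygous
per_row <- rep(r$lengths * r$values, r$lengths)
per_row   # 2 2 0 2 2
```

The same expression can sit on the right of := with by = Group, e.g. LOH[, sumdad := {r <- rle(hetdad); rep(r$lengths * r$values, r$lengths)}, by = Group].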

The reason is that I eventually want something like this:

Dad Mum Child Group sumdad summum sumchild
AA  RR  RA    A     2      2      0
AA  RR  RR    A     2      2      1
AA  AA  AA    B     4      5      5
AA  AA  AA    B     4      5      5
RA  AA  RR    B     0      5      5
RR  AA  RR    B     4      5      5
AA  AA  AA    B     4      5      5
AA  AA  RA    C     3      3      0
AA  AA  RA    C     3      3      0
AA  RR  RA    C     3      3      0

That is, I would like the specific counts to sit next to the rows I'm
consecutively counting, per group. So in Group A, Dad has 2 AAs and
Mum has 2 RRs, but the child has only 1 AA or RR, namely the RR (so
the 1 sits next to the RR, and the RA gets 0).
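A base-R sketch that builds all three columns for the example, using ave() to apply a per-group function that returns one value per row. run_len is a hypothetical helper name, and the data are typed in from the first table. One caveat: under a strict run-length reading, Dad in Group B comes out as 2 2 0 2 2 rather than 4, because the RA row splits that stretch; the Mum and Child columns match the desired table. The 4 equals the number of AA/RR rows in the group, so if that count is what was intended, FUN = function(v) v * sum(v) would reproduce the desired table instead.

```r
# Example data from the first table (base R data frame)
x <- data.frame(
  Dad   = c("AA", "AA", "AA", "AA", "RA", "RR", "AA", "AA", "AA", "AA"),
  Mum   = c("RR", "RR", "AA", "AA", "AA", "AA", "AA", "AA", "AA", "RR"),
  Child = c("RA", "RR", "AA", "AA", "RR", "RR", "AA", "RA", "RA", "RA"),
  Group = c("A", "A", "B", "B", "B", "B", "B", "C", "C", "C"),
  stringsAsFactors = FALSE
)

# For a 0/1 vector, give each element the length of the run of 1s
# it belongs to, or 0 if the element itself is 0
run_len <- function(v) {
  r <- rle(v)
  rep(r$lengths * r$values, r$lengths)
}

# 1 = homozygous (AA or RR); runs are computed within each Group
for (col in c("Dad", "Mum", "Child")) {
  hom <- as.numeric(x[[col]] %in% c("AA", "RR"))
  x[[paste0("sum", tolower(col))]] <- ave(hom, x$Group, FUN = run_len)
}
x
```

With data.table the same helper can be called by group, e.g. LOH[, sumdad := run_len(as.numeric(Dad %in% c("AA", "RR"))), by = Group], and likewise for Mum and Child.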

Can this be done?

K.


