[R] Selecting First Incidence from Longitudinal Data

Frank Harrell f.harrell at vanderbilt.edu
Sun Feb 24 15:51:16 CET 2013


I think we need a task view on longitudinal data manipulation.  There are so
many approaches to this - people need help navigating them.

I tend to stay away from the lapply-split methods as they don't look quite
as clean and may take longer to run.  The aggregate function uses too much
data frame subscripting.  The plyr package and the mApply function in the
Hmisc package provide some other nice solutions.  Often I like to stick with
tapply using constructs like 

with(mydata, tapply(1:nrow(mydata), subjectID, function(i) {
  ... operate on variables in mydata subscripted by [i] ...
}))
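
A minimal sketch of that construct, applied to the example data quoted further down in this thread. The names dat1, first_idx and first_rows are illustrative only, and the rule "first nonzero COMPL, else the first row" is taken from the original poster's follow-up request rather than from any particular package:

```r
# Example data as posted in the thread.
dat1 <- read.table(text = "
ID    COMPL  SEX  HEREDITY
1    0      1      2
1    0      1      2
1    3      1      2
2    0      0      1
2    1      0      1
2    2      0      1
2    2      0      1
3    0      0      1
3    0      0      1
3    0      0      1
3    0      0      1
3    2      0      1
4    0      1      2
4    0      1      2
", header = TRUE)

# tapply over row indices: for each ID, i holds that subject's row numbers.
# Return the index of the first row with COMPL != 0, or the subject's
# first row when no complication was recorded.
first_idx <- with(dat1, tapply(seq_len(nrow(dat1)), ID, function(i) {
  hit <- i[COMPL[i] != 0]
  if (length(hit) > 0) hit[1] else i[1]
}))

# One row per subject.
first_rows <- dat1[as.integer(first_idx), ]
first_rows
```

On this data the selected rows are 3, 5, 12 and 13, which agrees with the results posted elsewhere in the thread. Because the function returns a single row index per subject, the data frame is subscripted only once at the end.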

Frank


arun kirshna wrote
> Hi,
> 
> I am not sure why you are getting different results.  I couldn't reproduce
> your problem.
> dat1<- read.table(text=" 
> ID    COMPL  SEX  HEREDITY 
> 1    0      1      2 
> 1    0      1      2 
> 1    3      1      2 
> 2    0      0      1 
> 2    1      0      1 
> 2    2      0      1 
> 2    2      0      1 
> 3    0      0      1 
> 3    0      0      1 
> 3    0      0      1 
> 3    0      0      1 
> 3    2      0      1 
> 4    0      1      2 
> 4    0      1      2 
> ",sep="",header=TRUE)
> do.call(rbind,lapply(split(dat1,dat1$ID),function(x) if(any(x$COMPL!=0))
> head(x[x$COMPL!=0,],1) else head(x,1)))
> #  ID COMPL SEX HEREDITY
> #1  1     3   1        2
> #2  2     1   0        1
> #3  3     2   0        1
> #4  4     0   1        2
> 
> 
> You could also try:
> dat1[with(dat1,ave(COMPL,ID,FUN=function(x) if(any(x!=0)) cumsum(x>0) else
> seq_along(x)))==1,] #modification of David's code
> #   ID COMPL SEX HEREDITY
> #3   1     3   1        2
> #5   2     1   0        1
> #12  3     2   0        1
> #13  4     0   1        2
> A.K.
> 
> 
> 
> 
> 
> ________________________________
> From: Tasnuva Tabassum <t.tasnuva@>
> To: arun <smartpink111@>
> Sent: Sunday, February 24, 2013 12:08 AM
> Subject: Re: [R] Selecting First Incidence from Longitudinal Data
> 
> 
> Sorry, I tried this, but it gave me this answer:
> 
>  #   ID COMPL SEX HEREDITY 
> #1   1     0   1        2        
> #4   2     0   0        1        
> #8   3     0   0        1        
> #13  4     0   1        2        
> 
> 
> 
> 
> On Sat, Feb 23, 2013 at 8:44 PM, arun <smartpink111@> wrote:
> 
> Hi,
>>Try this:
>>#dat1
>> do.call(rbind,lapply(split(dat1,dat1$ID),function(x) if(any(x$COMPL!=0))
>> head(x[x$COMPL!=0,],1) else head(x,1)))
>>
>>#  ID COMPL SEX HEREDITY
>>
>>#1  1     3   1        2
>>#2  2     1   0        1
>>#3  3     2   0        1
>>#4  4     0   1        2
>>A.K.
>>
>>
>>
>>
>>
>>
>>________________________________
>>From: Tasnuva Tabassum <t.tasnuva@>
>>To: Xiaogang Su <xiaogangsu@>
>>Cc: arun <smartpink111@>; R help <r-help@>; Rui Barradas <ruipbarradas@>
>>Sent: Saturday, February 23, 2013 11:23 PM
>>
>>Subject: Re: [R] Selecting First Incidence from Longitudinal Data
>>
>>
>>Hi
>>Thank you very much, but I forgot to mention that I also want to include the
>>patients for whom no complication occurred. That is, for my data I want to
>>include patient no. 4, for whom the COMPL value will be 0.
>>
>>In that case, what R function should I write?
>>
>>
>>
>>
>>On Sat, Feb 23, 2013 at 12:23 PM, Xiaogang Su <xiaogangsu@> wrote:
>>
>>My bad. I didn't try it out with the real data. Here you go. HTH, X
>>>
>>>
>>>dat <- read.table(text="
>>>ID    COMPL  SEX  HEREDITY
>>>1    0      1      2
>>>1    0      1      2
>>>1    3      1      2
>>>2    0      0      1
>>>2    1      0      1
>>>2    2      0      1
>>>2    2      0      1
>>>3    0      0      1
>>>3    0      0      1
>>>3    0      0      1
>>>3    0      0      1
>>>3    2      0      1
>>>4    0      1      2
>>>4    0      1      2
>>>", header = TRUE)
>>>
>>>
>>>dat0 <- dat[dat$COMPL!=0, ]
>>>dat0$sequence <- as.vector(unlist(lapply(aggregate(dat0$ID,
>>>by=list(dat0$ID),FUN=length)$x, FUN=function(x){seq(1, x)})))
>>>dat0 <- dat0[dat0$sequence==1, ] 
>>>dat0
>>>
>>>
>>>
>>>
>>>On Sat, Feb 23, 2013 at 2:09 PM, arun <smartpink111@> wrote:
>>>
>>>HI,
>>>>Tried your approach:
>>>>
>>>>
>>>> dat1$sequence <- as.vector(unlist(lapply( aggregate(dat1$ID,
>>>> by=list(dat1$ID),FUN=length)$x, FUN=function(x){seq(1, x)})))
>>>> dat0 <- dat1[dat1$sequence==1 & dat1$COMPL!= 0, ] #your second solution
>>>> dat0
>>>>#[1] ID       COMPL    SEX      HEREDITY sequence
>>>>#<0 rows> (or 0-length row.names)
>>>> 
>>>>
>>>>dat1[dat1$sequence==1,] #here the OP wanted first incidence where COMPL!=0
>>>>#   ID COMPL SEX HEREDITY sequence
>>>>#1   1     0   1        2        1
>>>>#4   2     0   0        1        1
>>>>#8   3     0   0        1        1
>>>>#13  4     0   1        2        1
>>>>A.K.
>>>>
>>>>
>>>>
>>>>
>>>>----- Original Message -----
>>>>From: Xiaogang Su <xiaogangsu@>
>>>>To: Rui Barradas <ruipbarradas@>
>>>>Cc: r-help@
>>>>Sent: Saturday, February 23, 2013 2:15 PM
>>>>Subject: Re: [R] Selecting First Incidence from Longitudinal Data
>>>>
>>>>Try this:
>>>>dat$sequence <- as.vector(unlist(lapply( aggregate(dat$ID, by=list(dat$ID),
>>>>FUN=length)$x, FUN=function(x){seq(1, x)})))
>>>>dat0 <- dat[dat$sequence==1, ]
>>>>
>>>>HTH, X
>>>>
>>>>
>>>>On Sat, Feb 23, 2013 at 1:07 PM, Rui Barradas <ruipbarradas@> wrote:
>>>>
>>>>> Hello,
>>>>>
>>>>> You can use ?aggregate and ?head to do what you want. Try the
>>>>> following.
>>>>>
>>>>>
>>>>>
>>>>> dat <- read.table(text="
>>>>>
>>>>> ID    COMPL  SEX  HEREDITY
>>>>> 1    0      1      2
>>>>> 1    0      1      2
>>>>> 1    3      1      2
>>>>> 2    0      0      1
>>>>> 2    1      0      1
>>>>> 2    2      0      1
>>>>> 2    2      0      1
>>>>> 3    0      0      1
>>>>> 3    0      0      1
>>>>> 3    0      0      1
>>>>> 3    0      0      1
>>>>> 3    2      0      1
>>>>> 4    0      1      2
>>>>> 4    0      1      2
>>>>> ", header = TRUE)
>>>>>
>>>>> aggregate(. ~ ID, data = subset(dat, COMPL != 0), head, 1)
>>>>>
>>>>>
>>>>> Hope this helps,
>>>>>
>>>>> Rui Barradas
>>>>>
>>>>> On 23-02-2013 14:28, Tasnuva Tabassum wrote:
>>>>>
>>>>>> I have longitudinal competing-risk data of the form:
>>>>>>
>>>>>> ID    COMPL  SEX   HEREDITY
>>>>>> 1     0       1      2
>>>>>> 1     0       1      2
>>>>>> 1     3       1      2
>>>>>> 2     0       0      1
>>>>>> 2     1       0      1
>>>>>> 2     2       0      1
>>>>>> 2     2       0      1
>>>>>> 3     0       0      1
>>>>>> 3     0       0      1
>>>>>> 3     0       0      1
>>>>>> 3     0       0      1
>>>>>> 3     2       0      1
>>>>>> 4     0       1      2
>>>>>> 4     0       1      2
>>>>>>
>>>>>> Here, COMPL = health complication of diabetic patients, with value
>>>>>> labels 0 = no complication, 1 = coronary heart disease,
>>>>>> 2 = retinopathy, 3 = nephropathy.
>>>>>>
>>>>>>
>>>>>> I want to select only the first complication that occurred to each
>>>>>> patient.
>>>>>> What R function can I use?
>>>>>>
>>>>>>         [[alternative HTML version deleted]]
>>>>>>
>>>>>> ______________________________________________
>>>>>> R-help@ mailing list
>>>>>> https://stat.ethz.ch/mailman/listinfo/r-help
>>>>>> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
>>>>>> and provide commented, minimal, self-contained, reproducible code.
>>>>>>
>>>>>>
>>>>>
>>>>
>>>>
>>>>
>>>>--
>>>>==============================
>>>>Xiaogang Su, Ph.D.
>>>>Associate Professor & Statistician
>>>>School of Nursing, University of Alabama
>>>>Birmingham, AL 35294-1210
>>>>(205) 934-2355 [Office]
>>>>xgsu@
>>>>xiaogangsu@
>>>>https://sites.google.com/site/xgsu00/
>>>>
>>>>
>>>>    [[alternative HTML version deleted]]
>>>>
>>>>
>>>>
>>>
>>>
>>>
>>
> 





-----
Frank Harrell
Department of Biostatistics, Vanderbilt University
--
View this message in context: http://r.789695.n4.nabble.com/Selecting-First-Incidence-from-Longitudinal-Data-tp4659455p4659530.html
Sent from the R help mailing list archive at Nabble.com.


