[R] Read 2 rows in 1 dataframe for diff - longitudinal data

David Winsemius dwinsemius at comcast.net
Tue Jun 4 17:13:57 CEST 2013


On Jun 3, 2013, at 9:51 PM, arun wrote:

> If it is grouped by "subid" (that would account for the difference in the number of changes):
> 
> subset(ddply(df1,.(subid),mutate,delta=c(FALSE,var[-1]!=var[-length(var)])),delta)[,-4]
> #   subid year var
> #3     36 2003   3
> #7     47 2001   3
> #9     47 2005   1
> #10    47 2007   3
> A.K.

I'm not sure why the first version returns numeric 0/1 values from the ave() call, but the second version (wrapped in as.logical()) works:

> df1[ ave( df1$var, df1$subid, FUN=function(x) c( FALSE, x[-1] != x[-length(x)]) ), ]
    subid year var
1      36 1999   1
1.1    36 1999   1
1.2    36 1999   1
1.3    36 1999   1

ave( df1$var, df1$subid, FUN=function(x) c( FALSE, x[-1] != x[-length(x)]))
 [1] 0 0 1 0 0 0 1 0 1 1

Perhaps one of the single item groups sabotaged my simple function.
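
A possible explanation, sketched here on the assumption that ave() behaves as it does in current base R: ave() writes each group's result back into a copy of its first argument, so because df1$var is numeric, the logical values returned by the function are coerced to 0 and 1. A one-row group is probably not the culprit, since the helper returns a single FALSE for a length-one input. The names `changed`, `single`, `flag`, `repeated` and `res` are introduced only for illustration.

```r
df1 <- data.frame(subid = rep(c(36, 47), each = 5),
                  year  = rep(seq(1999, 2007, 2), 2),
                  var   = c(1, 1, 3, 3, 3, 1, 3, 3, 1, 3))

# Hypothetical helper name; same logic as the anonymous function above.
changed <- function(x) c(FALSE, x[-1] != x[-length(x)])

# A length-one group simply yields FALSE.
single <- changed(5)

# The per-group results are stored back into a copy of the numeric
# df1$var, so the logical values are coerced to numeric.
flag <- ave(df1$var, df1$subid, FUN = changed)
is.logical(flag)
is.numeric(flag)

# Used as a numeric index, the zeros are dropped and every 1 selects
# row 1, which is why row 1 came back four times.
repeated <- df1[flag, ]

# Comparing with 1 (or wrapping in as.logical()) gives a logical index.
res <- df1[flag == 1, ]
res
```
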


> df1[ as.logical( ave( df1$var, df1$subid, FUN=function(x) c( FALSE, x[-1] != x[-length(x)]) ) ), ]
   subid year var
3     36 2003   3
7     47 2001   3
9     47 2005   1
10    47 2007   3
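
If the data are already sorted by subid, the same rows can apparently be selected without ave() by also requiring that the previous row belongs to the same subject. That extra condition is what excludes row 6, where var changes only because a new subid begins. This is a sketch under that sorting assumption; the names `var_changed`, `same_subject` and `keep` are illustrative.

```r
df1 <- data.frame(subid = rep(c(36, 47), each = 5),
                  year  = rep(seq(1999, 2007, 2), 2),
                  var   = c(1, 1, 3, 3, 3, 1, 3, 3, 1, 3))

# TRUE where var differs from the previous row.
var_changed  <- diff(df1$var) != 0

# TRUE where the previous row has the same subid.
same_subject <- diff(df1$subid) == 0

# The first row has no predecessor, so it is never flagged.
keep <- c(FALSE, var_changed & same_subject)

df1[keep, ]
```

Because `keep` stays logical throughout, no as.logical() is needed. If the rows are not sorted by subid, a grouped approach is the safer choice.
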

-- 
David.
> 
> 
> ----- Original Message -----
> From: David Winsemius <dwinsemius at comcast.net>
> To: arun <smartpink111 at yahoo.com>
> Cc: R help <r-help at r-project.org>
> Sent: Tuesday, June 4, 2013 12:37 AM
> Subject: Re: [R] Read 2 rows in 1 dataframe for diff - longitudinal data
> 
> 
> On Jun 3, 2013, at 7:10 PM, arun wrote:
> 
>> Hi,
>> Maybe this helps:
>> res1<-df1[with(df1,unlist(tapply(var,list(subid),FUN=function(x) c(FALSE,diff(x)!=0)),use.names=FALSE)),]
>>   res1
>> #   subid year var
>> #3     36 2003   3
>> #7     47 2001   3
>> #9     47 2005   1
>> #10    47 2007   3
>> #or
>> library(plyr)
>>   subset(ddply(df1,.(subid),mutate,delta=c(FALSE,diff(var)!=0)),delta)[,-4]
>> #   subid year var
>> #3     36 2003   3
>> #7     47 2001   3
>> #9     47 2005   1
>> #10    47 2007   3
>> A.K.
>> 
> It's pretty simple with logical indexing:
> 
>> df1[ c(FALSE, df1$var[-1]!=df1$var[-length(df1$var)]), ]
>    subid year var
> 3     36 2003   3
> 6     47 1999   1
> 7     47 2001   3
> 9     47 2005   1
> 10    47 2007   3
> 
> 
> When I count the number of changes in the value of var, it gives me 5. Not sure why you are both leaving out row 6.
> 
> -- 
> David.
>> 
>> 
>> I need to output a dataframe whenever var changes a value. 
>> 
>> df1 <- data.frame(subid=rep(c(36,47),each=5),year=rep(seq(1999,2007,2),2),var=c(1,1,3,3,3,1,3,3,1,3)) 
>>     subid year var 
>> 1     36 1999   1 
>> 2     36 2001   1 
>> 3     36 2003   3 
>> 4     36 2005   3 
>> 5     36 2007   3 
>> 6     47 1999   1 
>> 7     47 2001   3 
>> 8     47 2003   3 
>> 9     47 2005   1 
>> 10    47 2007   3 
>>> 
>> 
>> I need: 
>> 36 2003   3 
>> 47 2001   3 
>> 47 2005   1 
>> 47 2007   3 
>> 
>> I am trying to use ddply over subid and use the diff function, but it is not working quite right. 
>> 
>>> dd <- ddply(df1,.(subid),summarize,delta=diff(var) != 0) 
>>> dd 
>>    subid delta 
>> 1    36 FALSE 
>> 2    36  TRUE 
>> 3    36 FALSE 
>> 4    36 FALSE 
>> 5    47  TRUE 
>> 6    47 FALSE 
>> 7    47  TRUE 
>> 8    47  TRUE 
>> 
>> I would appreciate any help on this. 
>> Thank You! 
>> -ST
>> 
>> ______________________________________________
>> R-help at r-project.org mailing list
>> https://stat.ethz.ch/mailman/listinfo/r-help
>> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
>> and provide commented, minimal, self-contained, reproducible code.
> 
> David Winsemius
> Alameda, CA, USA
> 

David Winsemius
Alameda, CA, USA


