[R] How to remove rows based on frequency of factor and then difference date scores

Abhijit Dasgupta, PhD aikidasgupta at gmail.com
Tue Aug 24 19:59:33 CEST 2010


The only problem with this is that Chris's unique individuals are a 
combination of Type and ID, as I understand it. So Type=A, ID=1 is a 
different individual from Type=B,ID=1. So we need to create a unique 
identifier per person, simplistically by uniqueID=paste(Type, ID, 
sep=''). Then, using this new identifier, everything follows.

On 08/24/2010 01:53 PM, David Winsemius wrote:
>
> On Aug 24, 2010, at 1:19 PM, Chris Beeley wrote:
>
>> Hello-
>>
>> A basic question which has nonetheless floored me entirely. I have a
>> dataset which looks like this:
>>
>> Type  ID     Date            Value
>> A       1    16/09/2020       8
>> A       1     23/09/2010      9
>> B       3     18/8/2010        7
>> B       1     13/5/2010        6
>>
>> There are two Types, which correspond to different individuals in
>> different conditions, and loads of ID labels (1:50) corresponding to
>> the different individuals in each condition, and measurements at
>> different times (from 1 to 10 measurements) for each individual.
>>
>> I want to perform the following operations:
>>
>> 1) Delete all individuals for whom only one measurement is available.
>> In the dataset above, you can see that I want to delete the row Type B
>> ID 3, and Type B ID 1, but without deleting the Type A ID 1 data
>> because there is more than one measurement for Type A ID 1 (but not
>> for Type B ID1)
>>
>> 2) Produce difference scores for each of the Dates, so each individual
>> (Type A ID1 and all the others for whom more than one measurement
>> exists) starts at Date "1" and goes up in integers according to how
>> many days have elapsed.
>>
>> I just know there's some incredibly cunning R-ish way of doing this
>> but after many hours of fiddling I have had to admit defeat.
>
> Not sure about terribly cunning. Let's assume your dataframe was read 
> in with stringsAsFactors=FALSE and is called txt.df:
>
>
> > txt.df$dt2 <- as.Date(txt.df$Date, format="%d/%m/%Y")
> > txt.df
>   Type ID       Date Value        dt2
> 1    A  1 16/09/2020     8 2020-09-16
> 2    A  1 23/09/2010     9 2010-09-23
> 3    B  3  18/8/2010     7 2010-08-18
> 4    B  1  13/5/2010     6 2010-05-13
>
> > txt.df$nn <- ave(txt.df$ID,txt.df$ID, FUN=length)
> > txt.df
>   Type ID       Date Value        dt2 nn
> 1    A  1 16/09/2020     8 2020-09-16  3
> 2    A  1 23/09/2010     9 2010-09-23  3
> 3    B  3  18/8/2010     7 2010-08-18  1
> 4    B  1  13/5/2010     6 2010-05-13  3
> > txt.df[ -which( txt.df$nn <=1), ]
>   Type ID       Date Value        dt2 nn
> 1    A  1 16/09/2020     8 2020-09-16  3
> 2    A  1 23/09/2010     9 2010-09-23  3
> 4    B  1  13/5/2010     6 2010-05-13  3
>
> # Task #1 accomplished
>
> > tapply(txt.df$dt2, txt.df$ID, function(x) x[1] -x)
> $`1`
> Time differences in days
> [1]    0 3646 3779
>
> $`3`
> Time difference of 0 days
>
> > unlist( tapply(txt.df$dt2, txt.df$ID, function(x) x[1] -x) )
>   11   12   13    3
>    0 3646 3779    0
> > txt.df$diffdays <- unlist( tapply(txt.df$dt2, txt.df$ID, function(x) 
> x[1] -x) )
> > txt.df
>   Type ID       Date Value        dt2 nn diffdays
> 1    A  1 16/09/2020     8 2020-09-16  3        0
> 2    A  1 23/09/2010     9 2010-09-23  3     3646
> 3    B  3  18/8/2010     7 2010-08-18  1     3779
> 4    B  1  13/5/2010     6 2010-05-13  3        0
> >
>
>
>
>>
>> I would be very grateful for any words of advice.
>>
>> Many thanks,
>> Chris Beeley,
>> Institute of Mental Health, UK
>>
>> ______________________________________________
>> R-help at r-project.org mailing list
>> https://stat.ethz.ch/mailman/listinfo/r-help
>> PLEASE do read the posting guide 
>> http://www.R-project.org/posting-guide.html
>> and provide commented, minimal, self-contained, reproducible code.
>
> David Winsemius, MD
> West Hartford, CT
>
> ______________________________________________
> R-help at r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide 
> http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.


-- 

Abhijit Dasgupta, PhD
Director and Principal Statistician
ARAASTAT
Ph: 301.385.3067
E: adasgupta at araastat.com
W: http://www.araastat.com



More information about the R-help mailing list