[R] subsets

Ivan Calandra ivan.calandra at uni-hamburg.de
Thu Jan 20 14:39:04 CET 2011


Hi Taras,

Indeed, I overlooked the problem. Anyway, I'm not sure I would have
been able to give as complete an answer as you did!

Ivan

On 1/20/2011 11:05, Taras Zakharko wrote:
> Hello Den,
>
> your problem is not as simple as it may seem, so Ivan's suggestion is only a partial answer. I see that each patient can have
> more than one diagnosis, and I take it that you want to isolate patients based on particular combinations of conditions.
> Thus, simply looking for "ah" or "ihd" as Ivan suggests will yield patients who have either of those, but not
> necessarily patients who have both.
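
To make the "either, not both" point concrete, here is a minimal sketch. Ivan's original code is not shown in this thread, so the row-wise filter below is only an assumed example of that kind of approach; the data frame is Den's sample data typed in by hand, and the name 'either' is made up for illustration.

```r
# Den's sample data, entered by hand ('df' follows the name used in the thread).
df <- data.frame(
  id = c(1, 2, 2, 2, 3, 3, 4, 4, 4, 5),
  diagnosis = c("ah", "ah", "ihd", "im", "ah", "stroke",
                "ah", "ihd", "angina", "ihd"),
  stringsAsFactors = FALSE
)

# Assumed row-wise filter: keeps every ROW whose diagnosis is "ah" or "ihd".
# It never looks at a patient's other rows.
either <- subset(df, diagnosis %in% c("ah", "ihd"))

# Patients 1, 3 and 5 have only one of the two diagnoses,
# yet they still appear in the result.
unique(either$id)
```

Because the test is applied to each row separately, it cannot express "has both" or "has one but not the other"; that needs a test on each patient's full set of diagnoses.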
>
> Instead, one must apply the condition to the whole set of diagnoses associated with each patient.
> I think this is best done with the aggregate function. This function splits the data according to some
> factor (in our case, the patient id) and performs a routine on each subset (in our case,
> a condition test):
>
>
> # Use ONE of the following, depending on the group you want:
> ids <- aggregate(diagnosis ~ id, df, function(x) "ah" %in% x && "ihd" %in% x)      # ah and ihd
> ids <- aggregate(diagnosis ~ id, df, function(x) "ah" %in% x && !("ihd" %in% x))   # ah but no ihd
> ids <- aggregate(diagnosis ~ id, df, function(x) !("ah" %in% x) && "ihd" %in% x)   # ihd but no ah
>
> Now, ids will contain a data frame like:
>
> id	diagnosis
> 1	TRUE
> 2	FALSE
> 3	FALSE
> ...
>
> which shows which patients have the set of diagnoses you asked for. You can then use these
> patient ids to filter the original data with something like:
>
> subset(df, id %in% subset(ids, diagnosis == TRUE)$id)
>
> This first extracts from the 'ids' data frame only the patients for which the condition holds, and then extracts the associated
> diagnosis sets from the original 'df' data frame.
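
Putting the two steps together, the following is a rough sketch run on Den's sample data (typed in by hand). The helper name 'patients_where' and the result names are hypothetical and not from the thread; the helper uses plain logical indexing in place of the nested subset() call, which should select the same rows.

```r
# Den's sample data, entered by hand ('df' follows the name used in the thread).
df <- data.frame(
  id = c(1, 2, 2, 2, 3, 3, 4, 4, 4, 5),
  diagnosis = c("ah", "ah", "ihd", "im", "ah", "stroke",
                "ah", "ihd", "angina", "ihd"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: return all rows of 'data' for patients whose
# complete set of diagnoses satisfies 'cond' (a function returning TRUE/FALSE).
patients_where <- function(data, cond) {
  # One row per patient; the 'diagnosis' column holds the TRUE/FALSE result.
  ids <- aggregate(diagnosis ~ id, data, cond)
  keep <- ids$id[ids$diagnosis]
  data[data$id %in% keep, ]
}

both     <- patients_where(df, function(x) "ah" %in% x && "ihd" %in% x)
ah_only  <- patients_where(df, function(x) "ah" %in% x && !("ihd" %in% x))
ihd_only <- patients_where(df, function(x) !("ah" %in% x) && "ihd" %in% x)

unique(both$id)      # patients 2 and 4
unique(ah_only$id)   # patients 1 and 3
unique(ihd_only$id)  # patient 5
```

Each result keeps all of a matching patient's rows, so 'both' still contains the "im" and "angina" rows for patients 2 and 4.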
>
> Hope it helps,
>
> Taras
> On Jan 20, 2011, at 9:53 , Den wrote:
>
>> Dear R people,
>> Could you please help?
>>
>> Basically, there are two variables in my data set. Each patient ('id')
>> may have one or more diseases ('diagnosis'). It looks like
>>
>> id	diagnosis
>> 1	ah
>> 2	ah
>> 2	ihd
>> 2	im
>> 3	ah
>> 3	stroke
>> 4	ah
>> 4	ihd
>> 4	angina
>> 5	ihd
>> ..............
>> Q: How to make three data sets:
>> 	1. Patients with ah and ihd
>> 	2. Patients with ah but no ihd
>> 	3. Patients with  ihd but no ah?
>>
>> If you have any ideas, could you just point me to what I should look for? Is it
>> subset, or aggregate, or loops, or something else??? I am a bit lost. (F1
>> F1 F1 !!!:)
>> Thank you
>>
>> ______________________________________________
>> R-help at r-project.org mailing list
>> https://stat.ethz.ch/mailman/listinfo/r-help
>> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
>> and provide commented, minimal, self-contained, reproducible code.
>

-- 
Ivan CALANDRA
PhD Student
University of Hamburg
Biozentrum Grindel und Zoologisches Museum
Abt. Säugetiere
Martin-Luther-King-Platz 3
D-20146 Hamburg, GERMANY
+49(0)40 42838 6231
ivan.calandra at uni-hamburg.de

**********
http://www.for771.uni-bonn.de
http://webapp5.rrz.uni-hamburg.de/mammals/eng/1525_8_1.php
