[R] Locate Patients who have multiple high blood pressure readings

Gabor Grothendieck ggrothendieck at gmail.com
Thu Jan 31 20:20:00 CET 2013


On Thu, Jan 31, 2013 at 10:51 AM, Weijia Wang <wwang.nyu at gmail.com> wrote:
> On Thu, Jan 31, 2013 at 10:29 AM, Weijia Wang <wwang.nyu at gmail.com> wrote:
>
>> Hi,
>>
>> I have a new question about subsetting in R.
>>
>> Say we have this data frame:
>>
>>     PT_ID Blood_Pressure OBS_TYPE
>> 92   1900           90.0      DBP
>> 94   1900           90.0      DBP
>> 174  2900          140.0      SBP
>> 176  2900          130.0      SBP
>> 180  3900          120.0      SBP
>> 268  3900          150.0      SBP
>> 268  3900           90.0      DBP
>>
>> I need to obtain those with 2+ DBP>=90 or 2+ SBP>=140.
>>
>> PT_ID=1900, he has 2 DBP>=90, so he will be included.
>> PT_ID=2900, he has 1 SBP>=140, so he will NOT be included.
>> PT_ID=3900, he has 1 SBP>=140 and 1 DBP>=90, so he will still NOT be
>> included.
>>
>> So, the condition requires TWO OR MORE values higher than the threshold.
>> It could be either SBP or DBP or both of them.
>>
>> I have tried ddply, but I don’t know how to add the condition 2+ inside
>> ddply.
>>

This can be expressed fairly naturally in SQL using the sqldf package.
Here DF is the input data frame:

> library(sqldf)
> sqldf("select
+       PT_ID,
+       sum(Blood_Pressure >= 90 and OBS_TYPE == 'DBP') DBP,
+       sum(Blood_Pressure >= 140 and OBS_TYPE == 'SBP') SBP
+    from DF
+    group by PT_ID
+    having DBP >= 2 or SBP >= 2")
  PT_ID DBP SBP
1  1900   2   0
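
In the query, each comparison evaluates to 0 or 1 in SQLite (the default
sqldf backend), so sum() counts the readings at or above the threshold for
each patient. For anyone who would rather not use SQL, here is a rough
base R sketch of the same idea. The DF below is rebuilt by hand from the
table in the question (row names dropped), and the names high_dbp,
high_sbp, counts and result are only illustrative:

```r
# Example data reconstructed from the question; row names are omitted.
DF <- data.frame(
  PT_ID = c(1900, 1900, 2900, 2900, 3900, 3900, 3900),
  Blood_Pressure = c(90, 90, 140, 130, 120, 150, 90),
  OBS_TYPE = c("DBP", "DBP", "SBP", "SBP", "SBP", "SBP", "DBP"),
  stringsAsFactors = FALSE
)

# Flag each reading that meets the threshold for its own type.
high_dbp <- DF$OBS_TYPE == "DBP" & DF$Blood_Pressure >= 90
high_sbp <- DF$OBS_TYPE == "SBP" & DF$Blood_Pressure >= 140

# Count flagged readings per patient (summing logicals counts the TRUEs).
counts <- aggregate(data.frame(DBP = high_dbp, SBP = high_sbp),
                    by = list(PT_ID = DF$PT_ID), FUN = sum)

# Keep patients with two or more high readings of the same type.
result <- counts[counts$DBP >= 2 | counts$SBP >= 2, ]
print(result)
```

On the sample data this should keep only PT_ID 1900, matching the sqldf
result above. Note that, like the SQL, it requires both high readings to
be of the same type, which is why PT_ID 3900 is excluded.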


