[R] extract data features from subsets

Dennis Murphy djmuser at gmail.com
Tue Jun 7 08:36:12 CEST 2011


Hi:

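Here's one way using the plyr package and its ddply() function. ddply()
splits a data frame by one or more grouping variables, applies a
function to each piece, and combines the results into a single data
frame. The function you supply should return either a scalar or a data
frame. In this case, we want the latter.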

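library(plyr)
f <- function(df) {
    mn <- min(df$result)                  # nadir level
    tms <- df$time[df$result == mn]       # all times at which the nadir occurs
    # Subset by time value rather than by row position, so this also
    # works when times do not run 1, 2, 3, ... within an ID
    subdf <- df[df$time >= max(tms), ]
    b1 <- unname(coef(lm(result ~ time, data = subdf))[2])   # slope
    data.frame(NadirLevel = mn, NadirFirstTime = min(tms),
               NadirLastTime = max(tms), Slope = b1)
}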

This function takes a data frame df as input; in practice, it will be
the sub-data frame associated with one level of ID. We find the minimum
of result and assign it to mn, and then find the times that match the
minimum. Next, we construct the subset of the data, from the last nadir
time onward, on which to fit the simple linear regression. Finally, an
output data frame is created; ddply() will add the ID variable.
Calling your example data frame d,

> ddply(d, 'ID', f)
  ID NadirLevel NadirFirstTime NadirLastTime Slope
1  A          1              3             5     1
2  B          2              2             2     2
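
For anyone who wants to reproduce this without plyr, here is a minimal
sketch in base R. The data frame d is rebuilt from the table in the
original question. The name nadir_features is my own, not from the
thread, and the function repeats the same steps as f() so that the
block runs on its own. The split()/lapply()/rbind() combination stands
in for ddply() here and is an assumption about one reasonable
equivalent, not the only way to do it.

```r
# Example data, reconstructed from the table in the original question
d <- data.frame(
  ID     = rep(c("A", "B"), times = c(8, 5)),
  time   = c(1:8, 1:5),
  result = c(5, 2, 1, 1, 1, 2, 3, 4, 3, 2, 4, 6, 8)
)

# Per-ID summary (hypothetical name; same steps as f() above)
nadir_features <- function(df) {
  mn    <- min(df$result)                 # nadir level
  tms   <- df$time[df$result == mn]       # times at which the nadir occurs
  subdf <- df[df$time >= max(tms), ]      # last nadir time onward
  b1    <- unname(coef(lm(result ~ time, data = subdf))[2])
  data.frame(NadirLevel = mn, NadirFirstTime = min(tms),
             NadirLastTime = max(tms), Slope = b1)
}

# Split by ID, summarise each piece, and stack the one-row results
pieces <- split(d, d$ID)
res <- do.call(rbind, lapply(names(pieces), function(id) {
  data.frame(ID = id, nadir_features(pieces[[id]]))
}))
print(res)
```

One thing to watch: if the nadir falls on the last time point for an
ID, the regression subset has a single row and the slope comes back as
NA, so you may want to handle that case explicitly.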

HTH,
Dennis


On Mon, Jun 6, 2011 at 10:04 PM, Williams Scott
<Scott.Williams at petermac.org> wrote:
> I have a large dataset similar to this:
>
> ID      time    result
> A       1       5
> A       2       2
> A       3       1
> A       4       1
> A       5       1
> A       6       2
> A       7       3
> A       8       4
> B       1       3
> B       2       2
> B       3       4
> B       4       6
> B       5       8
>
> I need to extract a number of features for each individual in it (identified by "ID"). These are:
> * The lowest result (the nadir)
> * The time of the nadir - but if the nadir level is present at >1 time point, I need the minimum and maximum time of nadir
> * For the time period from maximum time of nadir to the last result, I need the coefficient from a lm(result~time)
>
> The result would be a table looking like:
>
> ID      NadirLevel      NadirFirstTime  NadirLastTime   Slope
> A       1               3               5               1
> B       2               2               2               2
>
> I can manage to extract all the required elements in a very cumbersome loop. I am sure an elegant method using apply() or the like could be devised, but I can't presently understand the necessary syntax. Any suggestions appreciated.
>
> Thanks
> Scott
> _____________________________
>
> Dr. Scott Williams
> Peter MacCallum Cancer Centre
> Melbourne, Australia
> ph +61 3 9656 1111
> fax +61 3 9656 1424
> scott.williams at petermac.org