[R] R_closest date

Rui Barradas ruipbarradas at sapo.pt
Sat Sep 1 20:17:21 CEST 2012


Hello,

Try the following.

dat <- read.table(text="
  PT_ID     IDX_DT   OBS_DATE DAYS_DIFF OBS_VALUE CATEGORY
13   4549 2002-08-21 2002-08-20        -1       183        2
14   4549 2002-08-21 2002-11-14        85        91        1
15   4549 2002-08-21 2003-02-18       181        89        1
16   4549 2002-08-21 2003-05-15       267       109        2
17   4549 2002-08-21 2003-12-16       482        96        1
128  4839 2006-11-28 2006-11-28         0       179        2
", header=TRUE)

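# Split by patient, then pick the row with the smallest absolute DAYS_DIFF,
# i.e. the OBS_DATE closest to IDX_DT whether it falls before or after it.
spl <- split(dat, dat$PT_ID)
idx <- sapply(spl, function(x) which.min(abs(x$DAYS_DIFF)))
res <- lapply(names(idx), function(nm) spl[[ nm ]][ idx[nm], ])
do.call(rbind, res)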

To keep the result, assign the return value of do.call() to a variable (you can reuse 'res').
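
As for the error itself: in baseline[rows, abs(baseline$DAYS_DIFF)] the absolute values end up in the column position, so R tries to use them as column numbers, hence "undefined columns selected". The sketch below is one possible alternative, not the only way to do it: tapply() gives the per-patient minimum of abs(DAYS_DIFF), and order() with duplicated() keeps the whole closest row. It assumes DAYS_DIFF is numeric with no NAs; the names min_abs, ord and closest are only illustrative.

```r
# Same sample data as in the reply above.
dat <- read.table(text="
  PT_ID     IDX_DT   OBS_DATE DAYS_DIFF OBS_VALUE CATEGORY
13   4549 2002-08-21 2002-08-20        -1       183        2
14   4549 2002-08-21 2002-11-14        85        91        1
15   4549 2002-08-21 2003-02-18       181        89        1
16   4549 2002-08-21 2003-05-15       267       109        2
17   4549 2002-08-21 2003-12-16       482        96        1
128  4839 2006-11-28 2006-11-28         0       179        2
", header=TRUE)

# Per-patient minimum of the absolute day difference
# (what the loop in the question was trying to compute).
min_abs <- tapply(abs(dat$DAYS_DIFF), dat$PT_ID, min)

# Whole rows: sort by patient and absolute distance,
# then keep the first row for each patient.
ord <- dat[order(dat$PT_ID, abs(dat$DAYS_DIFF)), ]
closest <- ord[!duplicated(ord$PT_ID), ]
closest
```

If two observations of one patient are equally close, this keeps the one that appears first in the data.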

Hope this helps,

Rui Barradas
On 01-09-2012 18:10, WANG WEIJIA wrote:
> Hi,
>
> I have run into a problem finding the date closest to another date.
>
> This is what the data frame looks like:
>
>      PT_ID     IDX_DT   OBS_DATE DAYS_DIFF OBS_VALUE CATEGORY
> 13   4549 2002-08-21 2002-08-20        -1       183        2
> 14   4549 2002-08-21 2002-11-14        85        91        1
> 15   4549 2002-08-21 2003-02-18       181        89        1
> 16   4549 2002-08-21 2003-05-15       267       109        2
> 17   4549 2002-08-21 2003-12-16       482        96        1
> 128  4839 2006-11-28 2006-11-28         0       179        2
>
> I need to find the single observation whose 'OBS_DATE' is closest to 'IDX_DT'.
>
> For example, for 'PT_ID' 4549, I need row 13, whose OBS_DATE is just one day away from IDX_DT.
>
> I was thinking about using abs(), and I got this:
>
> baseline<- function(x){
> +
> +  #remove all unnecessary variables
> +  baseline<- x[,c("PT_ID","DAYS_DIFF")]
> +
> +  #get a list of every unique ID
> +  uniqueID <- unique(baseline$PT_ID)
> +
> +  #make a vector that will contain the smallest DAYS_DIFF
> +  first <- rep(-99,length(uniqueID))
> +
> +  i = 1
> +  #loop through each unique ID
> +  for (PT_ID in uniqueID){
> +
> +  #for each iteration get the smallest DAYS_DIFF for that ID
> +  first[i] <- min(baseline[which(baseline$PT_ID==PT_ID),abs(baseline$DAYS_DIFF)])
> +
> +  #up the iteration counter
> +  i = i + 1
> +
> +  }
> +  #make a data frame with the lowest DAYS_DIFF and ID
> +  newdata <- data.frame(uniqueID,first)
> +  names(newdata) <- c("PT_ID","DAYS_DIFF")
> +
> +  #return the data frame containing the lowest GPI for each ID
> +  return(newdata)
> +  }
>> ldl.b<-baseline(ldl) #get all baseline ldl patient ID, total 11368 obs, all unique#
> Error in `[.data.frame`(baseline, which(baseline$PT_ID == PT_ID), abs(baseline$DAYS_DIFF)) :
>    undefined columns selected
>   
> Can anyone help me in figuring out how to get the minimum value of the absolute value of DAYS_DIFF for unique ID?
>
> Thanks a lot



