[R] Within ID variable delete all rows after reaching a specific value

Sat Apr 26 09:32:24 CEST 2014

Hi,

You may also try:
set.seed(425)

##your code
tmp <- data.frame(....

#####

tmp1 <- tmp
str(tmp1)
#'data.frame':    1000 obs. of  3 variables:
# $ X1: Factor w/ 2 levels "0","1": 1 1 1 1 1 1 1 1 1 1 ...
# $ X2: Factor w/ 127 levels "1","10","100",..: 1 1 1 1 1 1 1 1 2 2 ...
# $ X3: Factor w/ 56 levels "01.01.1990","01.01.1991",..: 1 21 17 37 33 51 48 10 11 45 #...

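## sort by ID, then chronologically (X3 holds dd.mm.yyyy strings)
tmp1 <- tmp1[with(tmp1,order(X2, as.Date(X3, "%d.%m.%Y"))),]
## within each ID, drop every row after the first X1 == 1
tmp2 <- tmp1[with(tmp1,!ave(as.numeric(as.character(X1)),X2, FUN=function(x)  cumsum(cumsum(x)) >1 )),]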

###checking results with Jim's method
tmp2New <- tmp2
tmp2New$X3 <- as.Date(tmp2New$X3, "%d.%m.%Y")
identical(tmp2New,newtmp) ##Jim's result
#[1] TRUE
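The double cumsum() inside the ave() call is compact but not obvious. The sketch below replays the same two steps on a small hypothetical data set (the toy values are made up, not taken from the thread): order chronologically within ID, then flag every row after the first X1 = 1. It assumes X1 is coded 0/1. Sorting the raw dd.mm.yyyy strings would put 01.01.1991 before 01.02.1990, which is why the dates go through as.Date() before ordering.

```r
# Hypothetical toy data in the same layout as tmp: X1 = status (0/1),
# X2 = person ID, X3 = clinic date as a dd.mm.yyyy string (unsorted on purpose).
toy <- data.frame(
  X1 = c(1, 0, 0, 0, 1, 0, 0),
  X2 = c("a", "a", "a", "a", "a", "b", "b"),
  X3 = c("01.03.1990", "01.01.1990", "01.02.1990", "01.01.1991", "01.02.1991",
         "01.01.1990", "01.02.1990"),
  stringsAsFactors = FALSE
)

# Sort by ID, then by the real calendar date (not the string).
toy <- toy[with(toy, order(X2, as.Date(X3, format = "%d.%m.%Y"))), ]

# Within each ID, cumsum(X1) is 0 before the first 1 and >= 1 from there on;
# a second cumsum is exactly 1 at the first 1 and > 1 on every later row
# (this relies on X1 being 0/1). ave() hands the flags back as 0/1 numbers.
after_first <- with(toy, ave(X1, X2, FUN = function(x) cumsum(cumsum(x)) > 1))

# Keep everything up to and including each ID's first X1 == 1.
toy_clean <- toy[!after_first, ]
toy_clean
```

For ID "a" the sorted statuses are 0, 0, 1, 0, 1, so the last two rows are dropped; ID "b" never reaches 1 and is kept whole.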

A.K.

On Saturday, April 26, 2014 12:07 AM, Jim Lemon <jim at bitwrit.com.au> wrote:
On 04/26/2014 12:42 PM, Jennifer Sabatier wrote:
> So, I know that's a confusing Subject header.
>
> Here's similar data:
>
>
> tmp<- data.frame(matrix(
>                          c(rbinom(1000, 1, .03),
>                            array(1:127, c(1000,1)),
>                            array(format(seq(ISOdate(1990,1,1), by='month',
>                                         length=56), format='%d.%m.%Y'), c(1000,1))),
>                          ncol=3))
> tmp<- tmp[with(tmp, order(X2, X3)), ]
> table(tmp$X1)
>
>
> X1 is the variable of interest (disease status).  It's a survival-type
> variable: you are 0 until you become 1.
> X2 is the person ID variable.
> X3 is the clinic date (monthly here, just as an example; in my real data
> the visits are neither equally spaced nor equal in number across IDs).
>
> Some people stay X1 = 0 for all clinic visits.  Only a small proportion
> become X1=1.
>
> However, the data has errors I need to clean up.  Once someone becomes
> X1=1 they should have no more rows in the dataset; any later rows are data
> entry errors.
>
> In my data some people continue to have rows after that point.  Sometimes
> those rows show X1=0 and sometimes X1=1.  Sometimes there's just one more
> row and sometimes there are many more.
>
> How can I go through, find the first X1 = 1, and then delete any rows after
> that, for each value of X2?
>
> Thanks!
>
> Jen
>
Hi Jen,
This might do what you want:

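tmp$X3<-as.Date(tmp$X3,"%d.%m.%Y")   # dd.mm.yyyy strings -> Date
tmp<-tmp[order(tmp$X2,tmp$X3),]      # sort by patient, then chronologically
first<-TRUE
for(patno in unique(tmp$X2)) {
  cat(patno,"\n")                    # progress output: current patient ID
  tmpbit<-tmp[tmp$X2 == patno,]
  # row index of this patient's first X1 == 1 (NA if there is none)
  firstone<-which(tmpbit$X1 == 1)[1]
  cat(firstone,"\n")
  # no event: keep all of this patient's rows
  if(is.na(firstone)) firstone<-nrow(tmpbit)
  newtmpbit<-tmpbit[1:firstone,]
  # start the result with the first patient, then append the others
  if(first) {
   newtmp<-newtmpbit
   first<-FALSE
  } else newtmp<-rbind(newtmp,newtmpbit)
}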
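Jim's loop grows newtmp with rbind() once per patient, which is fine for 1000 rows but gets slower as the data grows. Below is a sketch of the same per-ID logic using split(), lapply() and a single do.call(rbind, ...). The helper name keep_until_first_event is hypothetical and the toy data are made up; it assumes X3 is already a Date and the rows are sorted by X2 and X3, as in the code above.

```r
# Hypothetical helper (not from the thread): keep the rows of one ID up to and
# including its first X1 == 1; IDs that never reach 1 are kept whole.
keep_until_first_event <- function(d) {
  first_one <- which(d$X1 == 1)[1]  # NA when the ID never has X1 == 1
  if (is.na(first_one)) d else d[seq_len(first_one), , drop = FALSE]
}

# Hypothetical toy data, with X3 already converted to Date
toy <- data.frame(
  X1 = c(0, 1, 1, 0, 0),
  X2 = c("p1", "p1", "p1", "p2", "p2"),
  X3 = as.Date(c("1990-01-01", "1990-02-01", "1990-03-01",
                 "1990-01-01", "1990-02-01")),
  stringsAsFactors = FALSE
)
toy <- toy[order(toy$X2, toy$X3), ]

# Split by ID, truncate each piece, and bind everything once at the end
pieces <- lapply(split(toy, toy$X2), keep_until_first_event)
toy_clean <- do.call(rbind, pieces)
rownames(toy_clean) <- NULL  # rbind() prefixes row names with the ID
toy_clean
```

split() returns the groups in sorted order of X2, which matches the order of unique(tmp$X2) in the loop because tmp is already sorted by X2.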

Jim

______________________________________________
R-help at r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.