[R] Survival, should I use (start,stop) and how?

Asa Johannesen bsaj at leeds.ac.uk
Fri Apr 20 15:07:03 CEST 2012


Dear R users,

I fear this is terribly trivial but I'm struggling to get my head around it.

First of all, I'm using the "survival" package in R 2.12.2 on Windows Vista with the RExcel plugin. You probably only need to know that I'm using "survival" for this.

I have data collected from 180 or so individuals that were checked 7 times throughout a trial with set start and end times. Once the event happens (death by predator) there are no more checks for that individual. This means that I check on each individual up to 7 times with either an event recorded or the final time being censored.

At the moment, I have a data sheet with one observation per individual: either the event time (the check at which the event was first observed for that individual) or the censoring time. However, I'd like to add a time-dependent factor, and I also wonder whether this data should be treated as interval censored.

The time-dependent factor works like this. The individuals are grouped in "houses", and once one individual in a house has had an event, it makes biological sense that the rest should be at greater risk, as the predator is likely to have discovered the others in the "house" as well (the predator is able to consume many individuals). At the moment I'm coding this as an ordinary two-level factor ("discovered"): all individuals still alive after the first event in their house are "TRUE", and the first individuals in a house to be eaten are "FALSE". All individuals in houses that were never discovered are also "FALSE". Obviously, every individual that was eaten was first discovered and then eaten. However, the first individuals in a house to be eaten had not previously been discovered by the predator (not observably so, anyway).
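
To make the current coding rule concrete, here is a minimal sketch in base R that derives "discovered" from the other columns, using the six example rows from the PS. The names first_event and FirstEvent are made up for illustration, and the rule assumes that "alive after the first event" means an individual's recorded time is strictly later than the first event time in its house.

```r
# Six example rows, one row per individual (copied from the PS).
dat <- data.frame(
  ID        = 1:6,
  Time      = c(10, 20, 90, 10, 10, 40),
  Event     = c(1, 1, 0, 1, 1, 1),
  Treatment = c(1, 1, 1, 2, 2, 2),
  House     = c(1, 1, 1, 5, 5, 5)
)

# Time of the first observed event in each house.
# A house with no events at all gets Inf, so nobody in it counts as discovered.
first_event <- tapply(ifelse(dat$Event == 1, dat$Time, Inf), dat$House, min)

# Attach the house-level time to every individual in that house.
dat$FirstEvent <- as.numeric(first_event[as.character(dat$House)])

# Fixed (not time-dependent) coding: TRUE only for individuals whose own
# recorded time is later than the first event in their house.
dat$Discovered <- dat$Time > dat$FirstEvent

dat[, c("ID", "Time", "Event", "Discovered", "House")]
```

This gives the same TRUE/FALSE pattern as the Discovered column in the example data. Note that the value is fixed per individual, so an individual coded TRUE is treated as discovered from time 0 onwards.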

Should I write up this data set with a start and stop time for every check I made so each individual has up to 7 records, one for each time I checked?

Is there a quick and easy way to do this in R or would I have to go through the data set manually?

Does coding the "discovered" factor the way I have make statistical sense?

Should I worry about proportional hazards of the "discovered" factor? It seems to me that it would often turn out not proportional because of its nature.

Sorry, lots of stats questions. I don't mind if you don't answer all of these. Just knowing how to best feed this data into R would help me no end. The rest I can probably glean from the millions of survival analysis books I have lying about.

Cheers,

Freya

PS: Example data as it is: Treatment has 3 levels and House 6, though I don't normally include House in the analysis as it's not so much the house as whether the individuals were previously discovered that is interesting. I may include it as a random factor or stratify by it, but I want to get the basics sorted before I tackle that.


ID  Time  Event  Discovered  Treatment  House
1   10    1      FALSE       1          1
2   20    1      TRUE        1          1
3   90    0      TRUE        1          1
4   10    1      FALSE       2          5
5   10    1      FALSE       2          5
6   40    1      TRUE        2          5

Should ID 2 have two rows, one of them with no event at time 10? That is, should it be coded as (start, stop, event), with the first row 0, 10, 0 and the second row 10, 20, 1?
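
A minimal sketch of what that (start, stop) layout could look like, built in base R from the six example rows. The vector of check times is hypothetical (only 10, 20, 40 and 90 appear in the example), and the sketch assumes every recorded Time coincides with one of the checks. The names checks, expand_one and long are made up for illustration. "discovered" is set per interval: TRUE only if the first event in the house had already been observed by the start of that interval.

```r
# Six example rows, one row per individual.
dat <- data.frame(
  ID        = 1:6,
  Time      = c(10, 20, 90, 10, 10, 40),
  Event     = c(1, 1, 0, 1, 1, 1),
  Treatment = c(1, 1, 1, 2, 2, 2),
  House     = c(1, 1, 1, 5, 5, 5)
)

# HYPOTHETICAL check times: the real seven check times are not given here.
checks <- c(10, 20, 30, 40, 50, 60, 90)

# First observed event time per house (Inf if the house had no event).
first_event <- tapply(ifelse(dat$Event == 1, dat$Time, Inf), dat$House, min)
dat$FirstEvent <- as.numeric(first_event[as.character(dat$House)])

# Expand one individual into one row per check interval (tstart, tstop].
expand_one <- function(i) {
  tstop  <- checks[checks <= dat$Time[i]]   # checks this individual lived to see
  tstart <- c(0, head(tstop, -1))           # each interval starts at the previous check
  data.frame(
    ID         = dat$ID[i],
    start      = tstart,
    stop       = tstop,
    # the event is recorded only in the individual's last interval
    event      = as.numeric(tstop == dat$Time[i] & dat$Event[i] == 1),
    # time-dependent: was the house already discovered when the interval began?
    discovered = tstart >= dat$FirstEvent[i],
    Treatment  = dat$Treatment[i],
    House      = dat$House[i]
  )
}

long <- do.call(rbind, lapply(seq_len(nrow(dat)), expand_one))

long[long$ID == 2, ]
```

For ID 2 this produces exactly the two rows asked about: (0, 10, event 0, discovered FALSE) and (10, 20, event 1, discovered TRUE). With the full data set, a table like this is the shape that coxph(Surv(start, stop, event) ~ discovered + factor(Treatment), data = long) in "survival" expects. The six example rows are far too few to fit anything meaningful. The package also provides survSplit() for splitting records at cut points, though its arguments may differ between versions. None of this deals with the interval-censoring question.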

