[R] A model for disease progression

Thu Jul 31 15:46:07 CEST 2003

Thank you for the continued comments on this problem. I think I have a
solution, which I thought I'd share here, in the hope that any obvious
errors or inefficiencies can be pointed out.

As I described before, I have a snapshot of a population taken at a
certain time. I am interested in an age-related disease, which progresses
healthy->A->B. (There is no recovery.) For each individual, I know their
age (in years) and the stage of the disease.

Suppose I'm interested in the transition healthy->A. For each individual I
have a censored observation of the "lifetime" random variable:
   If the individual is age t and is diseased, the lifetime is in (0,t].
   If the individual is age t and is healthy, the lifetime is in (t,inf).
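
To make the coding concrete, here is a minimal sketch of how each individual could be turned into an interval in R. The data frame `pop`, its columns `age` and `diseased`, and the values in it are all made up for illustration; they are not the actual data.

```r
# Hypothetical snapshot: one row per individual (made-up values).
pop <- data.frame(age      = c(34, 51, 51, 67, 72),
                  diseased = c(FALSE, FALSE, TRUE, TRUE, FALSE))

# Interval known to contain the healthy->A lifetime:
#   diseased at age t -> (0, t];  healthy at age t -> (t, Inf)
pop$lower <- ifelse(pop$diseased, 0, pop$age)
pop$upper <- ifelse(pop$diseased, pop$age, Inf)
print(pop)
```

Every observation is either left-censored or right-censored at the individual's age; no lifetime is observed exactly.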

The Surv function in R does not deal with this sort of censored data. 
Happily, Mai Zhou has written a package called dblcens, available on CRAN,
which will estimate the survival curve. 
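
As a cross-check that needs only base R, and not as a description of what dblcens does internally, one can use the standard result that for data of this kind (disease status observed once, at the current age) the nonparametric maximum-likelihood estimate of P(T <= t) at the observed ages is the isotonic regression of the disease indicator on age. The sketch below uses `isoreg` on simulated data; the Weibull onset distribution, the sample size and all variable names are assumptions for illustration.

```r
# Simulated current-status data (illustrative only): true onset ages are
# drawn from a Weibull, but only age and disease status are recorded.
set.seed(1)
n        <- 500
age      <- sample(20:90, n, replace = TRUE)
onset    <- rweibull(n, shape = 3, scale = 70)
diseased <- as.numeric(onset <= age)

# Sort by age, placing diseased individuals first within each tied age so
# that the pool-adjacent-violators fit is constant across ties.
ord <- order(age, -diseased)
fit <- isoreg(diseased[ord])

# Estimated P(T0A <= t) at each distinct observed age t, and the
# corresponding survival estimate.
F_hat <- tapply(fit$yf, age[ord], mean)
S_hat <- 1 - F_hat
head(round(S_hat, 3))
```

The estimate is only defined at the observed ages, and it is a step function between them.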

So I can estimate the distribution of the lifetime T0A for the transition
healthy->A, and likewise that of the lifetime T0B for reaching stage B
from healthy. I would like to know about the time TAB for the transition
A->B, where
  T0B = T0A + TAB.
This is a deconvolution problem. It cannot be solved in general, because
knowing the marginal distributions of T0A and T0B is insufficient to
determine the joint distribution of (T0A,TAB). If I assume that T0A and
TAB are independent, it can be solved.
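
Under independence, and with lifetimes measured in whole years, the probability mass function of T0B is the discrete convolution of those of T0A and TAB. A small sketch with made-up probabilities (the vectors `p0A` and `pAB` are illustrative, not estimates) shows how this looks with `convolve`:

```r
# Made-up pmfs on whole years 0, 1, 2, ... (illustrative values only).
p0A <- c(0.2, 0.5, 0.3)        # P(T0A = 0), P(T0A = 1), P(T0A = 2)
pAB <- c(0.1, 0.3, 0.4, 0.2)   # P(TAB = 0), ..., P(TAB = 3)

# pmf of T0B = T0A + TAB under independence. Note the rev(): without it,
# convolve() with type = "open" does not give the usual convolution.
p0B <- convolve(p0A, rev(pAB), type = "open")

# The same thing by direct summation, as a check.
p0B_direct <- numeric(length(p0A) + length(pAB) - 1)
for (j in seq_along(p0A))
  for (k in seq_along(pAB))
    p0B_direct[j + k - 1] <- p0B_direct[j + k - 1] + p0A[j] * pAB[k]

all.equal(p0B, p0B_direct)
```

Deconvolution runs this in reverse: given the distributions of T0A and T0B, find the one for TAB.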

I used an ad-hoc solution: find the estimated distribution TABe which
minimizes the mean-square-error distance between the survival function for
T0B and that for T0A+TABe (whose distribution is the convolution of the
distributions of T0A and TABe, for which R has a handy function,
convolve). I did this by optimizing over positive measures for TABe with
optim, and tweaking a Lagrange multiplier to make the optimum be a
distribution.
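
A minimal sketch of this kind of fit follows. It is not the exact code used: instead of a Lagrange multiplier it swaps in a softmax reparameterization, so that every candidate TABe is automatically a probability distribution. The input pmfs are made up, the helper names `surv`, `softmax` and `objective` are hypothetical, and the number of support points for TABe is assumed known.

```r
# Survival function P(T > k), k = 0, 1, ..., from a pmf on 0, 1, 2, ...
surv <- function(p) 1 - cumsum(p)

# Illustrative inputs: in practice S0B would be the estimated survival
# curve for T0B; here it is generated from made-up pmfs.
p0A      <- c(0.2, 0.5, 0.3)
pAB_true <- c(0.1, 0.3, 0.4, 0.2)
S0B      <- surv(convolve(p0A, rev(pAB_true), type = "open"))

# Softmax keeps the candidate pmf positive and summing to one.
softmax <- function(theta) {
  w <- exp(theta - max(theta))
  w / sum(w)
}

# Mean squared distance between the survival curves of T0B and T0A + TABe.
objective <- function(theta) {
  pABe <- softmax(theta)
  mean((S0B - surv(convolve(p0A, rev(pABe), type = "open")))^2)
}

# Start from the uniform distribution on the assumed support.
fit  <- optim(rep(0, length(pAB_true)), objective, method = "BFGS",
              control = list(maxit = 500, reltol = 1e-14))
pABe <- softmax(fit$par)
round(pABe, 3)
```

With noisy estimated curves the minimum will not be zero, and the size of the remaining distance is one indication of how well the independence assumption fits.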

I can then plot survival curves for T0A, T0B and T0A+TABe. This lets me
visualize whether the assumption of independence is good.
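
A sketch of such a plot, using placeholder probabilities rather than real estimates (all three pmfs below are made up, and `surv` is a hypothetical helper):

```r
# Survival function P(T > k), k = 0, 1, ..., from a pmf on 0, 1, 2, ...
surv <- function(p) 1 - cumsum(p)

# Placeholder pmfs on whole years (made-up values, not estimates).
p0A  <- c(0.2, 0.5, 0.3)
pABe <- c(0.1, 0.3, 0.4, 0.2)
p0B  <- c(0.05, 0.15, 0.30, 0.30, 0.15, 0.05)

years <- 0:5
S0A   <- surv(c(p0A, 0, 0, 0))     # padded to the common time grid
S0B   <- surv(p0B)
S_fit <- surv(convolve(p0A, rev(pABe), type = "open"))  # T0A + TABe

plot(years, S0A, type = "s", ylim = c(0, 1),
     xlab = "years", ylab = "survival probability")
lines(years, S0B,   type = "s", lty = 2)
lines(years, S_fit, type = "s", lty = 3)
legend("topright", legend = c("T0A", "T0B", "T0A+TABe"), lty = 1:3)

# Largest vertical gap between the T0B curve and the fitted curve.
max(abs(S0B - S_fit))
```

If the dashed and dotted curves nearly coincide, the independence assumption is at least not contradicted by the data.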

Damon.