[R] bias sampling

Thomas Lumley tlumley at uw.edu
Sun Mar 18 22:51:52 CET 2012

On Mon, Mar 19, 2012 at 10:27 AM, David Winsemius
<dwinsemius at comcast.net> wrote:
> On Mar 18, 2012, at 3:54 PM, Thomas Lumley wrote:
>> On Mon, Mar 19, 2012 at 6:34 AM, David Winsemius <dwinsemius at comcast.net>
>> wrote:
>>> On Mar 16, 2012, at 1:09 PM, niloo javan wrote:
>>>> Hi,
>>>> I want to analyze right-censored, length-biased data under a Cox model
>>>> with covariates.
>>>> Which package should I use?
>>> I initially left this question alone because I thought there might be
>>> viewers for whom it all made perfect sense. After two days that
>>> probability seems to be declining. The problem I had was the meaning of
>>> "length bias data". Are you talking about a non-proportional effect, in
>>> which the assumption of a constant hazard ratio over time is false and
>>> other methods are needed? If that is correct, then you should get a copy
>>> of Therneau and Grambsch's "Modeling Survival Data" and study the chapter
>>> on "Functional Form". The package would be "survival".
>> Length-biased sampling is what you get when you take a cross-sectional
>> sample of an ongoing process -- long intervals are over-represented.
> Thank you Thomas;
>  For example, people who have survived to age 75 might be systematically
> different with respect to both the distribution of cardiovascular risk
> factors and their impact on the event of interest (AMI, CV death, or
> all-cause mortality) than persons at age 45. And that would also not take
> into account the fact that those risk factors might have changed over the
> interval from age 45 to age 75 in the survivors?
>> If the arrival time is known for everyone in the sample, the usual Cox
>> model facilities for left truncation apply.  If the arrival times are
>> not known it would be much more difficult, and would probably need
>> parametric modelling.
>  Am I correct in thinking that additional assumptions about the
> "length-bias" would need to be explicitly stated or modeled under a set of
> plausible scenarios before progress in any framework could be anticipated?
> It would seem that there could be many forms of such a "length-bias".

Yes, as with any missing data problem things can go arbitrarily badly wrong.

The classical 'length-biased sampling' problem is a cross-sectional
sample from a stationary population process, and that gives good

Obviously if you don't recruit anyone before time T, there is no
information about what happened before then, but there may still be
useful information afterwards.  A good example is the research project
on after-effects of the nuclear bombings of Nagasaki and Hiroshima,
where recruitment started (IIRC) 5 years after the event.  There's no
information on survival in the first five years, but very good
subsequent information.
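
To make the points above concrete, here is a small simulation sketch. It
is only an illustration under assumptions of my choosing, not a recipe for
real data: lifetimes are exponential with rate 1, arrivals are uniform in
time (so the process is roughly stationary), everyone alive at a single
sampling time is recruited, and there is no censoring. All function names,
the seed, and the time points 0.5 and 1.5 are made up for the example. It
shows two things: the sampled lifetimes are too long on average, and if the
entry age of each subject is known, a product-limit estimate that only
counts people in the risk set after they have entered recovers the
survival probability from some starting time onwards.

```python
import bisect
import math
import random


def cross_sectional_sample(n_arrivals, sample_time, rate, seed):
    """Simulate a stationary process and recruit everyone alive at sample_time.

    Subjects arrive uniformly on [0, sample_time] with exponential lifetimes.
    Returns all lifetimes, plus (entry_age, lifetime) pairs for the subjects
    still alive when the cross-section is taken.
    """
    rng = random.Random(seed)
    all_lifetimes = []
    sampled = []
    for _ in range(n_arrivals):
        arrival = rng.uniform(0.0, sample_time)
        lifetime = rng.expovariate(rate)
        all_lifetimes.append(lifetime)
        if arrival + lifetime > sample_time:
            # Age at recruitment; always smaller than the lifetime.
            sampled.append((sample_time - arrival, lifetime))
    return all_lifetimes, sampled


def naive_conditional_survival(sampled, start, end):
    """Fraction surviving past `end` among sampled subjects alive at `start`,
    ignoring how the sample was obtained."""
    lifetimes = [x for _, x in sampled if x > start]
    return sum(x > end for x in lifetimes) / len(lifetimes)


def delayed_entry_survival(sampled, start, end):
    """Product-limit estimate of P(X > end | X > start) with delayed entry.

    A subject is in the risk set at time t only if entry_age < t <= lifetime.
    Assumes no censoring and no tied lifetimes (continuous simulated data).
    """
    entries = sorted(a for a, _ in sampled)
    lifetimes = sorted(x for _, x in sampled)
    surv = 1.0
    for x in lifetimes:
        if x <= start:
            continue
        if x > end:
            break
        entered = bisect.bisect_left(entries, x)         # entry_age < x
        already_dead = bisect.bisect_left(lifetimes, x)  # lifetime < x
        at_risk = entered - already_dead                 # includes this subject
        surv *= 1.0 - 1.0 / at_risk
    return surv


# Hypothetical settings: 400,000 arrivals over 50 time units, rate 1.
all_lifetimes, sampled = cross_sectional_sample(400_000, 50.0, 1.0, seed=2012)
population_mean = sum(all_lifetimes) / len(all_lifetimes)
sampled_mean = sum(x for _, x in sampled) / len(sampled)
naive = naive_conditional_survival(sampled, 0.5, 1.5)
adjusted = delayed_entry_survival(sampled, 0.5, 1.5)

# Theory for exponential(1): population mean 1, length-biased mean
# E[X^2]/E[X] = 2, and P(X > 1.5 | X > 0.5) = exp(-1).
print("mean lifetime, whole population:", round(population_mean, 3))
print("mean lifetime, cross-section:   ", round(sampled_mean, 3))
print("naive P(X > 1.5 | X > 0.5):     ", round(naive, 3))
print("delayed-entry estimate:         ", round(adjusted, 3))
print("true value exp(-1):             ", round(math.exp(-1), 3))
```

The risk-set adjustment is the same idea that the counting-process form
Surv(entry, exit, status) in the survival package uses for left-truncated
data in coxph(). Without the entry ages there is nothing to build the risk
sets from, which is why that case needs stronger (parametric) assumptions.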


Thomas Lumley
Professor of Biostatistics
University of Auckland
