[R] Cox Proportional Hazard with missing covariate data

Arthur Allignol arthur.allignol at fdm.uni-freiburg.de
Tue May 5 17:32:39 CEST 2009

(1) Makes sense. Another approach is to use
the time since study entry as the time scale and
include the age of the part at entry as a covariate
in the model. A related discussion is here:
http://tolstoy.newcastle.edu.au/R/e2/help/07/02/9831.html
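
A minimal sketch of this alternative, on simulated data only. All
variable names (age_entry, temp, d_entry, fit_entry) and all numbers
are made up for illustration; nothing here comes from the real data.
The clock starts at study entry, and the age at entry goes in as an
ordinary covariate:

```r
library(survival)

set.seed(2)
n         <- 300
age_entry <- runif(n, 0, 5)               # hypothetical age (years) at study entry
temp      <- rnorm(n, mean = 30, sd = 3)  # hypothetical temperature covariate

# Simulated time from study entry to failure (assumed exponential, toy rates)
time_to_fail <- rexp(n, rate = 0.15 * exp(0.1 * (temp - 30) + 0.1 * age_entry))
followup     <- 6                         # assumed years of follow-up

d_entry <- data.frame(
  time      = pmin(time_to_fail, followup),         # censored at end of study
  status    = as.integer(time_to_fail <= followup), # 1 = "died", 0 = censored
  age_entry = age_entry,
  temp      = temp
)

# Time scale is time since study entry, so there is no delayed entry;
# the part's age at entry is adjusted for as a covariate instead.
fit_entry <- coxph(Surv(time, status) ~ age_entry + temp, data = d_entry)
print(fit_entry)
```

Note that the two time scales answer slightly different questions:
here the baseline hazard is a function of time in study, not of the
part's age.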

(2) It is left truncation. A part is observed only if it has
survived until study entry. Of course, if you reset the clock
at study entry, there are no delayed entries anymore.
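
To make the delayed-entry idea concrete, here is a rough sketch on
simulated data. The names (d_age, fit_age, age_entry, temp) and all
numbers are assumptions for illustration. The time scale is the age
of the part, and each part enters the risk set only at its age at
study entry:

```r
library(survival)

set.seed(1)
n         <- 300
age_entry <- runif(n, 0, 5)               # hypothetical age (years) at study entry
temp      <- rnorm(n, mean = 30, sd = 3)  # hypothetical temperature covariate
lifetime  <- rexp(n, rate = 0.1 * exp(0.1 * (temp - 30)))  # toy lifetimes

# Left truncation: parts that failed before study entry are never observed.
seen     <- lifetime > age_entry
followup <- 6                             # assumed years of follow-up

d_age <- data.frame(
  start  = age_entry[seen],                               # age at entry
  stop   = pmin(lifetime[seen], age_entry[seen] + followup),
  status = as.integer(lifetime[seen] <= age_entry[seen] + followup),
  temp   = temp[seen]
)

# Counting-process form: a part is at risk only on the interval (start, stop].
fit_age <- coxph(Surv(start, stop, status) ~ temp, data = d_age)
print(fit_age)
```

With time-varying covariates the call stays the same; the data just
have several (start, stop] rows per part, one per year, each carrying
that year's covariate values.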

Philipp Rappold wrote:
> Hi,
> Arthur, thanks a lot for your super-fast reply!
> In fact I am using the time when the part has been used for the first time, so your example should work in my case.
> Moreover, as I have time-variant covariates, the example should look like this in my specific case:
> start  stop  status  temp  humid
>     5     6       0    32     43
>     6     7       1    34     42
> Just two more things:
> (1) I am quite a newbie to Cox regression, so I wonder what you think about the approach that I mentioned above? Don't worry, I won't hold you to this, I just want to make sure I am not totally "off track"!
> (2) I don't think that you'd call this "left-truncated" observations, because I DO know the time when the part was used for the first time, I just don't have covariate values for its whole time of life, e.g. just the last two years in the example above. Left truncation in my eyes would mean that I did not even observe a specific part, e.g. because it has died before the study started.
> Again, thanks a lot, I'll be happy to provide valuable help on this list as soon as my R skills have advanced.
> All the best
> Philipp
> Arthur Allignol wrote:
>> Hi,
>> In fact, you have left-truncated observations.
>> Which timescale do you use: is time 0 the
>> study entry, or the time when the wear part was used for the
>> first time?
>> If it is the latter, you can specify the "age" of the wear part
>> at study entry in Surv(). For example, if a wear part has been
>> used for 5 years before study entry, and "dies" 2 years after,
>> the data will look like this:
>> start stop status
>>     5    7      1
>> Hope this helps,
>> Arthur Allignol
>> Philipp Rappold wrote:
>>> Dear friends,
>>> I have used R for some time now and have a tricky question about the
>>> coxph-function: To sum it up, I am not sure whether I can use coxph in
>>> conjunction with missing covariate data in a model with time-variant
>>> covariates. The point is: I know how "old" every piece that I
>>> observe is, but I do not have full historical information about the
>>> corresponding covariates. Maybe you have some advice for me, although
>>> this problem might be only 70% R-related and 30% statistical. Here's
>>> a detailed explanation:
>>> I want to analyze the effect of environmental conditions (i.e.
>>> temperature and humidity) on the lifetime of some wear parts. The
>>> study is conducted on a yearly basis, meaning that I have
>>> collected empirical data on every wear part at the end of every year.
>>> DATA:
>>> I have collected the following data:
>>> - Status of the wear-part: Equals "0" if part is still alive, equals
>>> "1" if part has "died" (my event variable)
>>> - Environmental data: Temperature and humidity have been measured at
>>> each of the wear-parts on a yearly basis (because each wear-part is at
>>> a different location, I have different data for each wear-part)
>>> I collected data between 2001 and 2007. In 2001, a large
>>> number of wear parts were already in use. I DO KNOW for every
>>> part how long it has been used (even if it was employed before 2001),
>>> but I DO NOT have any information about environmental conditions like
>>> temperature or humidity before 2001 (I call this semi-left-censored).
>>> Of course, one could argue that I should simply exclude these parts
>>> from my analysis, but I don't want to lose valuable information, also
>>> because the number of "new parts" that were employed between 2001
>>> and 2007 is rather small.
>>> Additionally, I cannot make any assumption about the underlying
>>> lifetime distribution. Therefore I have to use a non-parametric
>>> model for estimation (most likely Cox).
>>> From an econometric perspective, is it possible to use the Cox
>>> Proportional Hazard model in this setting? As mentioned before, I have
>>> time-variant covariates for each wear part, as well as what I call
>>> "semi-left-censored" data that I want to use. If not, what kind of
>>> analysis would you suggest?
>>> Thanks a lot for your great help, I really appreciate it.
>>> All the best
>>> Philipp