[R] Cox model -missing data.
info at aghmed.fsnet.co.uk
Fri Dec 19 12:37:39 CET 2014
On 19/12/2014 11:17, aoife doherty wrote:
> Many thanks, I appreciate the response.
> When I convert the missing values to NA and run the Cox model as described
> in my previous post, the model seems to drop every row with a missing
> value: the number of rows "n" in the Cox output is the same whether I
> remove the rows with missing data myself or only recode the missing
> values as NA.
> What I had been hoping to do is not to remove a row with missing data for
> a co-variable entirely, but rather to somehow censor or estimate the
> missing value.
I think you are searching for some form of imputation here. A full
answer would be way beyond the scope of this list as it depends on so
many things including the mechanism driving the missingness.
Have a look at
and see whether that helps.
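For what it is worth, here is a minimal sketch of what multiple imputation followed by a Cox fit could look like, assuming the mice and survival packages are installed. The data are simulated and the column names (age, V3, time, status) are made up for illustration; I use coxph() rather than coxme() only to keep it short. Whether the default imputation model is appropriate depends on your data and on the missingness mechanism, so treat this as a starting point rather than a recipe.

```r
library(survival)
library(mice)

# Simulated stand-in for the real data (all names are hypothetical)
set.seed(1)
n <- 200
dat <- data.frame(
  age    = round(rnorm(n, mean = 60, sd = 10)),
  V3     = factor(sample(c("WTHomo", "Variant"), n, replace = TRUE)),
  time   = rexp(n, rate = 0.1),
  status = rbinom(n, size = 1, prob = 0.7)
)
dat$V3[sample(n, 60)] <- NA          # make the covariate missing for 60 people

# Create m = 5 completed data sets; by default every other column,
# including the survival time and event indicator, is used as a predictor
imp <- mice(dat, m = 5, printFlag = FALSE, seed = 1)

# Fit the same Cox model in each completed data set
fits <- with(imp, coxph(Surv(time, status) ~ age + V3))

# Combine the five fits with Rubin's rules
pooled <- pool(fits)
summary(pooled)
```

All 200 rows contribute to each fit, and the pooled standard errors reflect the extra uncertainty from not having observed V3 for everyone.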
> In reality, I have ~600 people with survival data and say 6 variables
> attached to them. After I incorporate a 7th variable (for which the
> information isn't available for every individual), I have 400 people left.
> Since I still have survival data and almost all of the information for the
> other 200 people (the only thing missing is information about that 7th
> variable), it seems a waste to remove all of the survival data for 200
> people over one co-variate. So instead of completely removing the rows, I
> was hoping to somehow acknowledge in the model that the data for this
> particular co-variate is missing, while keeping the row. Does anyone know
> whether it is possible to incorporate this into the model I described
> above?
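To make the row counts concrete, here is a small sketch with simulated data (600 people, one covariate missing for 200 of them). The names V7, age, time and status are hypothetical, and I use coxph() from the survival package rather than coxme() for brevity. The first fit shows the default behaviour, where rows containing NA are dropped. The second keeps every row by coding "missing" as its own factor level. Be aware that this missing-indicator approach is generally regarded as giving biased estimates, so it only illustrates the mechanics and is not a recommendation over imputation.

```r
library(survival)

# Simulated stand-in for the real data (all names are hypothetical)
set.seed(42)
n <- 600
dat <- data.frame(
  age    = round(rnorm(n, mean = 60, sd = 10)),
  V7     = factor(sample(c("WTHomo", "Variant"), n, replace = TRUE)),
  time   = rexp(n, rate = 0.1),
  status = rbinom(n, size = 1, prob = 0.7)
)
dat$V7[sample(n, 200)] <- NA         # the 7th variable is unknown for 200 people

# Default: rows with NA in any model variable are removed before fitting
fit_cc <- coxph(Surv(time, status) ~ age + V7, data = dat)
fit_cc$n                             # 400 rows used

# Missing-indicator coding: recode NA as an explicit level so no row is dropped
v <- as.character(dat$V7)
v[is.na(v)] <- "Missing"
dat$V7_ind <- factor(v)

fit_ind <- coxph(Surv(time, status) ~ age + V7_ind, data = dat)
fit_ind$n                            # 600 rows used
```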
> On Fri, Dec 19, 2014 at 10:21 AM, Ted Harding <Ted.Harding at wlandres.net> wrote:
>> Hi Aoife,
>> I think that if you simply replace each "*" in the data file
>> with "NA", then it should work ("NA" is usually interpreted
>> as "missing" for those functions for which missingness is
>> relevant). How you subsequently deal with records which have
>> missing values is another question (or many questions ... ).
>> So your data should look like:
>> V1 V2 V3 Survival Event
>> ann 13 WTHomo 4 1
>> ben 20 NA 5 1
>> tom 40 Variant 6 1
>> Hoping this helps,
>> On 19-Dec-2014 10:12:00 aoife doherty wrote:
>>> Hi all,
>>> I have a data set like this:
>>> Test.cox file:
>>> V1 V2 V3 Survival Event
>>> ann 13 WTHomo 4 1
>>> ben 20 * 5 1
>>> tom 40 Variant 6 1
>>> where "*" indicates that I don't know what the value is for V3 for Ben.
>>> I've set up a Cox model to run like this:
>>> death.dat <- read.table("Test.cox",header=T)
>>> deathdat.kmat <-2*with(death.dat,makekinship(famid,ID,faid,moid))
>>> Model <- coxme(Surv(Survival,Event)~ strata(factor(V1)) +
>>> strata(factor(V2)) + factor(V3)) +
>>> As you can see from the Test.cox file, I have a missing value "*". How
>>> and where do I tell the R script "treat * as a missing value"? If I can't
>>> incorporate missing values into the model, I assume the alternative is to
>>> remove all of the rows with missing data, which will greatly reduce my
>>> data set, as most rows have at least one missing variable.
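As an alternative to editing the file by hand, read.table() can be told which strings mean "missing" through its na.strings argument. The sketch below writes a temporary copy of the three example rows (the temporary file is only there to make the example self-contained) and reads it back with "*" treated as NA.

```r
# Temporary file mirroring the Test.cox example from the post
path <- tempfile(fileext = ".cox")
writeLines(c(
  "V1 V2 V3 Survival Event",
  "ann 13 WTHomo 4 1",
  "ben 20 * 5 1",
  "tom 40 Variant 6 1"
), path)

# Treat both "NA" and "*" as missing values while reading
death.dat <- read.table(path, header = TRUE,
                        na.strings = c("NA", "*"),
                        stringsAsFactors = TRUE)

is.na(death.dat$V3)             # FALSE TRUE FALSE
colSums(is.na(death.dat))       # number of missing values per column
sum(complete.cases(death.dat))  # 2 rows have no missing values
```

Checking complete.cases() before fitting shows how many rows a model with the default NA handling would actually use.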
>>> [[alternative HTML version deleted]]
>>> R-help at r-project.org mailing list -- To UNSUBSCRIBE and more, see
>>> PLEASE do read the posting guide
>>> and provide commented, minimal, self-contained, reproducible code.
>> E-Mail: (Ted Harding) <Ted.Harding at wlandres.net>
>> Date: 19-Dec-2014 Time: 10:21:23