[BioC] Cox Model
Kasper Daniel Hansen
khansen at stat.Berkeley.EDU
Wed Feb 13 20:08:01 CET 2008
On Feb 13, 2008, at 2:23 AM, Ramon Diaz-Uriarte wrote:
> Dear Eleni,
> You are trying to fit a model with 18000 covariates but only 80
> samples (of which, at most, only 80 are not censored). Just doing it
> the way you are trying to do it is unlikely to work or make much
> sense...
> You might want to take a look at the work of Torsten Hothorn and
> colleagues on survival ensembles, with implementations in the R
> package mboost, and their work on random forests for survival data
> (see the R package party). Some of this functionality is also
> accessible through our web-based tool SignS
> (http://signs.bioinfo.cnio.es), which uses the above packages.
> Depending on your exact question, you might also want to look at the
> approach of Jelle Goeman for testing whether sets of genes (e.g.,
> your complete set of 18000 genes) are related to the outcome of
> interest (survival in your case). Goeman's approach is available in
> the globaltest package from BioC.
Actually you should look at Jelle's penalized package, which fits an
L1-regularized version of the Cox model (something completely
different from the globaltest approach). Using regularization in some
way is probably your only hope if you want to fit a joint model
instead of 18000 marginal models. I know that Jelle has an example
with thousands of genes from a microarray experiment; I don't know
whether the code scales to 18000.
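A minimal sketch of what such a penalized fit might look like; the simulated data, the dimensions, and the lambda1 value are illustrative assumptions, not anything from this thread:

```r
## Sketch: L1-penalized Cox regression with the 'penalized' package.
## Simulated data stand in for the real 80-patient expression matrix;
## lambda1 = 5 is an arbitrary choice for illustration.
library(survival)
library(penalized)

set.seed(1)
n <- 80; p <- 200                       # 80 patients, 200 "genes" for the sketch
genes   <- matrix(rnorm(n * p), n, p,
                  dimnames = list(NULL, paste0("g", 1:p)))
time    <- rexp(n)                      # survival times
relapse <- rbinom(n, 1, 0.5)            # censoring indicator (1 = event)

fit <- penalized(Surv(time, relapse), penalized = genes,
                 lambda1 = 5, standardize = TRUE, trace = FALSE)

## the L1 penalty drives most coefficients to exactly zero, so only a
## small subset of genes remains in the joint model
coefficients(fit, "penalized")
```

In practice lambda1 would be chosen by cross-validation (the package provides profL1/optL1 for that) rather than fixed by hand.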
What you are trying to do is certainly pretty ambitious, and you
should spend some time understanding the issues if you want to tackle
your problem successfully. Or you could just fit 18000 marginal
regressions, which should be easy.
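The marginal approach amounts to one coxph() call per gene; a rough sketch, again on simulated data standing in for the 80 x 18000 matrix:

```r
## Sketch: one univariate Cox model per gene, collecting the Wald
## p-value for each. Simulated data are used so the example is
## self-contained; substitute the real time/relapse/genes objects.
library(survival)

set.seed(1)
n <- 80; p <- 100                       # small p here; the real data have 18000
genes   <- matrix(rnorm(n * p), n, p,
                  dimnames = list(NULL, paste0("g", 1:p)))
time    <- rexp(n)
relapse <- rbinom(n, 1, 0.5)

## fit Surv(time, relapse) ~ gene separately for each column
pvals <- apply(genes, 2, function(g) {
  fit <- coxph(Surv(time, relapse) ~ g)
  summary(fit)$coefficients[1, "Pr(>|z|)"]
})

head(sort(pvals))   # most marginally significant genes first
```

With 18000 tests you would of course want a multiple-testing correction, e.g. p.adjust(pvals, "BH").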
> Hope this helps,
> On Wednesday 13 February 2008 08:10, Eleni Christodoulou wrote:
>> Hello BioC-community,
>> I have been struggling for a week now with the implementation of a
>> Cox model in R. I have 80 cancer patients, so 80 time measurements
>> and 80 relapse-or-not measurements (the censoring indicator: 1 if
>> the patient relapsed over the examined period, 0 if not). My
>> microarray data contain around 18000 genes, so I have the
>> expression of 18000 genes in each of the 80 tumors (80*18000). I
>> would like to build a Cox model in order to retrieve the most
>> significant genes (according to the p-value). The commands I am
>> using are:
>> test1 <- list(time, relapse, genes)
>> coxph(Surv(time, relapse) ~ genes, test1)
>> where time is a vector of size 80 containing the times, relapse is
>> a vector of size 80 containing the relapse values, and genes is a
>> matrix.
>> When I run the coxph command I get an error saying that it cannot
>> allocate a vector of size 2.7 Mb (on Windows). I also tried Linux,
>> and there I get an error that the maximum memory is reached. I
>> increased the memory by starting R with the command:
>> R --min-vsize=10M --max-vsize=250M --min-nsize=1M --max-nsize=200M
>> I think it cannot get better than that, because if I try, for
>> example, max-vsize=300, the memory capacity is stored as NA.
>> Does anyone have any idea why this happens and how I can overcome it?
>> I would be really grateful if you could help!
>> It has been bothering me a lot!
>> Thank you all,
>> Thank you all,
> Ramón Díaz-Uriarte
> Statistical Computing Team
> Centro Nacional de Investigaciones Oncológicas (CNIO)
> (Spanish National Cancer Center)
> Melchor Fernández Almagro, 3
> 28029 Madrid (Spain)
> Fax: +-34-91-224-6972
> Phone: +-34-91-224-6900
> PGP KeyID: 0xE89B3462
> Bioconductor mailing list
> Bioconductor at stat.math.ethz.ch
> Search the archives: http://news.gmane.org/