[BioC] Re : Cox Model

Fri Feb 15 18:33:21 CET 2008

That's a good idea, as it lets you address the sample selection bias. You
might also be interested in reading the following papers if you haven't
done so already (there are others on similar topics):

Michiels S, Koscielny S, Hill C.
Prediction of cancer outcome with microarrays: a multiple random 
validation strategy.
Lancet. 2005 Feb 5-11;365(9458):488-92.
PMID: 15705458

Ein-Dor L, Kela I, Getz G, Givol D, Domany E.
Outcome signature genes in breast cancer: is there a unique set?
Bioinformatics. 2005 Jan 15;21(2):171-8. Epub 2004 Aug 12.
PMID: 15308542
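
For what it's worth, here is a minimal sketch of the bootstrap idea you
describe below, run on simulated data. The toy sizes, the 0.01 cutoff, the
20 resamples and the helper name unicox_p are all assumptions made for
illustration, not anything taken from your data. Instead of a strict
intersection across resamples, it reports how often each gene is selected,
which may be less sensitive to the number of resamples you draw.

```r
library(survival)

set.seed(1)
n <- 80; p <- 50                 # toy sizes (assumed); the real data have ~18000 genes
genes   <- matrix(rnorm(n * p), n, p,
                  dimnames = list(NULL, paste0("g", seq_len(p))))
time    <- rexp(n, rate = exp(0.8 * genes[, 1]))  # simulated: only g1 affects the hazard
relapse <- rbinom(n, 1, 0.7)     # 1 = relapsed, 0 = censored

# Hypothetical helper: Wald p-value from one univariate Cox model per gene (column)
unicox_p <- function(time, status, x) {
  apply(x, 2, function(g) {
    fit <- coxph(Surv(time, status) ~ g)
    summary(fit)$coefficients[1, "Pr(>|z|)"]
  })
}

B <- 20; alpha <- 0.01           # number of resamples and cutoff (both assumed)
hits <- matrix(FALSE, B, p, dimnames = list(NULL, colnames(genes)))
for (b in seq_len(B)) {
  idx <- sample(n, replace = TRUE)               # resample patients with replacement
  hits[b, ] <- unicox_p(time[idx], relapse[idx],
                        genes[idx, , drop = FALSE]) < alpha
}

# Fraction of bootstrap samples in which each gene was declared significant
freq <- colMeans(hits, na.rm = TRUE)
head(sort(freq, decreasing = TRUE))
```

Fitting one gene at a time also avoids building the 80 x 18000 design
matrix in a single coxph call. Note that the p-values are not adjusted for
multiple testing here.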

Regards, Adai

Eleni Christodoulou wrote:
> I was actually thinking of creating bootstrap samples and fitting
> univariate Cox models to each of them. Then
> I would select the significant genes in each bootstrap sample. I would
> declare the genes common to all bootstrap samples as actually
> significant...
> 
> On Thu, Feb 14, 2008 at 5:46 AM, Adaikalavan Ramasamy <
> ramasamy at cancer.org.uk> wrote:
> 
>> Eleni,
>>
>> Note that some of the genes that are declared significant in a univariate
>> analysis could be highly correlated. Thus, some of the selected genes
>> would not be informative in building the multivariate model.
>>
>> You might want to consider reducing the dimensionality by first grouping
>> the genes into clusters with similar patterns. There are many techniques,
>> but the one I can recall now is one of the earliest, called gene shaving.
>>
>> Or you can pre-select some genes based on variability measures etc.
>>
>> Regards, Adai
>>
>>
>>
>> Eleni Christodoulou wrote:
>>> Hi,
>>>
>>> Thanks for the replies. I will probably try to perform survival analysis on
>>> each of the genes to get gene-wise p-values, then select the most
>>> significant ones (those below a certain p-value threshold) and proceed to a
>>> full Cox regression using the significant genes. Do you think that this
>>> makes sense?
>>>
>>> Thanks a lot,
>>> Eleni
>>>
>>> On Feb 13, 2008 2:11 PM, <phguardiol at aol.com> wrote:
>>>
>>>>  Hi,
>>>> Wouldn't it make sense to reduce the dimensionality of the data first,
>>>> before undertaking such a survival analysis? Certainly, some of your genes
>>>> have similar expression profiles across samples...?
>>>>  Best,
>>>>  Philippe Guardiola
>>>>
>>>>
>>>>  -----Original Message-----
>>>> From: Ramon Diaz-Uriarte <rdiaz at cnio.es>
>>>> To: bioconductor at stat.math.ethz.ch
>>>> Cc: Eleni Christodoulou <elenichri at gmail.com>
>>>> Sent: Wed, 13 February 2008 11:23
>>>> Subject: Re: [BioC] Cox Model
>>>>
>>>>  Dear Eleni,
>>>>
>>>>
>>>> You are trying to fit a model with 18000 covariates but only 80 samples
>>>> (of which, at most, only 80 are not censored). Just doing it the way you
>>>> are trying to do it is unlikely to work or make much sense...
>>>>
>>>>
>>>> You might want to take a look at the work of Torsten Hothorn and colleagues
>>>> on survival ensembles, with implementations in the R package mboost, and
>>>> their work on random forests for survival data (see R package party). Some
>>>> of this functionality is also accessible through our web-based tool SignS
>>>> (http://signs.bioinfo.cnio.es), which uses the above packages.
>>>>
>>>>
>>>> Depending on your exact question, you might also want to look at the
>>>> approach of Jelle Goeman for testing whether sets of genes (e.g., your
>>>> complete set of 18000 genes) are related to the outcome of interest
>>>> (survival in your case). Goeman's approach is available in the globaltest
>>>> package from BioC.
>>>>
>>>>
>>>> Hope this helps,
>>>>
>>>>
>>>> R.
>>>>
>>>>
>>>>
>>>> On Wednesday 13 February 2008 08:10, Eleni Christodoulou wrote:
>>>>
>>>>> Hello BioC-community,
>>>>> It's been a week now that I have been struggling with the implementation
>>>>> of a Cox model in R. I have 80 cancer patients, so 80 time measurements and
>>>>> 80 relapse indicators (the censoring status: 1 if the patient relapsed over
>>>>> the examined period, 0 if not). My microarray data contain around 18000
>>>>> genes, so I have the expression of 18000 genes in each of the 80 tumors (an
>>>>> 80*18000 matrix). I would like to build a Cox model in order to retrieve
>>>>> the most significant genes (according to the p-value). The command that I
>>>>> am using is:
>>>>> test1 <- list(time,relapse,genes)
>>>>> coxph( Surv(time, relapse) ~ genes, test1)
>>>>> where time is a vector of size 80 containing the times, relapse is a vector
>>>>> of size 80 containing the relapse values and genes is an 80*18000 matrix.
>>>>> When I run the coxph command I get an error saying that it cannot
>>>>> allocate a vector of size 2.7Mb (in Windows). I also tried Linux, and there
>>>>> I get an error that the maximum memory is reached. I increased the memory
>>>>> by starting R with the command:
>>>>> R --min-vsize=10M --max-vsize=250M --min-nsize=1M --max-nsize=200M
>>>>> I think it cannot get better than that because if I try, for example,
>>>>> max-vsize=300 the memory capacity is stored as NA.
>>>>> Does anyone have any idea why this happens and how I can overcome it?
>>>>> I would be really grateful if you could help!
>>>>> It has been bothering me a lot!
>>>>> Thank you all,
>>>>> Eleni
>>>>> _______________________________________________
>>>>> Bioconductor mailing list
>>>>> Bioconductor at stat.math.ethz.ch
>>>>> https://stat.ethz.ch/mailman/listinfo/bioconductor
>>>>> Search the archives:
>>>>> http://news.gmane.org/gmane.science.biology.informatics.conductor
>>>> --
>>>> Ramón Díaz-Uriarte
>>>> Statistical Computing Team
>>>> Centro Nacional de Investigaciones Oncológicas (CNIO)
>>>> (Spanish National Cancer Center)
>>>> Melchor Fernández Almagro, 3
>>>> 28029 Madrid (Spain)
>>>> Fax: +-34-91-224-6972
>>>> Phone: +-34-91-224-6900
>>>> http://ligarto.org/rdiaz
>>>> PGP KeyID: 0xE89B3462
>>>> (http://ligarto.org/rdiaz/0xE89B3462.asc)
>>>>
>>
>>
>