[R] Cox model

darteta001 at ikasle.ehu.es
Tue Feb 12 16:52:41 CET 2008


Dear Eleni,

The following comes from a previous post about the maximum number of
variables in a multiple linear regression analysis, posted last Tuesday.
I think it is also relevant to Cox PH models:

"I can think of no circumstance where multiple regression on "hundreds
of thousands of variables" is anything more than a fancy random number
generator"


The thread is continued by someone with the same problem as you:


"When I try a regression problem with 3,000 coefficients in R running
under Windows XP 64 bit with 8Gb of memory on the machine and the /3Gb
option active (i.e., R can get up to 3Gb), R 2.6.1 runs out of memory
(apparently trying to duplicate the model matrix)"


but the author continues...

"...one must be careful doing ordinary linear regression with large
numbers of coefficients.  It does seem a little unlikely that there is
sufficient data to get useful estimates of three thousand coefficients
using linear regression"
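
As a rough check on both points for the Cox case, the arithmetic can be
sketched in R. The sizes (80 patients, 18000 genes) come from the
original question. The sketch assumes that a joint fit has to hold a
full p x p information (variance) matrix in double precision, which is
an assumption about how the fit is carried out, not something reported
in the thread. The variable names are made up for illustration.

```r
# Back-of-the-envelope check; sizes taken from the original question.
n_obs   <- 80      # patients
n_genes <- 18000   # regressors in the joint model

# Assumption: the fit stores a p x p matrix of doubles (8 bytes each).
info_bytes <- n_genes^2 * 8
info_gib   <- info_bytes / 1024^3

# The 80 x 18000 design matrix itself is small by comparison.
design_mib <- n_obs * n_genes * 8 / 1024^2

# With more regressors than observations, the design matrix has rank at
# most n_obs, so at most n_obs coefficients can be identified.
max_rank <- min(n_obs, n_genes)

cat(sprintf("p x p matrix:  %.2f GiB\n", info_gib))
cat(sprintf("design matrix: %.1f MiB\n", design_mib))
cat(sprintf("identifiable coefficients: at most %d of %d\n",
            max_rank, n_genes))
```

Under that assumption the p x p matrix alone is far larger than the
250M limit mentioned in the question, and even with enough memory most
of the 18000 coefficients could not be estimated from 80 patients.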

I also work with genomic data, and filtering the data first seems to be
a well-accepted rule. I am sure not all of your 18000 genes are relevant
to your study or have an effect on survival. Have a look at the
Bioconductor mailing list for information on this topic.
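
One possible way to filter and still get a coefficient and p-value for
every retained gene is to fit one univariate Cox model per gene instead
of a single joint model. The sketch below is only an illustration under
stated assumptions: the data are simulated stand-ins (not the real
microarray data), the variance filter and the cutoff of 50 genes are
arbitrary choices, and fit_one is a hypothetical helper name. It is a
different technique from the joint model in the original question.

```r
library(survival)

set.seed(1)
n_obs   <- 80
n_genes <- 200   # stand-in for 18000 to keep the example fast

# Simulated stand-ins for the objects in the question (not real data).
genes   <- matrix(rnorm(n_obs * n_genes), nrow = n_obs,
                  dimnames = list(NULL, paste0("gene", seq_len(n_genes))))
time    <- rexp(n_obs, rate = 0.1)
relapse <- rbinom(n_obs, size = 1, prob = 0.6)

# Step 1: unsupervised filter, keep the 50 most variable genes
# (the cutoff is an arbitrary illustration).
gene_var <- apply(genes, 2, var)
keep     <- names(sort(gene_var, decreasing = TRUE))[1:50]

# Step 2: one univariate Cox model per retained gene.
fit_one <- function(g) {
  fit <- coxph(Surv(time, relapse) ~ genes[, g])
  s   <- summary(fit)$coefficients
  c(coef = s[1, "coef"], p = s[1, "Pr(>|z|)"])
}
res <- as.data.frame(t(sapply(keep, fit_one)))

# Step 3: adjust for multiple testing and rank by p-value.
res$p_adj <- p.adjust(res$p, method = "BH")
res <- res[order(res$p), ]
print(head(res))
```

Each fit involves a single regressor, so memory use stays small, and
the adjusted p-values guard against reading too much into the top of
the ranking when many genes are tested.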

Best
David

> Hi David,
> 
> The problem is that I need all these regressors. I need a coefficient for
> every one of them and then rank them according to that coefficient.
> 
> Thanks,
> Eleni
> 
> On Feb 12, 2008 4:54 PM, <darteta001 at ikasle.ehu.es> wrote:
> 
> > Hi Eleni,
> >
> > I am not an expert in R or statistics, but in my opinion you have too
> > many regressors compared to the number of observations, and that might
> > be the reason why you get the error. Others may know better, but as
> > far as I know, with only 80 observations it is a good idea to first
> > filter your list of variables down to a few tens.
> >
> >
> > HTH
> >
> > David
> >
> > > Hello R-community,
> > >
> > > I have been struggling for a week now with the implementation of a
> > > Cox model in R. I have 80 cancer patients, so 80 time measurements
> > > and 80 relapse indicators (the censoring status: 1 if the patient
> > > relapsed over the examined period, 0 if not). My microarray data
> > > contain around 18000 genes, so I have the expression of 18000 genes
> > > in each of the 80 tumors (an 80 x 18000 matrix). I would like to
> > > build a Cox model in order to retrieve the most significant genes
> > > (according to the p-value). The command that I am using is:
> > >
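> > > library(survival)
> > > # name the list elements so the formula can find them in 'data'
> > > test1 <- list(time = time, relapse = relapse, genes = genes)
> > > coxph(Surv(time, relapse) ~ genes, data = test1)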
> > >
> > > where time is a vector of length 80 containing the times, relapse is
> > > a vector of length 80 containing the relapse values, and genes is an
> > > 80 x 18000 matrix. When I run the coxph command I get an error saying
> > > that R cannot allocate a vector of size 2.7Mb (in Windows). I also
> > > tried Linux, and there I get an error that the maximum memory has
> > > been reached. I increased the memory by starting R with the command:
> > > R --min-vsize=10M --max-vsize=250M --min-nsize=1M --max-nsize=200M
> > >
> > > I think it cannot get better than that, because if I try for example
> > > max-vsize=300 the memory capacity is stored as NA.
> > >
> > > Does anyone have any idea why this happens and how I can overcome it?
> > >
> > > I would be really grateful if you could help!
> > > It has been bothering me a lot!
> > >
> > > Thank you all,
> > > Eleni
> > >
> > >
> > > ______________________________________________
> > > R-help at r-project.org mailing list
> > > https://stat.ethz.ch/mailman/listinfo/r-help
> > > PLEASE do read the posting guide
> > > http://www.R-project.org/posting-guide.html
> > > and provide commented, minimal, self-contained, reproducible code.
> > >
> >
> 


