[R] range of group variables

Thomas Lumley tlumley at uw.edu
Sat Mar 3 23:22:30 CET 2012


On Fri, Mar 2, 2012 at 11:29 AM, sajjad R <sajjad_r at hotmail.com> wrote:
>
> Dear All,
>
> I hope to run some simple survival analyses using Cox proportional hazards models in R. My command will look like this:
>
> cox <- summary( coxph( Surv( mortality , TIME ) ~ Independent variables ) )
>
> My query is about specifying a range of independent variables in R,
> such that each independent variable is included as the main variable of interest, one at a time, independently of the other variables in the list.
> I have around 10,000 independent variables or groups by which I hope to study differences in mortality rates over a period of time.
> All 10,000 variables have one thing in common: their names start with the same letters, rs, followed by a unique 6-8 digit number.

Ah yes. SNP data.

Ideally, you want to use coxph.fit() rather than coxph().  This is
significantly faster and takes a model matrix rather than a formula,
so you can write a loop with index, say, i and construct the model
matrix as
   X <- cbind(adjustmentvariables, snp[, i])
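A minimal sketch of that loop, assuming the survival package (which exports coxph.fit() with the arguments used below; check them against your installed version) and hypothetical simulated data: `dat`, `snp` and the single adjustment variable `age` stand in for the real study, and `adjust` plays the role of `adjustmentvariables`. Note that Surv() takes the follow-up time first and the event indicator second, so, assuming `mortality` is the 0/1 death indicator and `TIME` the follow-up time, the call in the question should read Surv(TIME, mortality).

```r
library(survival)

## Hypothetical simulated data: follow-up time, 0/1 death indicator,
## one adjustment variable (age) and a 0/1/2 genotype matrix named rs...
set.seed(1)
n   <- 300
dat <- data.frame(TIME = rexp(n), mortality = rbinom(n, 1, 0.7),
                  age = rnorm(n, 60, 10))
snp <- matrix(rbinom(n * 5, 2, 0.3), nrow = n,
              dimnames = list(NULL, paste0("rs", 1000001:1000005)))

y      <- Surv(dat$TIME, dat$mortality)  # time first, then event indicator
adjust <- cbind(age = dat$age)           # adjustment variables as a matrix

## Fit one model per SNP directly from a model matrix (no formula parsing)
results <- t(sapply(seq_len(ncol(snp)), function(i) {
  X   <- cbind(adjust, snp[, i])
  fit <- coxph.fit(X, y, strata = NULL, offset = NULL, init = NULL,
                   control = coxph.control(), weights = NULL,
                   method = "efron", rownames = NULL)
  k <- ncol(X)  # the SNP is the last column of X
  c(coef = unname(fit$coefficients[k]), se = sqrt(fit$var[k, k]))
}))
rownames(results) <- colnames(snp)
results
```

Each row of `results` should agree with the SNP coefficient from the equivalent coxph() formula fit; skipping the formula and model-frame machinery is where the speed gain comes from.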

Also, it will help to provide starting values for the coefficients of
the adjustment variables.  And, if you initially specify just one
iteration of the model you can filter out nearly all the SNPs and then
go back and refit the model properly for the few that might be
important.
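The screen-then-refit strategy can be sketched as follows, again assuming the survival package and hypothetical simulated data (here rs1000003 is given a real effect). The adjustment-only model is fitted once and its coefficients, with 0 appended for the SNP, are used as starting values; coxph.control(iter.max = 1) limits each SNP fit to a single Newton step, and the one-step coefficient divided by its standard error serves as an approximate z statistic. The helper `fit_cox()` and the choice to refit the top 2 SNPs are illustrative assumptions, not a recommended cutoff.

```r
library(survival)

## Hypothetical simulated data; rs1000003 has a real effect on survival
set.seed(2)
n   <- 300
snp <- matrix(rbinom(n * 5, 2, 0.3), nrow = n,
              dimnames = list(NULL, paste0("rs", 1000001:1000005)))
age <- rnorm(n, 60, 10)
y   <- Surv(rexp(n, rate = exp(0.03 * (age - 60) + 0.5 * snp[, 3])),
            rbinom(n, 1, 0.7))
adjust <- cbind(age = age)

## Hypothetical helper: thin wrapper around survival's low-level fitter
fit_cox <- function(X, init = NULL, iter.max = 20) {
  coxph.fit(X, y, strata = NULL, offset = NULL, init = init,
            control = coxph.control(iter.max = iter.max), weights = NULL,
            method = "efron", rownames = NULL)
}

## 1. Adjustment-only model, fitted once; its estimates are the starting values
init0 <- fit_cox(adjust)$coefficients

## 2. A single Newton step per SNP, starting from (init0, 0)
z_one_step <- sapply(colnames(snp), function(rs) {
  X   <- cbind(adjust, snp[, rs])
  k   <- ncol(X)
  fit <- fit_cox(X, init = c(init0, 0), iter.max = 1)
  unname(fit$coefficients[k] / sqrt(fit$var[k, k]))
})

## 3. Refit only the most promising SNPs to convergence
top   <- names(sort(abs(z_one_step), decreasing = TRUE))[1:2]
final <- sapply(top, function(rs) {
  fit <- fit_cox(cbind(adjust, snp[, rs]))
  k   <- ncol(adjust) + 1
  c(coef = unname(fit$coefficients[k]), se = sqrt(fit$var[k, k]))
})
final
```

With 10,000 SNPs, step 2 does most of the work, and each fit there costs roughly one likelihood evaluation plus one Newton update.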

If you need to use coxph() and the formula interface, the simplest
approach is probably to paste together the formula as a character
string and then use as.formula() to convert it to a formula.
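A minimal sketch of the formula approach, assuming the survival package and a hypothetical data frame `dat` whose SNP columns are named rs followed by 6-8 digits. grep() selects those columns by name, paste() builds one formula string per SNP, and as.formula() turns it into a formula for coxph().

```r
library(survival)

## Hypothetical data frame: TIME, mortality, age and SNP columns named rs + digits
set.seed(3)
n   <- 200
dat <- data.frame(TIME = rexp(n), mortality = rbinom(n, 1, 0.7),
                  age = rnorm(n, 60, 10))
for (rs in paste0("rs", c(123456, 1234567, 12345678)))
  dat[[rs]] <- rbinom(n, 2, 0.3)

## Pick out the SNP columns: "rs" followed by 6 to 8 digits
snps <- grep("^rs[0-9]{6,8}$", names(dat), value = TRUE)

## Build one formula per SNP as a string, then convert it with as.formula()
fits <- lapply(snps, function(rs) {
  f <- as.formula(paste("Surv(TIME, mortality) ~ age +", rs))
  coxph(f, data = dat)
})
names(fits) <- snps

## One row per SNP: coef, exp(coef), se(coef), z and p-value from summary()
tab <- t(sapply(snps, function(rs) summary(fits[[rs]])$coefficients[rs, ]))
tab
```

This is easy to read but re-parses the formula and rebuilds the model frame for every SNP, which is why coxph.fit() is preferable for 10,000 of them.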


   -thomas

-- 
Thomas Lumley
Professor of Biostatistics
University of Auckland


