[R] building a formula for glm() with 30,000 independent variables

Murray Jorgensen maj at stats.waikato.ac.nz
Mon Nov 11 01:27:45 CET 2002


You have not given enough background for anyone to offer much help. 
I have several ideas, but how sensible they are depends on your 
situation. Several people have asked how many cases you have, and so 
do I.

It seems to me unlikely that you have 30,000 predictors that are all 
qualitatively different. I imagine that you must have a family or 
families of related predictors that are themselves described by 
numerical parameters. Knowledge of any structure on the predictors may 
suggest strategies for choosing representative predictors.

A crude measure of the strength of a predictor X might be obtained by 
calculating a two-sample t statistic comparing the X values of the 
cases with response 0 against those with response 1. Once a small set 
of strong predictors has been found, you might like to fit the model 
based on them and calculate the deviance residuals.
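
To make the screening step concrete, here is a minimal sketch in R on 
simulated data. The sizes n and p, the cutoff k = 5, and all object 
names (X, y, t_stat, strong, fit, dev_res) are assumptions made for 
illustration only; your own data and cutoff would replace them.

```r
# Simulated stand-in data: n cases, p candidate predictors (sizes are assumptions).
set.seed(1)
n <- 200
p <- 500
X <- matrix(rnorm(n * p), n, p,
            dimnames = list(NULL, paste0("x", seq_len(p))))
# In this toy example only x1 and x2 truly affect the 0/1 response.
y <- rbinom(n, 1, plogis(1.5 * X[, 1] - 1.5 * X[, 2]))

# Two-sample (Welch) t statistic for each column: cases with y == 1 vs y == 0.
t_stat <- vapply(seq_len(p), function(j) {
  unname(t.test(X[y == 1, j], X[y == 0, j])$statistic)
}, numeric(1))
names(t_stat) <- colnames(X)

# Keep the k predictors with the largest |t|; k = 5 is an arbitrary choice.
k <- 5
strong <- names(sort(abs(t_stat), decreasing = TRUE))[seq_len(k)]

# Fit the logistic model on the selected columns only. A small data frame
# with "y ~ ." avoids writing out a long formula by hand.
dat <- data.frame(y = y, X[, strong, drop = FALSE])
fit <- glm(y ~ ., family = binomial, data = dat)

# Deviance residuals from the small model.
dev_res <- residuals(fit, type = "deviance")
```

Because the model is fitted only to the handful of selected columns, 
the formula stays short and the parser never sees thousands of terms.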

Then you could treat these residuals as responses and consider variable 
selection in *linear* models predicting them.
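
One possible way to carry out that second stage is sketched below, 
again on simulated data. The first-stage selection (x1 and x2), the 
shortlist length of 10, and the use of correlation ranking followed by 
forward step() are all assumptions chosen for illustration; other 
linear-model selection methods would serve equally well.

```r
# Simulated stand-in data (sizes and effects are assumptions).
set.seed(2)
n <- 200
p <- 300
X <- matrix(rnorm(n * p), n, p,
            dimnames = list(NULL, paste0("x", seq_len(p))))
# Toy truth: x1, x2 and x3 affect the 0/1 response.
y <- rbinom(n, 1, plogis(1.5 * X[, 1] - 1.5 * X[, 2] + 1.5 * X[, 3]))

# Suppose a first screening pass kept only x1 and x2 (assumed here).
strong <- c("x1", "x2")
fit <- glm(y ~ ., family = binomial,
           data = data.frame(y = y, X[, strong, drop = FALSE]))
dev_res <- residuals(fit, type = "deviance")

# Rank the remaining predictors by absolute correlation with the residuals
# and keep a shortlist (length 10 is an arbitrary choice).
rest <- setdiff(colnames(X), strong)
r <- abs(cor(X[, rest], dev_res))[, 1]
shortlist <- names(sort(r, decreasing = TRUE))[1:10]

# Forward selection in a *linear* model with the residuals as response.
lin_dat <- data.frame(res = dev_res, X[, shortlist, drop = FALSE])
null_fit <- lm(res ~ 1, data = lin_dat)
upper <- reformulate(shortlist, response = "res")
sel <- step(null_fit, scope = upper, direction = "forward", trace = 0)

# Predictors picked up at this stage are candidates to add to the logistic model.
chosen <- attr(terms(sel), "term.labels")
```

Any variable that appears in chosen could then be added to the 
logistic model and the two stages repeated.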

Murray Jorgensen

Ben Liblit wrote:
> I would like to use R to perform a logistic regression with about
> 30,000 independent variables.  That's right, thirty thousand.  Most
> will be irrelevant: the intent is to use the regression to identify
> the few that actually matter.
> 
> Among other things, this calls for giving glm() a colossal "y ~ ..."
> formula with thirty thousand summed terms on its right hand side.  I
> build up the formula as a string and then call as.formula() to convert
> it.  Unfortunately, the conversion fails.  The parser reports that it
> has overflowed its stack.  :-(
> 
> Is there any way to pull this off in R?  Can anyone suggest
> alternatives to glm() or to R itself that might be capable of handling
> a problem of this size?  Or am I insane to even be considering an
> analysis like this?
> 
> Thanks!
> 
> -.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.- 
> 
> r-help mailing list -- Read http://www.ci.tuwien.ac.at/~hornik/R/R-FAQ.html
> Send "info", "help", or "[un]subscribe"
> (in the "body", not the subject !)  To: r-help-request at stat.math.ethz.ch
> _._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._ 
> 
> 

-- 
Dr Murray Jorgensen      http://www.stats.waikato.ac.nz/Staff/maj.html
Department of Statistics, University of Waikato, Hamilton, New Zealand
Email: maj at waikato.ac.nz                                Fax 7 838 4155
Phone  +64 7 838 4773 wk    +64 7 849 6486 home     Mobile 021 395 862
