[R] Reply: Buying more computer for GLM

Peter Dalgaard p.dalgaard at biostat.ku.dk
Thu Aug 31 13:44:28 CEST 2006

g.russell at eos-finance.com writes:

> Hello,
> at the moment I am doing quite a lot of regression, especially 
> logistic regression, on 20000 or more records with 30 or more 
> factors, using the "step" function to search for the model with the 
> smallest AIC. This takes a lot of time on this 1.8 GHz Pentium
> box. Memory does not seem to be much of a problem; not much
> swapping is going on and CPU usage is at or close to 100%. What
> would be the most cost-effective way to speed this up? The
> obvious way would be to get a machine with a faster processor (3 GHz
> plus), but I wonder whether it might instead be better to run a
> dual-processor machine or something like that; this looks at least like a
> problem R should be able to parallelise, though I don't know whether it 
> does.

Is this floating-point bound? (When you say 30 factors, does that mean
30 parameters, or factors representing a much larger number of groups?)
If it is integer bound, I don't think you can do much better than
increasing CPU speed and (note!) memory bandwidth: look for large-cache
systems and a fast front-side bus. To increase floating-point
performance, you might consider using an optimized BLAS such as ATLAS
(see the Windows FAQ 8.2 and/or the "R Installation and
Administration" manual); such a BLAS may in turn be multithreaded
and make use of multiple CPUs or multi-core CPUs.
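One way to check where the time goes is to benchmark the workload directly. The R sketch below is hypothetical. It simulates data loosely shaped like the poster's (20000 records, binary response), but with 10 numeric predictors instead of 30 factors so that it finishes quickly. It then times a single `glm()` fit, the `step()` search, and a dense matrix product that R normally hands to the BLAS. Running it once with R's reference BLAS and again after switching to an optimized one such as ATLAS gives a rough idea of how much of the `step()` time the BLAS actually affects.

```r
## Hypothetical benchmark: sizes and coefficients are assumptions,
## chosen only to resemble the workload described in the question.
set.seed(1)
n <- 20000                       # records
p <- 10                          # numeric predictors (small so step() finishes quickly)
X <- matrix(rnorm(n * p), nrow = n, ncol = p)
y <- rbinom(n, size = 1, prob = plogis(X[, 1] - 0.5 * X[, 2]))
dat <- data.frame(y = y, X)      # columns y, X1, ..., X10

## One logistic-regression fit on all predictors
t_fit <- system.time(fit <- glm(y ~ ., family = binomial, data = dat))

## AIC-based backward search from the full model (trace = 0 silences output)
t_step <- system.time(best <- step(fit, trace = 0))

## A dense matrix product, normally computed by the BLAS
A <- matrix(rnorm(1000 * 1000), nrow = 1000)
t_blas <- system.time(AA <- A %*% A)

## Elapsed seconds for each stage
print(c(glm    = t_fit[["elapsed"]],
        step   = t_step[["elapsed"]],
        matmul = t_blas[["elapsed"]]))
```

If the `matmul` time drops sharply with the optimized BLAS while the `step` time barely moves, the search is probably limited by something other than dense linear algebra, and a faster single CPU with good memory bandwidth is likely the better buy.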

   O__  ---- Peter Dalgaard             Øster Farimagsgade 5, Entr.B
  c/ /'_ --- Dept. of Biostatistics     PO Box 2099, 1014 Cph. K
 (*) \(*) -- University of Copenhagen   Denmark          Ph:  (+45) 35327918
~~~~~~~~~~ - (p.dalgaard at biostat.ku.dk)                  FAX: (+45) 35327907
