[R] Antwort: Re: Antwort: Buying more computer for GLM

Prof Brian Ripley ripley at stats.ox.ac.uk
Fri Sep 1 11:12:48 CEST 2006


On Fri, 1 Sep 2006, g.russell at eos-finance.com wrote:

> Peter Dalgaard wrote
> > Is this floating point bound? (When you say 30 factors does that mean
> > 30 parameters or factors representing a much larger number of groups).
> > If it is integer bound, I don't think you can do much better than
> > increase CPU speed and - note - memory bandwidth (look for large-cache
> > systems and fast front-side bus). To increase floating point
> > performance, you might consider the option of using optimized BLAS
> > (see the Windows FAQ 8.2 and/or the "R Installation and
> > Administration" manual) like ATLAS; this in turn may be multithreaded
> > and make use of multiple CPUs or multi-core CPUs.
> 
> By "factors" I mean "parameters".   I apologise for the confusion.
> 
> This is floating point bound, so ATLAS might be a good idea. 
> 
> Before I put a lot of work into investigating multiple processors, I 
> need to know, is the bottleneck with GLM going to be BLAS?

Probably not, but you have the ability to profile in R and find out.
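
A minimal sketch of such a profiling run, using Rprof() and summaryRprof() 
from the utils package.  The simulated data, the binomial family and the 
repeat count are assumptions chosen only to resemble the problem size 
discussed here, not the original data:

```r
# Simulated stand-in for the real problem: 20,000 observations and
# 30 parameters (both the data and the binomial family are assumptions).
set.seed(1)
n <- 20000; p <- 30
X <- matrix(rnorm(n * p), n, p)
y <- rbinom(n, 1, plogis(drop(X %*% rnorm(p, sd = 0.2))))
dat <- data.frame(y = y, X)

prof_file <- tempfile(fileext = ".out")
Rprof(prof_file)                  # start sampling the call stack
for (i in 1:5)                    # repeat the fit so enough samples accumulate
  fit <- glm(y ~ ., family = binomial, data = dat)
Rprof(NULL)                       # stop profiling

prof <- summaryRprof(prof_file)
head(prof$by.self)                # functions ranked by time spent in themselves
```

If most of the self time falls in the R-level bookkeeping around the fit 
rather than in the compiled fitting code, a faster BLAS is unlikely to help 
much.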


Some more comments:

1) The Fortran code that underlies glm is that of lm.fit, which only makes 
   use of level-1 BLAS and so is not going to be helped greatly by an 
   optimized BLAS.

2) No one has, as far as I know, succeeded in making a multithreaded 
   Rblas.dll for Windows.  And on systems using pthreads, the success 
   with multithreaded BLAS is very mixed: it results in a dramatic 
   slowdown in some problems.

3) As I recall, you were doing model selection via AIC on 20,000 
   observations.  You might want to think hard about that, since AIC is 
   designed for good prediction.  I would do model exploration on a much 
   smaller representative subset, and if I had 20,000 observations and 30 
   parameters and was interested in prediction, not do subset selection at 
   all.

4) glm() allows you to specify starting parameters (the 'start' argument), 
   which you could find from a subsample.  Very likely only 1 or 2 
   iterations would then be needed.
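
A rough sketch of point 4, again on simulated data (the data, the binomial 
family and the subsample size of 2,000 are assumptions for illustration). 
It fits the model on a subsample, then passes those coefficients to the 
full fit through the 'start' argument of glm():

```r
# Simulated stand-in for the real data (sizes and family are assumptions).
set.seed(2)
n <- 20000; p <- 30
X <- matrix(rnorm(n * p), n, p)
beta <- rnorm(p, sd = 0.2)
y <- rbinom(n, 1, plogis(drop(X %*% beta)))
dat <- data.frame(y = y, X)

# Fit on a random subsample to obtain cheap starting values.
sub <- dat[sample(n, 2000), ]
fit_sub <- glm(y ~ ., family = binomial, data = sub)

# Full fit from the default initialisation, and from the subsample estimates.
fit_cold <- glm(y ~ ., family = binomial, data = dat)
fit_warm <- glm(y ~ ., family = binomial, data = dat,
                start = coef(fit_sub))

# Compare the number of IWLS iterations and check that both fits agree.
c(cold = fit_cold$iter, warm = fit_warm$iter)
max(abs(coef(fit_cold) - coef(fit_warm)))
```

The saving per fit is modest, but it adds up when many candidate models are 
fitted during selection.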

-- 
Brian D. Ripley,                  ripley at stats.ox.ac.uk
Professor of Applied Statistics,  http://www.stats.ox.ac.uk/~ripley/
University of Oxford,             Tel:  +44 1865 272861 (self)
1 South Parks Road,                     +44 1865 272866 (PA)
Oxford OX1 3TG, UK                Fax:  +44 1865 272595


