[R] lean and mean lm/glm?

Thomas Lumley tlumley at u.washington.edu
Tue Aug 22 16:22:18 CEST 2006


On Mon, 21 Aug 2006, Damien Moore wrote:

>
>> For very large regression problems there is the biglm package (put your
>> data into a database, read in 500,000 rows at a time, and keep updating
>> the fit).
>
> thanks. I took a look at biglm and it seems pretty easy to use and, 
> looking at the source, avoids much of the redundancy of lm. Correct me 
> if I'm wrong, but I think it would be virtually impossible to extend to 
> glm, because of the non-linearity in glm models.

No, it is quite straightforward if you are willing to make multiple passes 
through the data.  It is hard with a single pass and may not be possible 
unless the data are in random order.

Fisher scoring for glms is just an iterative weighted least squares 
calculation using a set of 'working' weights and a 'working' response. These 
can be defined chunk by chunk and fed to biglm.  Three iterations should 
be sufficient.
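
To make the multiple-pass idea concrete, here is a minimal sketch for a 
logistic model. It is plain Python rather than R and does not call biglm: 
the function names, the simulated data, and the choice to accumulate X'WX 
and X'Wz directly and solve the normal equations are all assumptions made 
for illustration, not how biglm updates its fit internally. For the logit 
link the working weight is w = mu(1 - mu) and the working response is 
z = eta + (y - mu)/w.

```python
import math
import random

def solve(A, b):
    """Solve the small p x p system A x = b (Gaussian elimination, partial pivoting)."""
    p = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for col in range(p):
        pivot = max(range(col, p), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, p):
            f = M[r][col] / M[col][col]
            for c in range(col, p + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * p
    for i in reversed(range(p)):
        s = M[i][p] - sum(M[i][j] * x[j] for j in range(i + 1, p))
        x[i] = s / M[i][i]
    return x

def one_pass(chunks, beta):
    """One Fisher-scoring step: accumulate X'WX and X'Wz chunk by chunk."""
    p = len(beta)
    xtwx = [[0.0] * p for _ in range(p)]
    xtwz = [0.0] * p
    for chunk in chunks():                 # one full pass through the data
        for x, y in chunk:
            eta = sum(b * xi for b, xi in zip(beta, x))
            mu = 1.0 / (1.0 + math.exp(-eta))
            w = mu * (1.0 - mu)            # working weight for the logit link
            wz = w * eta + (y - mu)        # w * z, with z = eta + (y - mu) / w
            for i in range(p):
                xtwz[i] += x[i] * wz
                for j in range(p):
                    xtwx[i][j] += x[i] * w * x[j]
    return solve(xtwx, xtwz)

def chunked_logistic(chunks, p, passes=3):
    """Run a fixed number of passes, starting from beta = 0."""
    beta = [0.0] * p
    for _ in range(passes):
        beta = one_pass(chunks, beta)
    return beta

def simulated_chunks(n=2000, chunk_size=500, true_beta=(-0.5, 1.0), seed=1):
    """Hypothetical stand-in for a database cursor. The rows are held in
    memory here only for the demo; the returned function re-reads them in
    chunks each time it is called."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        x = (1.0, rng.gauss(0.0, 1.0))     # intercept plus one covariate
        eta = sum(b * xi for b, xi in zip(true_beta, x))
        prob = 1.0 / (1.0 + math.exp(-eta))
        rows.append((x, 1 if rng.random() < prob else 0))
    def chunks():
        for start in range(0, n, chunk_size):
            yield rows[start:start + chunk_size]
    return chunks

if __name__ == "__main__":
    chunks = simulated_chunks()
    print(chunked_logistic(chunks, p=2, passes=3))
```

Only the p x p matrix and the length-p vector persist between chunks, so 
memory use does not grow with the number of rows. The cost is one full read 
of the data per iteration, which is why this needs multiple passes.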

 	-thomas


