[R] Re : Large database help

Thomas Lumley tlumley at u.washington.edu
Tue May 16 23:40:06 CEST 2006


On Tue, 16 May 2006, roger koenker wrote:

> In ancient times, 1999 or so, Alvaro Novo and I experimented with an
> interface to mysql that brought chunks of data into R and accumulated
> results.
> This is still described and available on the web in its original form at
>
> 	http://www.econ.uiuc.edu/~roger/research/rq/LM.html
>
> Despite claims of "future developments" nothing emerged, so anyone
> considering further explorations with it may need training in
> Rchaeology.

A few hours ago I submitted to CRAN a package, "biglm", that fits large
linear regression models using a similar strategy (it uses an incremental QR
decomposition rather than accumulating the crossproduct matrix). It also
computes the Huber/White sandwich variance estimate in the same single
pass over the data.
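Here is a minimal base-R sketch of the incremental-QR idea. It was written for this note and is not taken from biglm. Between chunks it carries only the p x p triangular factor R, the rotated response Q'y and a running residual sum of squares, so memory does not grow with the number of rows. For simplicity, each update re-factorises the old R stacked on the new rows with base qr(), which is an assumption of this sketch. The helper names qr_state_new, qr_state_update and qr_state_coef are hypothetical. The usage lines fit log(Volume) ~ log(Girth) + log(Height) on three chunks of the built-in trees data, and the result should agree with lm() on the full data.

```r
# Sketch only: chunked least squares via incremental (block) QR updates.
# Kept between chunks: p x p factor R, p-vector Q'y, running RSS.
# Assumption: each update re-factorises rbind(R, X_chunk) with base qr();
# this is an illustration, not biglm's internal code.

qr_state_new <- function(p) {
  list(R = matrix(0, p, p), qty = numeric(p), rss = 0)
}

qr_state_update <- function(state, X, y) {
  p <- ncol(X)
  dec <- qr(rbind(state$R, X))            # stack old factor on the new rows
  if (dec$rank < p || any(dec$pivot != seq_len(p)))
    stop("design matrix is (nearly) rank deficient")
  qtb <- qr.qty(dec, c(state$qty, y))     # rotate the matching right-hand side
  list(R   = qr.R(dec),                   # new p x p upper-triangular factor
       qty = qtb[seq_len(p)],
       # entries beyond the first p are pure residual: add them to the RSS
       rss = state$rss + sum(qtb[-seq_len(p)]^2))
}

qr_state_coef <- function(state) backsolve(state$R, state$qty)

# Usage: built-in 'trees' data split into three chunks (11, 11, 9 rows)
chunks <- split(trees, rep(1:3, each = 11, length.out = nrow(trees)))
state <- qr_state_new(3)
for (chunk in chunks) {
  X <- model.matrix(~ log(Girth) + log(Height), chunk)
  state <- qr_state_update(state, X, log(chunk$Volume))
}
beta <- qr_state_coef(state)

# Reference fit on all the data at once
fit <- lm(log(Volume) ~ log(Girth) + log(Height), data = trees)
```

The main reason to prefer this over accumulating X'X is numerical. Forming the crossproduct squares the condition number of the design matrix, whereas the QR route works with the design matrix directly.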

Assuming I haven't messed up the package checking, it will appear
on CRAN in the next couple of days. The syntax looks like
   library(biglm)
   a <- biglm(log(Volume) ~ log(Girth) + log(Height), chunk1)
   a <- update(a, chunk2)
   a <- update(a, chunk3)
   summary(a)

where chunk1, chunk2, chunk3 are chunks of the data.
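One way to produce chunk1, chunk2, ... without loading the whole file is to read a fixed number of lines at a time from an open connection. This sketch assumes the data sit in a CSV file, here a temporary copy of trees. The value chunk_size = 11 is an arbitrary choice. The biglm()/update() calls appear only as comments, so the snippet runs without the package installed.

```r
# Sketch: read a CSV in fixed-size chunks over one open connection,
# so only 'chunk_size' data lines are held in memory at a time.
# Assumption: a temporary copy of the built-in 'trees' data stands in
# for a large file; 'chunk_size' is arbitrary.
path <- tempfile(fileext = ".csv")
write.csv(trees, path, row.names = FALSE)
chunk_size <- 11

con <- file(path, open = "r")
header <- readLines(con, n = 1)              # column-name line, read once
sizes <- integer(0)
repeat {
  lines <- readLines(con, n = chunk_size)    # next block of raw lines
  if (length(lines) == 0) break              # end of file
  chunk <- read.csv(text = c(header, lines)) # re-attach header, parse chunk
  sizes <- c(sizes, nrow(chunk))
  # with biglm: the first chunk starts the fit,
  #   a <- biglm(log(Volume) ~ log(Girth) + log(Height), chunk)
  # and every later chunk is folded in with  a <- update(a, chunk)
}
close(con)
sizes
```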


 	-thomas



