[R] biglm: how it handles large data set?

noclue_ tim.liu at netzero.net
Sun Oct 31 08:22:12 CET 2010



I am trying to figure out how 'biglm' can handle large data sets.

According to the R documentation, "biglm creates a linear model object that
uses only p^2 memory for p variables. It can be updated with more data using
update. This allows linear regression on data sets larger than memory."
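
To make the quoted description concrete, here is a minimal sketch of the chunked workflow it describes. The choice of `mtcars`, the formula, and the four-way split are assumptions made purely for illustration; in real use each chunk would be read from disk or a database rather than cut from a data frame already in memory.

```r
library(biglm)

# Illustrative assumption: pretend mtcars arrives in four chunks of 8 rows.
chunks <- split(mtcars, rep(1:4, each = 8))

# Fit on the first chunk, then fold in the remaining chunks one at a time.
fit <- biglm(mpg ~ wt + hp, data = chunks[[1]])
for (chunk in chunks[-1]) {
  fit <- update(fit, chunk)
}

# Coefficients after all chunks have been seen.
coef(fit)
```

Only one chunk plus the fitted object needs to be in memory at any moment, which is the sense in which the data set can be larger than memory.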

After reading the source code below, I still cannot figure out how
'update' implements the algorithm.

Thanks for any light you can shed on this.

> biglm::biglm

function (formula, data, weights = NULL, sandwich = FALSE) 
{
    tt <- terms(formula)
    if (!is.null(weights)) {
        if (!inherits(weights, "formula")) 
            stop("`weights' must be a formula")
        w <- model.frame(weights, data)[[1]]
    }
    else w <- NULL
    mf <- model.frame(tt, data)
    mm <- model.matrix(tt, mf)
    qr <- bigqr.init(NCOL(mm))
    qr <- update(qr, mm, model.response(mf), w)
    rval <- list(call = sys.call(), qr = qr, assign = attr(mm, 
        "assign"), terms = tt, n = NROW(mm), names = colnames(mm), 
        weights = weights)
    if (sandwich) {
        p <- ncol(mm)
        n <- nrow(mm)
        xyqr <- bigqr.init(p * (p + 1))
        xx <- matrix(nrow = n, ncol = p * (p + 1))
        xx[, 1:p] <- mm * model.response(mf)
        for (i in 1:p) xx[, p * i + (1:p)] <- mm * mm[, i]
        xyqr <- update(xyqr, xx, rep(0, n), w * w)
        rval$sandwich <- list(xy = xyqr)
    }
    rval$df.resid <- rval$n - length(qr$D)
    class(rval) <- "biglm"
    rval
}
<environment: namespace:biglm>
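
In the listing above, the only state that grows out of the data is `qr`, created by `bigqr.init(NCOL(mm))` and then fed rows through `update(qr, mm, model.response(mf), w)`. The following base-R sketch shows one way such a bounded-memory update can work. It is not the package's code: as far as I can tell from the package documentation, biglm relies on the AS274 incremental QR routine, whereas this sketch simply re-triangularizes with `qr()` on each chunk. The helper name `update_r`, the data set, and the chunking are hypothetical, and the sketch assumes a full-rank design so that `qr()` does not pivot columns.

```r
# Hypothetical helper (not part of biglm): fold one chunk of rows into a
# running upper-triangular factor R of the augmented matrix [X, y].
update_r <- function(R, X, y) {
  Z <- cbind(X, y)            # p predictors plus the response
  qr.R(qr(rbind(R, Z)))       # at most (p + 1) x (p + 1), whatever the row count
}

X <- model.matrix(mpg ~ wt + hp, data = mtcars)
y <- mtcars$mpg
idx <- split(seq_len(nrow(X)), rep(1:4, each = 8))   # four chunks of 8 rows

R <- NULL                     # no data seen yet
for (i in idx) {
  R <- update_r(R, X[i, , drop = FALSE], y[i])
}

p <- ncol(R) - 1
beta <- backsolve(R[1:p, 1:p], R[1:p, p + 1])   # least-squares coefficients
rss  <- R[p + 1, p + 1]^2                       # residual sum of squares
```

The point of the sketch is that `R` never exceeds (p + 1) x (p + 1) entries no matter how many rows have been processed, which matches the "only p^2 memory" claim, and that `beta` agrees with what `lm()` gives on the full data.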