[R] looping over lapply calls

Sun Jun 4 08:40:13 CEST 2000

Dear Prof. Ripley,

Thank you very much for your detailed answer!!

> 3) In S engines the reasons are related to minimizing the number of pieces
> of memory in use as well as the size.  Using blocks allows the memory to be
> cleaned up at the end of each block.  R does not have delayed commitment to
> the same extent, and garbage collects memory only when full (or asked).  
> So at least one issue is how much is in use when memory management occurs,
> and both the size and number of R objects is relevant. As memory management
> is about to change, I think the only way out is to experiment, and that
> includes with the setting of the heap size in R.

Understood. 

> (If you had had some spaces around operators and indented consistently
> I might have been able to comment.  Set options(keep.source=FALSE) and
> read this into R and out again to get a consistent layout. You can
> then tidy it up in ESS.)
> 
> 

I hope it is not too late; I have sourced it (after fixing a couple of things),
then copied the printout into an xemacs
window, and added a few comments, and have changed the settings
of my email so that it does not break lines.

lmx.0 <- function (formula, data, max.num = 0, lapply.size = 100) 
{
    if (max.num) {
        max.num <- min(max(data$sim.counter), max.num)
        data <- data[data$sim.counter < max.num + 1, ]
    }
    else max.num <- max(data$sim.counter)
    names.vars <- drop.scope(formula)
    terms.in.model <- names(lm(formula = formula, data = data, 
        subset = data$sim.counter == 0)[[1]]) # there should be a simpler way

    loop.counter <- (max.num + 1)%/%lapply.size
    rest.of.data <- (max.num + 1)%%lapply.size
    tmp <- matrix(nrow = max.num + 1, ncol = length(names.vars) + 
        length(terms.in.model))
    i <- 0
    if (loop.counter) { # enter here if number to deal with is larger than lapply.size
        for (i in 1:loop.counter) {
            datai <- data[data$sim.counter <= ((i * lapply.size) - 
                1) & data$sim.counter >= ((i - 1) * lapply.size), ]
            # obtain output of interest ---fm[[1]],my.drop(fm)--- by applying that function
            # over the subset of data within loop; as result is list, unlist and turn into a matrix
            tmp[(((i - 1) * lapply.size) + 1):(i * lapply.size), 
                 ] <- matrix(unlist(lapply(split(datai, datai$sim.counter), 
                function(datos, formula) {
                  fm <- lm(formula = formula, data = datos)
                  c(fm[[1]], my.drop(fm))
                }, formula = formula)), nrow = lapply.size, byrow = T)
        }
    }
    if (rest.of.data) { #enter here if number to analyze < lapply.size or a non-interger multiple
        datai <- data[data$sim.counter >= (loop.counter * lapply.size), 
            ]
        tmp[(((i * lapply.size) + 1):(max.num + 1)), ] <- matrix(unlist(lapply(split(datai, 
            datai$sim.counter), function(datos, formula) {
            fm <- lm(formula = formula, data = datos)
            c(fm[[1]], my.drop(fm))
        }, formula = formula)), nrow = rest.of.data, byrow = T)
    }
    tmp
}

my.drop <- function (object) 
{
# This is from drop1 (by BDR) after eliminating things I didn't use;
# this function is severely crippled, so use only here.
# Components are NOT named.     
# Maybe I should only define it within lmx.0 to prevent accidental misuse?

    x <- model.matrix(object)
    iswt <- !is.null(wt <- object$weights)
    n <- nrow(x)
    asgn <- attr(x, "assign")
    tl <- attr(object$terms, "term.labels")
    scope <- drop.scope(object)
    ndrop <- match(scope, tl)
    ns <- length(scope)
    rdf <- object$df.resid
    chisq <- deviance.lm(object)
    dfs <- numeric(ns)
    RSS <- numeric(ns)
    y <- object$residuals + predict(object)
    rank <- object$rank
    for (i in 1:ns) {
        ii <- seq(along = asgn)[asgn == ndrop[i]]
        jj <- setdiff(seq(ncol(x)), ii)
        z <- if (iswt) 
            lm.wfit(x[, jj, drop = FALSE], y, wt)
        else lm.fit(x[, jj, drop = FALSE], y)
        dfs[i] <- z$rank
        RSS[i] <- deviance.lm(z)
    }
    dfs <- c(object$rank, dfs)
    RSS <- c(chisq, RSS)
    dfs <- (dfs[1] - dfs)[-1]
    rdf <- object$df.resid
    dev <- RSS[-1] - RSS[1]
    rms <- RSS[1]/rdf
    Fs <- (dev/dfs)/rms
    Fs
}

#an example

data.5010 <- data.frame(sim.counter=rep(seq(from=0,to=500),rep(10,501)),y=rnorm(5010),x=rnorm(5010))
ex1 <- lmx.0(y~x,data=data.5010)

-- 
Ramón Díaz-Uriarte
Dept. Zoology and Statistics
University of Wisconsin-Madison
Madison, WI 53706-1381

email: rdiazuri at students.wisc.edu
(NOTE: starting 15-July-2000 new email:
ramon-diaz at teleline.es)
phone: 608-238-8041
-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-
r-help mailing list -- Read http://www.ci.tuwien.ac.at/~hornik/R/R-FAQ.html
Send "info", "help", or "[un]subscribe"
(in the "body", not the subject !)  To: r-help-request at stat.math.ethz.ch
_._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._