[Rd] often unnecessary duplicate in sapply / as.vector

Prof Brian Ripley ripley at stats.ox.ac.uk
Tue Jul 11 14:20:17 CEST 2006


On Fri, 7 Jul 2006, Thomas Lumley wrote:

> On Fri, 7 Jul 2006, Martin Morgan wrote:
> 
> > sapply calls lapply as
> >
> >    answer <- lapply(as.list(X), FUN, ...)
> >
> > which, when X is a list, causes X to be duplicated unnecessarily. The
> > coercion is unnecessary for other mode(X) because in lapply we have
> >
> >    if (!is.list(X)) X <- as.list(X)
> 
> That looks reasonable.

And you have made the change.  Unfortunately it is not really reasonable, 
as is.list(X) does not test that X is a list (see its documentation) in 
the same sense as as.list, so pairlists are now passed to the internal 
code.

There's something rather undesirable going on here.  The internal code for 
lapply (in its current version, not the one I wrote) does the internal 
equivalent of

    rval <- vector("list", length(X))
    for(i in seq(along = X))
        rval[i] <- list(FUN(X[[i]], ...))

from the earlier

lapply <- function(X, FUN, ...) {
    FUN <- match.fun(FUN)
    if (!is.list(X))
        X <- as.list(X)
    rval <- vector("list", length(X))
    for(i in seq(along = X))
        rval[i] <- list(FUN(X[[i]], ...))
    names(rval) <- names(X)               # keep `names' !
    return(rval)
}

so all that is needed is that X[[i]] work.

For a pairlist [[i]] done repeatedly is very inefficient (since it starts 
at the beginning each time), so we *do* want to coerce pairlists here.

On the other hand, we do not need to coerce expressions or atomic vectors 
for which [[]] works just fine.

> > More generally, perhaps as.vector might not duplicate when mode(x) == mode ?
> 
> This isn't a trivial change, because  mode(x)==mode does not guarantee 
> that as.vector(x, mode) has no effect.  For example, with mode="numeric" it 
> removes attributes.

And with mode="list" it does not (although that is not as documented).
We can certainly do better.  [This is another of those cases where 'mode' 
is confusing, and in fact it would be typeof(x) == mode.]

However, for now let us concentrate on as.list.default, which does

as.list.default <- function (x, ...)
{
    if (is.function(x))
	return(c(formals(x), list(body(x))))
    if (is.expression(x)) {
	n <- length(x)
	l <- vector("list", n)
	i <- 0
	for (sub in x) l[[i <- i + 1]] <- sub
	return(l)
    }
    .Internal(as.vector(x, "list"))
}

That's a bit strange, as an expression is internally a list, and it loses 
the names on the expression.  I intend to make as.list(x) return 
x unchanged if x is a list (not a pairlist), and to coerce expressions 
internally.

After that I will think about making as.vector and lapply make fewer 
copies.

-- 
Brian D. Ripley,                  ripley at stats.ox.ac.uk
Professor of Applied Statistics,  http://www.stats.ox.ac.uk/~ripley/
University of Oxford,             Tel:  +44 1865 272861 (self)
1 South Parks Road,                     +44 1865 272866 (PA)
Oxford OX1 3TG, UK                Fax:  +44 1865 272595



More information about the R-devel mailing list