[R] Efficiently sum 3d table

David A Vavra davavra at verizon.net
Mon Apr 16 21:40:45 CEST 2012


Bert,

My apologies on the name.

I haven't kept any data on loop times. I don't know why lapply seems faster,
but the difference is quite noticeable. It has struck me as odd; I would
have thought lapply would be slower. It has taken an effort to change my
thinking and fit solutions to lapply, but I've gotten used to it. As of now
I reserve loops for cases with only a few iterations (10 or so) and
for solutions that require passing large amounts of information among
iterations. lapply is particularly handy when constructing lists.

As for vectorizing, see the code below. Note that it uses mapply, but that
may simply have made implementation easier. However, if vectorizing gives an
improvement over looping, the mapply may be the reason.

> f<-function(x,y,z) cat("do something\n")
> Vectorize(f,c('x','y'))
function (x, y, z) 
{
    args <- lapply(as.list(match.call())[-1L], eval, parent.frame())
    names <- if (is.null(names(args))) 
        character(length(args))
    else names(args)
    dovec <- names %in% vectorize.args
    do.call("mapply", c(FUN = FUN, args[dovec], MoreArgs = list(args[!dovec]), 
        SIMPLIFY = SIMPLIFY, USE.NAMES = USE.NAMES))
}
<environment: 0x7fb3442553c8>
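
To check that a vectorized function really is just mapply underneath, here is a minimal sketch. The toy function f below is hypothetical (it is not from my actual code) and simply does arithmetic on scalars, with z held fixed:

```r
# Hypothetical toy function: scalar arithmetic, z is not vectorized
f <- function(x, y, z) x + y * z

# Vectorize over x and y only; z is passed through via MoreArgs
vf <- Vectorize(f, c("x", "y"))
via_vectorize <- vf(1:3, 4:6, z = 10)

# The same computation written directly with mapply
via_mapply <- mapply(f, 1:3, 4:6, MoreArgs = list(z = 10))

# Both routes should give the same result
identical(via_vectorize, via_mapply)
```

Since Vectorize only wraps mapply, f is still called once per element at the interpreted level, so I would not expect it to behave very differently from lapply in terms of speed.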

DAV


-----Original Message-----
From: Bert Gunter [mailto:gunter.berton at gene.com] 
Sent: Monday, April 16, 2012 3:07 PM
To: David A Vavra
Cc: r-help at r-project.org
Subject: Re: [R] Effeciently sum 3d table

David:

1. My first name is Bert.

2. " It never occurred to me that there would be a question."
Indeed. But in fact you got solutions for two different
interpretations (Greg's is what you wanted). That is what I meant when
I said that clarity in asking the question is important.

3. > I have gotten the impression that a for loop is very inefficient. Whenever I
> change them to lapply calls there is a noticeable improvement in run time
> for whatever reason.
I'd like to see your data on this. My experience is that they are
typically comparable. Chambers, in his "Software for Data Analysis"
book, says (p. 213), of apply-type functions versus explicit
loops: "The computation should run faster... However, none of the
apply mechanisms changes the number of times the supplied function is
called, so serious improvements will be limited to iterating simple
calculations many times."
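
One way to collect such data is a small timing harness. This is only a sketch under assumed conditions: the workload (squaring each element of a random vector) and the names loop_version and apply_version are made up for illustration, and timings will vary by machine and R version, so no numbers are shown:

```r
# Hypothetical workload: one trivial function call per element
n <- 1e5
x <- runif(n)

loop_version <- function(x) {
  out <- numeric(length(x))                 # preallocate the result
  for (i in seq_along(x)) out[i] <- x[i]^2
  out
}

apply_version <- function(x) unlist(lapply(x, function(xi) xi^2))

t_loop  <- system.time(r_loop  <- loop_version(x))
t_apply <- system.time(r_apply <- apply_version(x))

# Elapsed seconds for each approach
print(c(loop = t_loop[["elapsed"]], lapply = t_apply[["elapsed"]]))

# Sanity check: both approaches compute the same thing
stopifnot(all.equal(r_loop, r_apply))
```

Note that the loop preallocates its result; a loop that grows a vector on each iteration can be much slower, which may account for some reported differences.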

4. You can get serious improvements by vectorizing; and you can do
that here, if I understand correctly, because all your arrays have
identical dim = d. Here's how:

## assume your list of arrays is in listoftables

alldat <- do.call(cbind, listoftables) ## one column per array; this might be the slow part
ans <- array(rowSums(alldat), dim = d) ## sum across arrays, then restore the dims

See ?rowSums for explanations and caveats, especially with NAs.
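
A self-contained sketch of this approach, using made-up data (five random 2 x 3 x 4 arrays; the dimensions and the Poisson counts are assumptions for illustration only), with Reduce as a cross-check:

```r
# Hypothetical data: five 3-d arrays that all share dim = d
d <- c(2, 3, 4)
set.seed(1)
listoftables <- lapply(1:5, function(i) array(rpois(prod(d), 3), dim = d))

# cbind treats each 3-d array as a plain vector, giving one column per array
alldat <- do.call(cbind, listoftables)
ans <- array(rowSums(alldat), dim = d)

# Element-wise sum via Reduce, for comparison
ans_reduce <- Reduce(`+`, listoftables)
all.equal(ans, ans_reduce)

# NA caveat: a single NA propagates unless na.rm = TRUE is given
listoftables[[1]][1, 1, 1] <- NA
alldat_na <- do.call(cbind, listoftables)
rowSums(alldat_na)[1]                # NA
rowSums(alldat_na, na.rm = TRUE)[1]  # sum of the four non-missing values
```

The trade-off is memory: alldat holds a full copy of every array at once, whereas Reduce only keeps a running total.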

Cheers,
Bert

On Mon, Apr 16, 2012 at 11:35 AM, David A Vavra <davavra at verizon.net> wrote:
> Thanks Gunter,
>
> I mean what I think is the normal definition of 'sum' as in:
>   T1 + T2 + T3 + ...
> It never occurred to me that there would be a question.
>
> I have gotten the impression that a for loop is very inefficient. Whenever I
> change them to lapply calls there is a noticeable improvement in run time
> for whatever reason. The problem with lapply here is that I effectively need
> a global table to hold the final sum. lapply also wants to return a value.
>
> You may be correct that in the long run, the loop is the best. There's a lot
> of extraneous memory wastage holding all of the tables in a list as well as
> the return 'values'.
>
> As an alternate and given a pre-existing list of tables, I was thinking of
> creating a temporary environment to hold the final result so it could be
> passed globally to each lapply execution level but that seems clunky and
> wasteful as well.
>
> Example in partial code:
>
> env <- CreatEnv() # my own function
> assign('final',T1-T1,envir=env)
> L<-listOfTables
>
> lapply(L,function(t) {
>        final <- get('final',envir=env) + t
>        assign('final',final,envir=env)
>        NULL
> })
>
> But I was hoping for a more elegant and hopefully more efficient solution.
> Greg's suggestion for using reduce seems in order but as yet I'm unfamiliar
> with the function.
>
> DAV
>
>
>
> -----Original Message-----
> From: Bert Gunter [mailto:gunter.berton at gene.com]
> Sent: Monday, April 16, 2012 12:42 PM
> To: Greg Snow
> Cc: David A Vavra; r-help at r-project.org
> Subject: Re: [R] Effeciently sum 3d table
>
> Define "sum" . Do you mean you want to get a single sum for each
> array? -- get marginal sums for each array? -- get a single array in
> which each value is the sum of all the individual values at the
> position?
>
> Due thought and consideration for those trying to help by formulating
> your query carefully and concisely vastly increases the chance of
> getting a useful answer. See the posting guide -- this is a skill that
> needs to be learned and the guide is quite helpful. And I must
> acknowledge that it is a skill that I also have not yet mastered.
>
> Concerning your query, I would only note that the two responses from
> Greg and Petr that you received are unlikely to be significantly
> faster than just using loops, since both are still essentially looping
> at the interpreted level. Whether either give you what you want, I do
> not know.
>
> -- Bert
>
> On Mon, Apr 16, 2012 at 8:53 AM, Greg Snow <538280 at gmail.com> wrote:
>> Look at the Reduce function.
>>
>> On Mon, Apr 16, 2012 at 8:28 AM, David A Vavra <davavra at verizon.net> wrote:
>>> I have a large number of 3d tables that I wish to sum.
>>> Is there an efficient way to do this? Or perhaps a function I can call?
>>>
>>> I tried using do.call("sum",listoftables) but that returns a single value.
>>>
>>> So far, it seems only a loop will do the job.
>>>
>>>
>>> TIA,
>>> DAV
>
>
> --
>
> Bert Gunter
> Genentech Nonclinical Biostatistics
>
> Internal Contact Info:
> Phone: 467-7374
> Website:
> http://pharmadevelopment.roche.com/index/pdb/pdb-functional-groups/pdb-biostatistics/pdb-ncb-home.htm
>
