[R] Efficiently sum 3d table

Bert Gunter gunter.berton at gene.com
Mon Apr 16 22:32:25 CEST 2012


David:

Here is a comparison of the gains to be made by vectorization (again,
assuming I have interpreted your query correctly):

## create a list of arrays
> z <- lapply(seq_len(10000), function(i) array(runif(24), dim = 2:4))
## using an apply-type approach
> system.time(ans1 <- array(do.call(mapply, c(sum, z)), dim = 2:4))
   user  system elapsed
   0.62    0.00    0.62
## vectorizing via rowSums and cbind
> system.time(ans2 <- array(rowSums(do.call(cbind, z)), dim = 2:4))
   user  system elapsed
   0.02    0.00    0.02
> identical(ans1, ans2)
[1] TRUE
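
For anyone wondering why the rowSums/cbind version gives the elementwise sum, here is a minimal sketch on a tiny list with known values. The data and the names `small`, `m` and `tot` are illustrative only and are not part of the timing run above.

```r
# Illustrative data: three 2x3x4 arrays filled with 1, 2 and 3 respectively.
small <- lapply(1:3, function(i) array(i, dim = 2:4))

# cbind treats each 3-d array as a plain vector of length 24, so the
# result is a 24 x 3 matrix with one column per array.
m <- do.call(cbind, small)

# Row k of m holds element k of every array. rowSums adds across the
# arrays, and array() restores the original 2x3x4 shape.
tot <- array(rowSums(m), dim = 2:4)

print(dim(m))
print(tot[1, 1, 1])
```

Every cell of `tot` should be 1 + 2 + 3. The looping happens inside rowSums in compiled code rather than at the interpreted level, which is where I would expect the speedup to come from.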

Cheers,
Bert



On Mon, Apr 16, 2012 at 1:19 PM, David A Vavra <davavra at verizon.net> wrote:
> Thanks Bill,
>
> For reasons that aren't important here, I must start from a list. Computing
> the sum while generating the tables may be a solution but it means doing
> something in one piece of code that is unrelated to the surrounding code.
> Bad practice where I'm from. If it's needed it's needed but if I can avoid
> doing so, I will.
>
> I haven't done any timing but because of the extra operations of get and
> assign, the non-loop implementation will likely suffer. It seems you have
> shown this to be true.
>
> DAV
>
> -----Original Message-----
> From: William Dunlap [mailto:wdunlap at tibco.com]
> Sent: Monday, April 16, 2012 3:26 PM
> To: David A Vavra; 'Bert Gunter'
> Cc: r-help at r-project.org
> Subject: RE: [R] Efficiently sum 3d table
>
>
>
>> Example in partial code:
>>
>> Env <- CreatEnv() # my own function
>> Assign('final',T1-T1,envir=env)
>> L<-listOfTables
>>
>> lapply(L,function(t) {
>>     final <- get('final',envir=env) + t
>>     assign('final',final,envir=env)
>>     NULL
>> })
>
>
>
> First, finish writing that code so it runs and you can make sure its
> output is ok:
>
> L <- lapply(1:50000, function(i) array(i:(i+3), c(2,2))) # list of 50,000 2x2 matrices
> env <- new.env()
> assign('final', L[[1]] - L[[1]], envir=env)
> junk <- lapply(L, function(t) {
>     final <- get('final', envir=env) + t
>     assign('final', final, envir=env)
>     NULL
> })
> get('final', envir=env)
> #            [,1]       [,2]
> # [1,] 1250025000 1250125000
> # [2,] 1250075000 1250175000
> sum( (2:50001) ) # should be final[2,1]
> # [1] 1250075000
>
>
>
> You asked for something less "clunky".
> You are fighting the system by using get() and assign(); just use
> ordinary expression syntax to get and set variables:
>
> final <- L[[1]]
> for(i in seq_along(L)[-1]) final <- final + L[[i]]
> final
> #            [,1]       [,2]
> # [1,] 1250025000 1250125000
> # [2,] 1250075000 1250175000
>
> The former took 0.22 seconds on my machine, the latter 0.06.
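
The loop above can also be written with Reduce, which Greg suggested earlier in the thread. This is a minimal sketch on a smaller list (100 matrices rather than 50,000, to keep it quick); the names `total` and `loop_total` are mine. Reduce still loops at the interpreted level, so I would not expect it to be meaningfully faster than the for loop; it is just more compact.

```r
# Illustrative data: 100 2x2 integer matrices, built the same way as L above.
L <- lapply(1:100, function(i) array(i:(i + 3), c(2, 2)))

# Reduce folds `+` over the list: ((L[[1]] + L[[2]]) + L[[3]]) + ...
total <- Reduce(`+`, L)

# The explicit loop, for comparison.
loop_total <- L[[1]]
for (i in seq_along(L)[-1]) loop_total <- loop_total + L[[i]]

print(total)
```

Both versions add the matrices in the same order, so the two results should match exactly.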
>
>
>
> You don't have to compute the whole list of matrices before
> doing the sum; just add to the current sum when you have
> computed one matrix and then forget about it.
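
A minimal sketch of that last suggestion, assuming the tables come from some generator function. `make_table` here is hypothetical and merely stands in for whatever code produces each table; `n` is likewise an illustrative count.

```r
# Hypothetical generator: stands in for whatever produces each table.
make_table <- function(i) array(i:(i + 3), c(2, 2))

n <- 100  # illustrative number of tables

# Start from the first table, then add each new table to the running sum.
# No list of tables is ever stored; each table is discarded after use.
final <- make_table(1)
for (i in seq_len(n)[-1]) final <- final + make_table(i)

print(final)
```

Only one table and the running sum are held in memory at any time, at the cost of mixing the summation into the code that generates the tables.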
>
>
>
> Bill Dunlap
> Spotfire, TIBCO Software
> wdunlap tibco.com
>
>
>
>
>
>> -----Original Message-----
>> From: r-help-bounces at r-project.org [mailto:r-help-bounces at r-project.org] On Behalf Of David A Vavra
>> Sent: Monday, April 16, 2012 11:35 AM
>> To: 'Bert Gunter'
>> Cc: r-help at r-project.org
>> Subject: Re: [R] Efficiently sum 3d table
>
>>
>
>> Thanks Gunter,
>>
>> I mean what I think is the normal definition of 'sum' as in:
>>    T1 + T2 + T3 + ...
>> It never occurred to me that there would be a question.
>>
>> I have gotten the impression that a for loop is very inefficient. Whenever I
>> change them to lapply calls there is a noticeable improvement in run time
>> for whatever reason. The problem with lapply here is that I effectively need
>> a global table to hold the final sum. lapply also wants to return a value.
>>
>> You may be correct that in the long run, the loop is the best. There's a lot
>> of extraneous memory wastage holding all of the tables in a list as well as
>> the return 'values'.
>>
>> As an alternative and given a pre-existing list of tables, I was thinking of
>> creating a temporary environment to hold the final result so it could be
>> passed globally to each lapply execution level but that seems clunky and
>> wasteful as well.
>>
>> Example in partial code:
>>
>> Env <- CreatEnv() # my own function
>> Assign('final',T1-T1,envir=env)
>> L<-listOfTables
>>
>> lapply(L,function(t) {
>>     final <- get('final',envir=env) + t
>>     assign('final',final,envir=env)
>>     NULL
>> })
>>
>> But I was hoping for a more elegant and hopefully more efficient solution.
>> Greg's suggestion for using Reduce seems in order but as yet I'm unfamiliar
>> with the function.
>>
>> DAV
>
>>
>
>>
>
>>
>
>> -----Original Message-----
>> From: Bert Gunter [mailto:gunter.berton at gene.com]
>> Sent: Monday, April 16, 2012 12:42 PM
>> To: Greg Snow
>> Cc: David A Vavra; r-help at r-project.org
>> Subject: Re: [R] Efficiently sum 3d table
>>
>> Define "sum". Do you mean you want to get a single sum for each
>> array? -- get marginal sums for each array? -- get a single array in
>> which each value is the sum of all the individual values at that
>> position?
>>
>> Due thought and consideration for those trying to help by formulating
>> your query carefully and concisely vastly increases the chance of
>> getting a useful answer. See the posting guide -- this is a skill that
>> needs to be learned and the guide is quite helpful. And I must
>> acknowledge that it is a skill that I also have not yet mastered.
>>
>> Concerning your query, I would only note that the two responses from
>> Greg and Petr that you received are unlikely to be significantly
>> faster than just using loops, since both are still essentially looping
>> at the interpreted level. Whether either gives you what you want, I do
>> not know.
>>
>> -- Bert
>
>>
>
>> On Mon, Apr 16, 2012 at 8:53 AM, Greg Snow <538280 at gmail.com> wrote:
>> > Look at the Reduce function.
>> >
>> > On Mon, Apr 16, 2012 at 8:28 AM, David A Vavra <davavra at verizon.net> wrote:
>> >> I have a large number of 3d tables that I wish to sum.
>> >> Is there an efficient way to do this? Or perhaps a function I can call?
>> >>
>> >> I tried using do.call("sum",listoftables) but that returns a single value.
>> >>
>> >> So far, it seems only a loop will do the job.
>> >>
>> >>
>> >> TIA,
>> >> DAV
>
>>
>
>>
>
>> --
>>
>> Bert Gunter
>> Genentech Nonclinical Biostatistics
>>
>> Internal Contact Info:
>> Phone: 467-7374
>> Website:
>> http://pharmadevelopment.roche.com/index/pdb/pdb-functional-groups/pdb-biostatistics/pdb-ncb-home.htm
>>
>> ______________________________________________
>> R-help at r-project.org mailing list
>> https://stat.ethz.ch/mailman/listinfo/r-help
>> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
>> and provide commented, minimal, self-contained, reproducible code.
>
>



-- 

Bert Gunter
Genentech Nonclinical Biostatistics

Internal Contact Info:
Phone: 467-7374
Website:
http://pharmadevelopment.roche.com/index/pdb/pdb-functional-groups/pdb-biostatistics/pdb-ncb-home.htm


