[R] averaging a list of matrices element wise

Bert Gunter gunter.berton at gene.com
Tue Nov 6 16:26:49 CET 2012


Thierry:

(Apologies for beating a dead horse ...)

Just wanted to point out that sometimes it does not pay to try to be clever:

Try the **obvious, simple** solution using a for loop (where z is your
list of matrices):
> z1 <- 0
> for(i in seq_along(z)) z1 <- z1 + z[[i]]
> z1 <- z1/length(z)

This was just about as fast as the more elegant Reduce() solution when
I tried it on my previous example of a million 10x10 matrices. (Note
that your OS and memory may affect this).
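
For anyone who wants to repeat the comparison, here is a minimal sketch. It assumes a much smaller list (1e4 random 10x10 matrices rather than a million) so that it runs quickly, and the names loop_mean and reduce_mean are made up for this illustration. Timings depend on the machine, so none are shown.

```r
## Hypothetical test data: 1e4 random 10x10 matrices (far fewer than
## the million-matrix example, so this runs in a moment).
set.seed(1)
z <- lapply(seq_len(1e4), function(i) matrix(rnorm(100), nrow = 10))

## Plain for loop: keep a running sum, then divide by the count.
loop_time <- system.time({
  z1 <- 0
  for (i in seq_along(z)) z1 <- z1 + z[[i]]
  loop_mean <- z1 / length(z)
})

## Reduce(): the same running sum, written functionally.
reduce_time <- system.time({
  reduce_mean <- Reduce("+", z) / length(z)
})

## Both approaches give the same element-wise average.
stopifnot(isTRUE(all.equal(loop_mean, reduce_mean)))

print(loop_time)
print(reduce_time)
```

Wrapping each variant in system.time({ ... }) keeps the timed work and the result assignment together; for a list this small the two timings may be too close to tell apart, so increase the list length to see a difference.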

-- Bert





On Tue, Nov 6, 2012 at 5:45 AM, ONKELINX, Thierry
<Thierry.ONKELINX at inbo.be> wrote:
> Dear all,
>
> Thanks a lot for your suggestions.
>
> Arun's suggestion of using simplify2array is only marginally faster than my attempt.
> The solutions of Dimitris and Bert, however, were 78 and 30 times faster than my attempt (using n = 51, s = 25, r = 1000, which is about the size in my application).
>
> As there shouldn't be any missing data in the array, I'll stick to the solution based on Reduce() because I find it both fast and elegant.
>
> Best regards,
>
> ir. Thierry Onkelinx
> Instituut voor natuur- en bosonderzoek / Research Institute for Nature and Forest
> team Biometrie & Kwaliteitszorg / team Biometrics & Quality Assurance
> Kliniekstraat 25
> 1070 Anderlecht
> Belgium
> + 32 2 525 02 51
> + 32 54 43 61 85
> Thierry.Onkelinx at inbo.be
> www.inbo.be
>
> To call in the statistician after the experiment is done may be no more than asking him to perform a post-mortem examination: he may be able to say what the experiment died of.
> ~ Sir Ronald Aylmer Fisher
>
> The plural of anecdote is not data.
> ~ Roger Brinner
>
> The combination of some data and an aching desire for an answer does not ensure that a reasonable answer can be extracted from a given body of data.
> ~ John Tukey
>
> -----Original message-----
> From: Bert Gunter [mailto:gunter.berton at gene.com]
> Sent: Monday 5 November 2012 16:13
> To: D. Rizopoulos
> CC: ONKELINX, Thierry; r-help at r-project.org
> Subject: Re: [R] averaging a list of matrices element wise
>
> Gents:
>
> Although it is difficult to say what may be faster, as it typically depends on the data, and it is even more difficult to say what is fast enough, I suspect that
>
> ?rowMeans ## specifically written for speed
>
> would be considerably faster than a Reduce() (or an apply()) approach on the array, but I have **not** checked. I am of course prepared to eat my words in the face of data to the contrary.
>
> The call would be:
>
> result <- rowMeans(array(unlist(raw), dim = c(r, s, length(raw))), dims = 2)
>
> Note that rowMeans() has an na.rm argument to handle NAs. See the help file for details.
> Note also the tradeoff to memory, as copies of raw probably are made during evaluation.
> Finally note that dimnames are lost in the final result, so the above would have to be followed by
>
> dimnames(result) <- dimnames(raw[[1]])
>
> to get them back.
>
> -- Bert
>
>
> On Mon, Nov 5, 2012 at 2:43 AM, D. Rizopoulos <d.rizopoulos at erasmusmc.nl> wrote:
>> If you don't have any NAs, then one way is:
>>
>> n <- 3
>> r <- 5
>> s <- 6
>> raw <- lapply(seq_len(n), function(i){
>>    matrix(rnorm(r * s), ncol = r)
>> })
>>
>> Reduce("+", raw) / length(raw)
>>
>>
>> I hope it helps.
>>
>> Best,
>> Dimitris
>>
>>
>> On 11/5/2012 11:32 AM, ONKELINX, Thierry wrote:
>>> Dear all,
>>>
>>> I have a list of n matrices which all have the same dimension (r x
>>> s). What would be a fast/elegant way to calculate the element wise
>>> average? So result[1, 1] <- mean(c(raw[[1]][1, 1] , raw[[2]][1, 1],
>>> raw[[...]][1, 1], raw[[n]][1, 1]))
>>>
>>> Here is my attempt.
>>>
>>> #create a dummy dataset
>>> n <- 3
>>> r <- 5
>>> s <- 6
>>> raw <- lapply(seq_len(n), function(i){
>>>    matrix(rnorm(r * s), ncol = r)
>>> })
>>>
>>> #do the calculation
>>> result <- array(dim = c(dim(raw[[1]]), length(raw)))
>>> for(i in seq_along(raw)){
>>>    result[,,i] <- raw[[i]]
>>> }
>>> result <- apply(result, 1:2, mean)
>>>
>>>
>>> Best regards,
>>>
>>> Thierry
>>>
>>>
>>> * * * * * * * * * * * * * D I S C L A I M E R * * * * * * * * * * * * *
>>> The views expressed in this message and any annex are purely those of the writer and may not be regarded as stating an official position of INBO, as long as the message is not confirmed by a duly signed document.
>>>
>>> ______________________________________________
>>> R-help at r-project.org mailing list
>>> https://stat.ethz.ch/mailman/listinfo/r-help
>>> PLEASE do read the posting guide
>>> http://www.R-project.org/posting-guide.html
>>> and provide commented, minimal, self-contained, reproducible code.
>>>
>>
>> --
>> Dimitris Rizopoulos
>> Assistant Professor
>> Department of Biostatistics
>> Erasmus University Medical Center
>>
>> Address: PO Box 2040, 3000 CA Rotterdam, the Netherlands
>> Tel: +31/(0)10/7043478
>> Fax: +31/(0)10/7043014
>> Web: http://www.erasmusmc.nl/biostatistiek/
>
>
>
> --
>
> Bert Gunter
> Genentech Nonclinical Biostatistics
>
> Internal Contact Info:
> Phone: 467-7374
> Website:
> http://pharmadevelopment.roche.com/index/pdb/pdb-functional-groups/pdb-biostatistics/pdb-ncb-home.htm

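The rowMeans() approach quoted above can be sketched as follows. This is only an illustration under stated assumptions: the dimnames ("row1", "col1", ...) are invented so that the dimnames step has something to restore, and dim(raw[[1]]) is used in place of c(r, s) so the array always matches the shape of the matrices actually in the list.

```r
## Hypothetical data: n matrices with r rows and s columns, with dimnames.
set.seed(1)
n <- 3; r <- 5; s <- 6
raw <- lapply(seq_len(n), function(i) {
  matrix(rnorm(r * s), nrow = r,
         dimnames = list(paste0("row", seq_len(r)),
                         paste0("col", seq_len(s))))
})

## Stack the matrices into a 3-d array, then average over the third
## dimension: dims = 2 treats the first two dimensions as the "rows".
stacked <- array(unlist(raw), dim = c(dim(raw[[1]]), length(raw)))
result <- rowMeans(stacked, dims = 2)
dimnames(result) <- dimnames(raw[[1]])  # array() did not carry them over

## Without NAs this agrees with the Reduce() version.
stopifnot(isTRUE(all.equal(result, Reduce("+", raw) / length(raw))))

## With a missing value, na.rm = TRUE averages the remaining entries,
## whereas Reduce("+", ...) would propagate the NA.
raw[[1]][1, 1] <- NA
stacked_na <- array(unlist(raw), dim = c(dim(raw[[1]]), length(raw)))
result_na <- rowMeans(stacked_na, dims = 2, na.rm = TRUE)
stopifnot(isTRUE(all.equal(result_na[1, 1],
                           mean(c(raw[[2]][1, 1], raw[[3]][1, 1])))))
```

The memory tradeoff mentioned in the quoted message shows up here as the stacked array, which is a full copy of the data in the list.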





