[R] How to speed up multiple for loop over list of data frames

Bert Gunter gunter.berton at gene.com
Wed Oct 17 18:50:36 CEST 2007


... which is tip 2 in Section 7.7, "Tips," of V&R's S PROGRAMMING. Although
this is now somewhat dated, it is still worthwhile if you do any serious S
language programming (IMO, of course). 


Bert Gunter
Genentech Nonclinical Statistics


-----Original Message-----
From: r-help-bounces at r-project.org [mailto:r-help-bounces at r-project.org] On
Behalf Of Waterman, DG (David)
Sent: Wednesday, October 17, 2007 9:18 AM
To: Dieter Best
Cc: r-help at r-project.org
Subject: Re: [R] How to speed up multiple for loop over list of data frames

I agree. Avoid the lines like:
iv     = c( iv, min(i, j) )

I had code that was sped up by 70 times after fixing the size of my
output object before entering a loop.
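
A minimal sketch of the difference, not the code referred to above: `grow` and `prealloc` are hypothetical function names, and the squared index merely stands in for whatever the loop really computes.

```r
# Growing: every c() call allocates a new vector and copies the old one,
# so the total copying work rises roughly with the square of n.
grow <- function(n) {
  out <- numeric(0)
  for (k in seq_len(n)) {
    out <- c(out, k^2)
  }
  out
}

# Preallocating: the vector is created once at its final length
# and then filled in place by subscript.
prealloc <- function(n) {
  out <- numeric(n)
  for (k in seq_len(n)) {
    out[k] <- k^2
  }
  out
}
```

Both functions return the same vector. Comparing system.time(grow(1e5)) with system.time(prealloc(1e5)) should show the gap, although the ratio will depend on the R version and on n.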

Cheers
David 

-----Original Message-----
From: r-help-bounces at r-project.org [mailto:r-help-bounces at r-project.org]
On Behalf Of Patrick Burns
Sent: 17 October 2007 15:57
To: jim holtman
Cc: r-help at r-project.org; Dieter Best
Subject: Re: [R] How to speed up multiple for loop over list of data frames

I suspect the vast majority of the time is spent growing objects.

Preallocate 'iv', 'jv', 'rho_sv' and 'rho_pv' to be their final length
and then subscript into them with their values.
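
Roughly, and only as a sketch: the loop from the original post with the four result vectors allocated up front. The data here are made up (a small 'alist' of random numbers so that the example runs on its own), and the counter 'k' and the name 'n_pairs' are my additions, not part of the posted code.

```r
# Made-up stand-in for the poster's data: M data frames, each with N rows
# and a numeric column "v". Sizes are kept small so the sketch runs quickly.
set.seed(1)
N <- 5
M <- 20
alist <- lapply(seq_len(M), function(p) data.frame(v = rnorm(N)))

n_pairs <- N * (N - 1) / 2   # number of (i, j) pairs with i < j

# Allocate every vector once, at its final length.
iv     <- integer(n_pairs)
jv     <- integer(n_pairs)
rho_sv <- numeric(n_pairs)
rho_pv <- numeric(n_pairs)
v_i    <- numeric(M)
v_j    <- numeric(M)

k <- 0L   # position of the current pair in the result vectors
for (i in 1:(N - 1)) {
  for (j in (i + 1):N) {
    for (p in 1:M) {
      v_i[p] <- alist[[p]][i, "v"]
      v_j[p] <- alist[[p]][j, "v"]
    }
    k <- k + 1L
    iv[k]     <- i   # j > i always, so min(i, j) is just i
    jv[k]     <- j
    rho_sv[k] <- cor(v_i, v_j, method = "spearman")
    rho_pv[k] <- cor(v_i, v_j, method = "pearson")
  }
}
```

This changes only how the results are stored; the data frame subsetting and the calls to 'cor' are untouched, so any time spent there remains.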


Patrick Burns
patrick at burns-stat.com
+44 (0)20 8525 0696
http://www.burns-stat.com
(home of S Poetry and "A Guide for the Unwilling S User")

jim holtman wrote:

>First thing to do is to use Rprof (?Rprof) on a subset of your data to 
>see where time is being spent.  My guess is that most of it is in the 
>calls to 'cor' and if this is the case, then you have to figure out 
>some other algorithm.
>
>Also if these dataframes all contain numeric information, convert them 
>to matrices initially because the subsetting that you are doing on the 
>dataframe (e.g., alist[[p]][i,"v"]) can be very expensive.  The output 
>from Rprof will help determine what course of action you should take.
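
A sketch of the matrix conversion suggested above, taken one step further than the quoted advice: once the "v" columns sit in a single matrix, one call to 'cor' per method can replace the two outer loops. The data are made up so that the example is self-contained, and the names 'V', 'idx' and 'res' are assumptions for illustration, not anything from the original post.

```r
# Made-up stand-in for the poster's data: M data frames, each with N rows
# and a numeric column "v".
set.seed(1)
N <- 5
M <- 20
alist <- lapply(seq_len(M), function(p) data.frame(v = rnorm(N)))

# Pull column "v" out of every data frame once.
# V has N rows and M columns, so V[i, p] equals alist[[p]][i, "v"].
V <- vapply(alist, function(d) d[["v"]], numeric(N))

# cor() on a matrix correlates its columns, so transpose to turn
# each row i of V into a column. Each result is an N x N matrix.
rho_p <- cor(t(V), method = "pearson")
rho_s <- cor(t(V), method = "spearman")

# Keep only the pairs with i < j, as in the original loop.
# which() walks the matrix column by column, so pairs are ordered by j, then i.
idx <- which(upper.tri(rho_p), arr.ind = TRUE)
res <- data.frame(iv = idx[, "row"], jv = idx[, "col"],
                  rho_sv = rho_s[idx], rho_pv = rho_p[idx])
```

Whether this helps on the real data is something the Rprof output would have to confirm. With N around 400 the two N x N matrices are small, but the pair ordering differs from the loop version, which matters if later code relies on it.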
>
>On 10/16/07, Dieter Best <dieterbest_2000 at yahoo.com> wrote:
>  
>
>>Hi there,
>>
>> I have a multiple for loop over a list of data frames
>>
>> for ( i in 1:(N-1) ) {
>>   for ( j in (i+1):N ) {
>>       for ( p in 1:M ) {
>>           v_i[p]    = alist[[p]][i,"v"]
>>           v_j[p]    = alist[[p]][j,"v"]
>>       }
>>       rho_s = cor(v_i, v_j, method = "spearman")
>>       rho_p = cor(v_i, v_j, method = "pearson" )
>>       iv     = c( iv, min(i, j) )
>>       jv     = c( jv, max(i, j) )
>>       rho_sv = c( rho_sv, rho_s)
>>       rho_pv = c( rho_pv, rho_p)
>>   }
>>}
>>
>> N is of the order of 400, M about 800.
>>
>> This takes me an entire day basically. Is there anything I could do to
>> speed things up or is cor really that slow?
>>
>> -- D
>>
>>
>>
>>
>>______________________________________________
>>R-help at r-project.org mailing list
>>https://stat.ethz.ch/mailman/listinfo/r-help
>>PLEASE do read the posting guide 
>>http://www.R-project.org/posting-guide.html
>>and provide commented, minimal, self-contained, reproducible code.

