[R] How to speed up multiple for loop over list of data frames

Waterman, DG (David) david.waterman at diamond.ac.uk
Wed Oct 17 18:18:10 CEST 2007


I agree. Avoid lines like:
iv     = c( iv, min(i, j) )

I had code that was sped up by 70 times after fixing the size of my
output object before entering a loop.
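
To illustrate the difference, here is a minimal sketch that times a growing
vector against a preallocated one. The function names grow() and prealloc()
and the size n are made up for this example; the actual speed-up will depend
on the vector length and the machine.

```r
n <- 50000  # hypothetical size, chosen only for illustration

# Grows the result with c(), so the whole vector is copied on every pass.
grow <- function(n) {
  out <- c()
  for (k in seq_len(n)) out <- c(out, k^2)
  out
}

# Allocates the result once, then fills it by index.
prealloc <- function(n) {
  out <- numeric(n)
  for (k in seq_len(n)) out[k] <- k^2
  out
}

t_grow <- system.time(a <- grow(n))
t_pre  <- system.time(b <- prealloc(n))

print(rbind(grow = t_grow, prealloc = t_pre)[, "elapsed"])
same_result <- identical(a, b)  # both versions compute the same values
```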

Cheers
David 

-----Original Message-----
From: r-help-bounces at r-project.org [mailto:r-help-bounces at r-project.org]
On Behalf Of Patrick Burns
Sent: 17 October 2007 15:57
To: jim holtman
Cc: r-help at r-project.org; Dieter Best
Subject: Re: [R] How to speed up multiple for loop over list of data
frames

I suspect the vast majority of time is because of growing objects.

Preallocate 'iv', 'jv', 'rho_sv' and 'rho_pv' to be their final length
and then subscript into them with their values.
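
A sketch of what that could look like for the loop in the original post. The
small N and M, the simulated 'alist', and the counter 'k' are assumptions
added here so that the example runs on its own; the number of (i, j) pairs
with i < j is N * (N - 1) / 2.

```r
set.seed(42)
N <- 6   # hypothetical, the original post has N of about 400
M <- 20  # hypothetical, the original post has M of about 800
alist <- lapply(seq_len(M), function(p) data.frame(v = rnorm(N)))

n_pairs <- N * (N - 1) / 2

# Allocate every output at its final length before the loop starts.
iv     <- integer(n_pairs)
jv     <- integer(n_pairs)
rho_sv <- numeric(n_pairs)
rho_pv <- numeric(n_pairs)
v_i    <- numeric(M)
v_j    <- numeric(M)

k <- 0L  # position of the current pair in the output vectors
for (i in 1:(N - 1)) {
  for (j in (i + 1):N) {
    for (p in 1:M) {
      v_i[p] <- alist[[p]][i, "v"]
      v_j[p] <- alist[[p]][j, "v"]
    }
    k <- k + 1L
    iv[k]     <- i  # j starts at i + 1, so min(i, j) is always i
    jv[k]     <- j
    rho_sv[k] <- cor(v_i, v_j, method = "spearman")
    rho_pv[k] <- cor(v_i, v_j, method = "pearson")
  }
}
```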


Patrick Burns
patrick at burns-stat.com
+44 (0)20 8525 0696
http://www.burns-stat.com
(home of S Poetry and "A Guide for the Unwilling S User")

jim holtman wrote:

>First thing to do is to use Rprof (?Rprof) on a subset of your data to 
>see where time is being spent.  My guess is that most of it is in the 
>calls to 'cor' and if this is the case, then you have to figure out 
>some other algorithm.
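
A minimal profiling sketch along those lines, using Rprof() and
summaryRprof() from base R. The simulated matrix 'x' and the function
pair_cors() are stand-ins invented for this example, not the original data
or code.

```r
set.seed(1)
x <- matrix(rnorm(100 * 200), nrow = 100)  # hypothetical subset of the data

# Stand-in workload: Spearman correlation for every pair of rows.
pair_cors <- function(x) {
  out <- numeric(0)
  for (i in 1:(nrow(x) - 1)) {
    for (j in (i + 1):nrow(x)) {
      out <- c(out, cor(x[i, ], x[j, ], method = "spearman"))
    }
  }
  out
}

prof_file <- tempfile(fileext = ".out")
Rprof(prof_file, interval = 0.01)  # start sampling the call stack
res <- pair_cors(x)
Rprof(NULL)                        # stop profiling

# Functions ranked by the time spent in their own code.
print(head(summaryRprof(prof_file)$by.self))
```

The by.self table shows which functions the samples landed in, which is
what tells you whether 'cor' or something else dominates.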
>
>Also if these dataframes all contain numeric information, convert them 
>to matrices initially because the subsetting that you are doing on the 
>dataframe (e.g., alist[[p]][i,"v"]) can be very expensive.  The output 
>from Rprof will help determine what course of action you should take.
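
One way the matrix conversion might look, assuming every data frame in
'alist' has a numeric column "v" with the same N rows. The simulated
'alist' and the names 'vmat' and 'pairs' are illustrative only. The sketch
also goes one step beyond the suggestion above: once the values are in a
matrix, a single call to cor() returns all pairwise correlations between
its columns, so the inner loops are no longer needed.

```r
set.seed(42)
N <- 6   # hypothetical sizes, kept small so the example runs quickly
M <- 20
alist <- lapply(seq_len(M), function(p) data.frame(v = rnorm(N)))

# M x N matrix: row p comes from alist[[p]], column i holds alist[[p]][i, "v"].
vmat <- do.call(rbind, lapply(alist, function(df) df[["v"]]))

# N x N correlation matrices between the columns of vmat.
rho_p <- cor(vmat, method = "pearson")
rho_s <- cor(vmat, method = "spearman")

# Keep each pair once (i < j), in a long format like the original loop output.
pairs <- which(upper.tri(rho_p), arr.ind = TRUE)
result <- data.frame(iv     = pairs[, "row"],
                     jv     = pairs[, "col"],
                     rho_sv = rho_s[pairs],
                     rho_pv = rho_p[pairs])
```

Note that which() walks the matrix column by column, so the rows of
'result' are ordered by jv first rather than by iv as in the loop.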
>
>On 10/16/07, Dieter Best <dieterbest_2000 at yahoo.com> wrote:
>  
>
>>Hi there,
>>
>> I have a multiple for loop over a list of data frames
>>
>> for ( i in 1:(N-1) ) {
>>   for ( j in (i+1):N ) {
>>       for ( p in 1:M ) {
>>           v_i[p]    = alist[[p]][i,"v"]
>>           v_j[p]    = alist[[p]][j,"v"]
>>       }
>>       rho_s = cor(v_i, v_j, method = "spearman")
>>       rho_p = cor(v_i, v_j, method = "pearson" )
>>       iv     = c( iv, min(i, j) )
>>       jv     = c( jv, max(i, j) )
>>       rho_sv = c( rho_sv, rho_s)
>>       rho_pv = c( rho_pv, rho_p)
>>   }
>>}
>>
>> N is of the order of 400, M about 800.
>>
>> This takes me an entire day basically. Is there anything I could do
>> to speed things up or is cor really that slow?
>>
>> -- D
>>
>>______________________________________________
>>R-help at r-project.org mailing list
>>https://stat.ethz.ch/mailman/listinfo/r-help
>>PLEASE do read the posting guide 
>>http://www.R-project.org/posting-guide.html
>>and provide commented, minimal, self-contained, reproducible code.
