[R] How to speed up nested for loop computations

Max Manfrin mmanfrin at ulb.ac.be
Thu Aug 10 22:10:53 CEST 2006


On 10 Aug 2006, at 18:46, jim holtman wrote:

> It appears that you are trying to partition the data frame and then
> do some operations.  It is probably better to use 'split' to
> generate the set of indices of the partitions and then do the
> operations on each subset.  Here is an example that calculates the
> 'mean' of each partition:
>
> > n <- 20
> > x <- data.frame(id=sample(1:3,n,TRUE), type=sample(1:3,n,TRUE), value=runif(n))
> > x.split <- split(1:nrow(x), list(x$id, x$type), drop=TRUE)
> > x.split
> $`3.1`
> [1]  1 15 19
>
> $`1.1`
> [1] 2
... cut ...

> > # calculate the number of values in the partition and their mean
>
> > lapply(x.split, function(z) c(length(z),mean(x$value[z])))
> $`3.1`
> [1] 3.0000000 0.3120459
>
> $`1.1`
> [1] 1.0000000 0.5642638
... cut ...
> You should be able to extend this approach to your data.

I tried to follow your suggestion, and I do indeed have to partition
the data frame. For each problem instance ("instance") of a given size
(there are 2 instances of a given size in the example), for each search
algorithm ("idalgo") (I am testing 78 algorithms), and for each trial
("try") (I run each algorithm on each instance 30 times), my complete
data set contains all the best-so-far solution values ("best") found by
every CPU (my parallel algorithm runs on 8 CPUs) over the duration of
the search.

I therefore applied the following command to the res data frame:
 > res.split <- split(res, list(res$instance, res$try, res$idalgo), drop=TRUE)

For every partition (I have 4680 partitions of the form
instance.try.idalgo) I need to identify the best solution found, that
is, among the 8 CPUs, the one with the lowest value of "best".
Unfortunately, the split command does not give me back the indexes of
the rows of the res data frame as in your example; it gives me subsets
of res instead. So I don't know how to write the lapply function to
return the indexes of the rows in res that contain the minimum value of
best for each partition.
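
One possible way to keep the row indexes, sketched here on made-up data: split the row numbers of res rather than res itself, as in the quoted example, and then pick the position of the minimum within each partition with which.min. The toy res below is an assumption that only mimics the column names from this post (instance, try, idalgo, cpu_id, best); its values are random, and the names idx.split, min.idx and res.min are hypothetical.

```r
# Toy stand-in for res (hypothetical values; only the column names follow the post)
set.seed(1)
res <- expand.grid(instance = c("lipa80a", "tai80a"),
                   try      = 1:2,
                   idalgo   = c("PIR-2opt", "SEQ-2opt"),
                   cpu_id   = 0:7)
res$best <- sample(250000:256000, nrow(res))

# Split the row numbers, not the data frame, so that each partition
# is a vector of indexes into res
idx.split <- split(seq_len(nrow(res)),
                   list(res$instance, res$try, res$idalgo), drop = TRUE)

# For each partition, the index in res of the row with the lowest best
# (which.min returns the first position if several rows tie)
min.idx <- sapply(idx.split, function(z) z[which.min(res$best[z])])

# The corresponding rows of res, one per instance.try.idalgo partition
res.min <- res[min.idx, ]
```

With this layout min.idx is a named integer vector, one entry per partition, so it can be used directly to subset res.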


Here is an example with a subset of the data:

 > optimal_values<-read.table("optimal_values_80.txt",header=TRUE)
 > resPIR2OPT<-read.table("parallel_independent_2-opt_80_800.txt",header=TRUE)
 > resSEQ2OPT<-read.table("sequential_2-opt_80_6400.txt",header=TRUE)
 > resSEQ22OPT<-read.table("sequential2_2-opt_80_800.txt",header=TRUE)
 >
 > res<-rbind(resPIR2OPT,resSEQ2OPT,resSEQ22OPT)
 > str(res)
`data.frame':	14774 obs. of  11 variables:
$ idalgo   : Factor w/ 3 levels "PIR-2opt","SEQ-2opt",..: 1 1 1 1 1 1  
1 1 1 1 ...
$ topo     : Factor w/ 3 levels "PIR","SEQ","SEQ2": 1 1 1 1 1 1 1 1 1  
1 ...
$ schema   : Factor w/ 3 levels "PIR","SEQ","SEQ2": 1 1 1 1 1 1 1 1 1  
1 ...
$ ls       : int  2 2 2 2 2 2 2 2 2 2 ...
$ type     : Factor w/ 2 levels "Par","Seq": 1 1 1 1 1 1 1 1 1 1 ...
$ cpu_id   : int  0 0 0 0 0 0 0 0 0 0 ...
$ instance : Factor w/ 2 levels "lipa80a","tai80a": 1 1 1 1 1 1 1 1 1  
1 ...
$ try      : int  1 1 1 1 1 1 1 1 1 1 ...
$ best     : int  255289 255250 255209 255112 254991 254971 254969  
254897 254893 254892 ...
$ time     : num  0.09 0.09 0.09 0.19 1.16 1.49 1.55 1.72 1.78 1.93 ...
$ iteration: int  1 1 1 2 13 18 19 22 23 26 ...
 > res.split <- split(res, list(res$instance, res$try, res$idalgo), drop=TRUE)
 > str(res.split)
List of 180
$ lipa80a.1.PIR-2opt  :`data.frame':	184 obs. of  11 variables:
   ..$ idalgo   : Factor w/ 3 levels "PIR-2opt","SEQ-2opt",..: 1 1 1  
1 1 1 1 1 1 1 ...
   ..$ topo     : Factor w/ 3 levels "PIR","SEQ","SEQ2": 1 1 1 1 1 1  
1 1 1 1 ...
   ..$ schema   : Factor w/ 3 levels "PIR","SEQ","SEQ2": 1 1 1 1 1 1  
1 1 1 1 ...
   ..$ ls       : int [1:184] 2 2 2 2 2 2 2 2 2 2 ...
   ..$ type     : Factor w/ 2 levels "Par","Seq": 1 1 1 1 1 1 1 1 1  
1 ...
   ..$ cpu_id   : int [1:184] 0 0 0 0 0 0 0 0 0 0 ...
   ..$ instance : Factor w/ 2 levels "lipa80a","tai80a": 1 1 1 1 1 1  
1 1 1 1 ...
   ..$ try      : int [1:184] 1 1 1 1 1 1 1 1 1 1 ...
   ..$ best     : int [1:184] 255289 255250 255209 255112 254991  
254971 254969 254897 254893 254892 ...
   ..$ time     : num [1:184] 0.09 0.09 0.09 0.19 1.16 1.49 1.55 1.72  
1.78 1.93 ...
   ..$ iteration: int [1:184] 1 1 1 2 13 18 19 22 23 26 ...
$ lipa80a.2.PIR-2opt  :`data.frame':	230 obs. of  11 variables:
   ..$ idalgo   : Factor w/ 3 levels "PIR-2opt","SEQ-2opt",..: 1 1 1  
1 1 1 1 1 1 1 ...
   ..$ topo     : Factor w/ 3 levels "PIR","SEQ","SEQ2": 1 1 1 1 1 1  
1 1 1 1 ...
   ..$ schema   : Factor w/ 3 levels "PIR","SEQ","SEQ2": 1 1 1 1 1 1  
1 1 1 1 ...
   ..$ ls       : int [1:230] 2 2 2 2 2 2 2 2 2 2 ...
   ..$ type     : Factor w/ 2 levels "Par","Seq": 1 1 1 1 1 1 1 1 1  
1 ...
   ..$ cpu_id   : int [1:230] 0 0 0 0 0 0 0 0 0 0 ...
   ..$ instance : Factor w/ 2 levels "lipa80a","tai80a": 1 1 1 1 1 1  
1 1 1 1 ...
   ..$ try      : int [1:230] 2 2 2 2 2 2 2 2 2 2 ...
   ..$ best     : int [1:230] 255557 255264 255235 255201 255193  
255192 255186 255103 254990 254971 ...
   ..$ time     : num [1:230] 0.09 0.09 0.19 0.19 0.37 1.29 1.36 1.36  
1.58 1.89 ...
   ..$ iteration: int [1:230] 1 1 2 2 4 15 16 16 19 24 ...


My question now is: how do I extract from each partition the row with
the minimal value of best? I need to draw a boxplot of these values.
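
A minimal sketch of one approach that works on the list of data frames directly: take the which.min row of each element and bind the results back into one data frame, which can then be passed to boxplot. The toy res below is an assumption with random values and only two algorithms; the name best.rows and the grouping used in the boxplot formula are hypothetical choices, not taken from the real data.

```r
# Toy stand-in for res (hypothetical values; only the column names follow the post)
set.seed(2)
res <- expand.grid(instance = c("lipa80a", "tai80a"),
                   try      = 1:3,
                   idalgo   = c("PIR-2opt", "SEQ-2opt"),
                   cpu_id   = 0:7)
res$best <- sample(250000:256000, nrow(res))

res.split <- split(res, list(res$instance, res$try, res$idalgo), drop = TRUE)

# Keep, from each partition, the single row with the lowest best
# (which.min returns the first such row if several rows tie)
best.rows <- do.call(rbind,
                     lapply(res.split, function(d) d[which.min(d$best), ]))

# One box per algorithm and instance, summarising the trials
boxplot(best ~ idalgo + instance, data = best.rows, las = 2)
```

Because best.rows keeps all columns of res, the boxplot can be grouped by any of them; grouping by idalgo and instance is only one plausible choice.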

Thanks again in advance for any help anybody could give.

----
Max MANFRIN
http://iridia.ulb.ac.be/~mmanfrin/

