[R] converting a for loop into a foreach loop

jim holtman jholtman at gmail.com
Fri Jan 20 03:25:03 CET 2012


You may not need foreach if you use the power of R and vectorize your
operations.  Instead of looping and extracting subsets at each
iteration, use 'split' either to split the dataframe into the
subsets or, faster, to create a set of indices for accessing the subsets
of data:
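To make the two options concrete, here is a minimal sketch on a small made-up data frame 'df' (hypothetical data, not from the original post). Splitting the data frame returns a list of sub-data-frames. Splitting only the row numbers returns integer index vectors, which are generally cheaper to build and can be used later to subset any column. Both routes give the same group means.

```r
# Small made-up data frame (hypothetical) with two grouping variables
df <- data.frame(g1 = c(1, 1, 2, 2, 1),
                 g2 = c("a", "b", "a", "a", "a"),
                 value = c(10, 20, 30, 40, 50))

# Option 1: split the data frame itself into a list of sub-data-frames
by_df <- split(df, list(df$g1, df$g2), drop = TRUE)

# Option 2: split only the row numbers; each element is an integer vector
by_idx <- split(seq_len(nrow(df)), list(df$g1, df$g2), drop = TRUE)

# Per-group means computed both ways
means_df  <- sapply(by_df,  function(d) mean(d$value))
means_idx <- sapply(by_idx, function(i) mean(df$value[i]))

print(by_idx)     # $`1.a` 1 5, $`2.a` 3 4, $`1.b` 2
print(means_idx)  # 1.a = 30, 2.a = 35, 1.b = 20
```

The group names join the factor levels with '.', with the first factor varying fastest; drop = TRUE removes combinations that never occur (here "2.b"). That is the same naming you see in names such as '1.1.0' in the transcript.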

Here is one way of doing it: create a list of indices to split up the
data, instead of using a 'for' loop that extracts a dataframe at
each step (very time consuming).

> n <- 10000
> x <- data.frame(ID = sample(1:10, n, TRUE)
+               , day = sample(1:7, n, TRUE)
+               , hour1 = sample(0:23, n, TRUE)
+               , value = runif(n)
+               )
> # create list of indices to split the data
> idh <- split(seq(nrow(x)), list(x$ID, x$day, x$hour1), drop = TRUE)
> str(idh)  # sample of the indices
List of 1675
 $ 1.1.0  : int [1:5] 226 795 869 6617 9496
 $ 2.1.0  : int [1:11] 479 3483 3702 4660 4876 5373 5479 5960 6580 6956 ...
 $ 3.1.0  : int [1:9] 383 668 2437 3877 5290 5835 7003 7896 8905
 $ 4.1.0  : int [1:3] 1493 3502 9635
 $ 5.1.0  : int [1:2] 2480 6237
 $ 6.1.0  : int [1:5] 2061 4898 5288 9439 9692

> # now take the means of each set of indices
> imeans <- sapply(idh, function(i) mean(x$value[i]))
> head(imeans, 20)
    1.1.0     2.1.0     3.1.0     4.1.0     5.1.0     6.1.0     7.1.0     8.1.0     9.1.0    10.1.0     1.2.0
0.6231298 0.2556291 0.4942764 0.5416091 0.9509064 0.4968711 0.4037645 0.4107976 0.4189220 0.5922433 0.5581944
    2.2.0     3.2.0     4.2.0     5.2.0     6.2.0     7.2.0     8.2.0     9.2.0    10.2.0
0.6275555 0.6411061 0.4885817 0.5413741 0.4134971 0.4838082 0.5207435 0.4018991 0.5338913
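The same idea carries over to the original question. Here is a sketch under these assumptions: the toy 'data' below only mimics the ID, day and hour1 columns from the poster's str() output plus two coordinate columns; 'fit_group' is a hypothetical placeholder for the per-group Hscv()/kde() call from the ks package; and parallel::mclapply() is swapped in for foreach as the parallel driver. mclapply() forks worker processes, so mc.cores > 1 is not supported on Windows. With mc.cores = 1, as below, it simply runs sequentially.

```r
library(parallel)

# Toy stand-in for the poster's data (hypothetical values)
set.seed(1)
n <- 1000
data <- data.frame(ID    = sample(1:3, n, TRUE),
                   day   = sample(39997:39999, n, TRUE),
                   hour1 = sample(0:23, n, TRUE),
                   X1    = rnorm(n),
                   Y1    = rnorm(n))

# One integer vector of row numbers per ID/day/hour1 combination that
# actually occurs in the data
groups <- split(seq_len(nrow(data)),
                list(data$ID, data$day, data$hour1), drop = TRUE)

# Hypothetical per-group function: in the real problem this is where
# Hscv()/kde() from the ks package would be applied to the coordinates
fit_group <- function(rows) {
  coords <- as.matrix(data[rows, c("X1", "Y1")])
  c(n = nrow(coords),
    mean.X1 = mean(coords[, 1]),
    mean.Y1 = mean(coords[, 2]))
}

# Groups are independent, so they can be processed in parallel;
# raise mc.cores on a Unix-alike to use more processes
results <- mclapply(groups, fit_group, mc.cores = 1)

# One output file name per group, built from the group label,
# e.g. "KDE_1.39997.0.csv" (files are not written here)
files <- paste0("KDE_", names(groups), ".csv")
head(files)
```

Because the groups come from the values actually present, there is no need to loop over 1:length(...). Each element of 'files' can be passed to write.csv() together with the matching element of 'results'.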


On Thu, Jan 19, 2012 at 5:23 AM, kalee <kathryn.lee1 at students.mq.edu.au> wrote:
> Dear all,
>
> Just wondering if someone could help me out converting my code from a for()
> loop into a foreach() loop or using one of the apply() functions. I have a
> very large dataset and so I'm hoping to make use of a parallel backend to
> speed up the processing time. I'm having trouble selecting three
> variables in the dataset to use in the foreach() loops. My for() loop code
> is:
>
> library(foreach)
> library(multicore)
> library(doMC)
> library(ks)   # Hscv() and kde() come from the ks package
> registerDoMC()
>
>
>> str(data)
> 'data.frame':   958 obs. of  13 variables:
>  $ Date.Time: Factor w/ 260 levels "03/07/09 00:00",..: 1 2 2 2 3 3 3 3 3 3 ...
>  $ ID       : int  3 1 3 7 1 3 7 8 10 12 ...
>  $ X        : num  151 151 151 151 151 ...
>  $ Y        : num  -33.9 -33.9 -33.9 -33.9 -33.9 ...
>  $ Z        : num  8 8 8 12 8 8 10 8 8 4 ...
>  $ breeding : int  1 1 1 1 1 1 1 1 1 1 ...
>  $ hour     : int  0 0 0 0 0 0 0 0 0 0 ...
>  $ sex      : Factor w/ 4 levels "","F","M","U": 3 4 3 4 4 3 4 3 2 4 ...
>  $ sex.code : int  1 3 1 3 3 1 3 1 2 3 ...
>  $ day      : int  39997 39997 39997 39997 39997 39997 39997 39997 39997 39997 ...
>  $ hour1    : int  24 24 24 24 24 24 24 24 24 24 ...
>  $ X1       : num  1765688 1765492 1765492 1765637 1765383 ...
>  $ Y1       : num  -3834667 -3834964 -3834964 -3834786 -3834990 ...
>
>
> for (i in 1:15) {
>
> x = data[data$ID == i, 1:10]
>
>  for (j in unique(x$day)) {       # loop over the day values present, not 1:nrow
>
>  y = x[x$day == j, 1:10]
>
>     for (k in unique(y$hour1)) {  # likewise for the hour values present
>
>    z = y[y$hour1 == k, 1:10]
>
>
> H.scv <- Hscv(z, pilot = "unconstr")
>
> KDE <- kde(z, H=H.scv, approx.cont=TRUE)
> str(KDE)
> head(KDE)
>
> write.csv(KDE, file = paste0("KDE_", i, "_", j, "_", k, ".csv"), row.names=T)
>
> }
> }
> }
>
> The foreach code I've tried (unsuccessfully) is:
>
> x <- foreach(a = data[, 'ID'], .combine = "rbind") %:%
>      foreach(b = data[, 'day'], .combine = "cbind") %:%
>      foreach[c = data['hour1'], .combine ="cbind"] %dopar% {
>
>
> H.scv <- Hscv((a,b,c), pilot = "unconstr")
>
> KDE <- kde((a,b,c), H=H.scv, approx.cont=TRUE)
> str(KDE)
> head(KDE)
>
> write.csv(KDE, file = paste("KDE",i,".csv"), row.names=T)
>
> }
>
> Many thanks for any help.
>
>
>
> --
> View this message in context: http://r.789695.n4.nabble.com/converting-a-for-loop-into-a-foreach-loop-tp4309646p4309646.html
> Sent from the R help mailing list archive at Nabble.com.
>
> ______________________________________________
> R-help at r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.



-- 
Jim Holtman
Data Munger Guru

What is the problem that you are trying to solve?
Tell me what you want to do, not how you want to do it.


