[R] Avoiding loops

Martin Morgan mtmorgan at fhcrc.org
Wed Sep 2 18:17:01 CEST 2009


Alexander Shenkin wrote:
> Though, from my limited understanding, the 'apply' family of functions
> are actually just loops.  Please correct me if I'm wrong.  So, while
> more readable (which is important), they're not necessarily more
> efficient than explicit 'for' loops.

Hi Allie -- This uses an R-level loop (and a lot of C loops!), but the
length of the loop is only as long as the maximum lag


f0 <- function(df0, max_lag)
{
    max_lag <- min(nrow(df0), max_lag)
    a <- df0[[1]]
    ans <- df <- df0[,-1, drop=FALSE]
    for (lag in seq_len(max_lag)) {
        idx <- diff(a, lag) <= max_lag
        pad <- logical(lag)
        ans[c(pad, idx),] <- ans[c(pad, idx),] + df[c(idx, pad),]
    }
    cbind(a, ans)
}

it makes the assumption that 'a' is sorted and unique, as in a time
series. This

f1 <- function(df0, max_lag)
{
    max_lag <- min(nrow(df0), max_lag)
    a <- df0[[1]]
    ans <- df0[,-1, drop=FALSE]
    lag <- 1
    while(sum(idx <- diff(a, lag) <= max_lag) != 0) {
        pad <- logical(lag)
        ans[c(pad, idx),] <- ans[c(pad, idx),] + df[c(idx, pad),]
        lag <- lag + 1
    }
    cbind(a, ans)
}

relaxes the assumption that 'a' is unique, I think, but I haven't tested
carefully; it seems to perform about the same as f0. I think there's a
clever recursive solution in there, too.

This is my implementation of Phil's solution

phil0 <- function(df0, max_lag)
{
    with(df0, {
        g <- function(x)
            apply(df0[a - x >= -max_lag & a - x <= 0, c('b','c')],
                  2, sum)
        data.frame(a, t(sapply(a, g)))
    })
}

Here's my implementation of Chuck Berry's solution

chuck0 <- function(df0, max_lag)
{
    criterion <-
        as.matrix(dist(df0$a)) <= max_lag & outer(df0$a,df0$a,">=")
    criterion %*% as.matrix(df0[, c("b","c")])
}

Here's a data generator

setup <- function(n, m)
    ## n: number of rows
    ## m: expected counts per sum
{
    a0 <- sort(sample(seq_len(m * n), n))
    data.frame(a=a0, b=as.integer(runif(n, 1, 10)),
               c=as.integer(runif(n, 1, 10)))
}

and a comparison with

df0 <- setup(10^3, 3)
max_lag <- 5

> system.time(f0res <- f0(df0, max_lag), gcFirst=TRUE)
   user  system elapsed
  0.016   0.000   0.016
> system.time(phil0res <- phil0(df0, max_lag), gcFirst=TRUE)
   user  system elapsed
  0.960   0.000   0.962
> system.time(chuck0res <- chuck0(df0, 5), gcFirst=TRUE)
   user  system elapsed
  0.252   0.000   0.254

> all.equal(f0res, phil0res)
[1] TRUE

> all.equal(as.matrix(f0res[,2:3]), chuck0res, check.attributes=FALSE)
[1] TRUE

The f0 solution seems to be usable up to about a million rows,

> df0 <- setup(10^6, 3)
> system.time(f0res <- f0(df0, max_lag), gcFirst=TRUE)
   user  system elapsed
  2.680   0.004   2.700

Martin

> 
> allie
> 
> On 9/2/2009 3:13 AM, Phil Spector wrote:
>> Here's one way (assuming your data frame is named dat):
>>
>>    with(dat,
>>         data.frame(a,t(sapply(a,function(x){
>>                        apply(dat[a - x >= -5 & a - x <=
>> 0,c('b','c')],2,sum)}))))
>>
>>
>>                     - Phil Spector
>>                      Statistical Computing Facility
>>                      Department of Statistics
>>                      UC Berkeley
>>                      spector at stat.berkeley.edu
>>
>>
>>
>> On Tue, 1 Sep 2009, dolar wrote:
>>
>>> Would like some tips on how to avoid loops as I know they are slow in R
>>>
>>> i've got a data frame :
>>>
>>> a  b  c
>>> 1  5  2
>>> 4  6  9
>>> 5  2  3
>>> 8  3  2
>>>
>>> What i'd like is to sum for each value of a, the sum of b and the sum
>>> of c
>>> where a equal to or less than (with a distance of 5)
>>>
>>> i.e. for row three
>>> we have a=5
>>> i'd like to sum up b and sum up c with the above rule
>>> since 5, 4 and 1 are less than (within a distance of 5) or equal to
>>> 5, then
>>> we should get the following result:
>>>
>>> a  b   c
>>> 5  13  14
>>>
>>> the overall result should be
>>> a   b   c
>>> 1   5   2
>>> 4   11  11
>>> 5   13  14
>>> 8   11  14
>>>
>>> how can i do this without a loop?
>>> -- 
>>> View this message in context:
>>> http://www.nabble.com/Avoiding-loops-tp25251376p25251376.html
>>> Sent from the R help mailing list archive at Nabble.com.
>>>
>>> ______________________________________________
>>> R-help at r-project.org mailing list
>>> https://stat.ethz.ch/mailman/listinfo/r-help
>>> PLEASE do read the posting guide
>>> http://www.R-project.org/posting-guide.html
>>> and provide commented, minimal, self-contained, reproducible code.
>>>
>> ______________________________________________
>> R-help at r-project.org mailing list
>> https://stat.ethz.ch/mailman/listinfo/r-help
>> PLEASE do read the posting guide
>> http://www.R-project.org/posting-guide.html
>> and provide commented, minimal, self-contained, reproducible code.
> 
> ______________________________________________
> R-help at r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.




More information about the R-help mailing list