[R] a problem of approach

Wed Jun 27 20:23:53 CEST 2012

On Wed, Jun 27, 2012 at 8:11 PM, jim holtman <jholtman at gmail.com> wrote:
> If you look, half of the time is spent in the 'findSubsets' function
> and the other half in determining where the differences are in the
> sets.  Is there a faster way of doing what findSubsets does, since it
> is the biggest time consumer?  The setdiff might be speeded up by
> using 'match'.
>

That's right, Jim!
I have a C implementation of findSubsets() ready and have put it to the test:

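testfoo2 <- function(x, y) {
    # each entry is the product of the elements of y to its right (the last entry is 1)
    mbase <- c(rev(cumprod(rev(y))), 1)[-1]
    index <- 0
    while((index <- index + 1) < length(x)) {
        # remove from x every value returned by the C routine fS
        x <- setdiff(x, .Call("fS", x, y, mbase, max(x)))
    }
    return(x)
}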
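For readers wondering what mbase holds: with y giving the number of levels of each variable, c(rev(cumprod(rev(y))), 1)[-1] produces the place values of a mixed-radix (variable-base) number, where each position's weight is the product of the level counts to its right. The sketch below assumes that reading. The level counts and the helpers encode() and decode() are hypothetical illustrations, not part of the C routine fS.

```r
# Hypothetical level counts for three variables (not the real 'nofl')
y <- c(3, 3, 2)

# Place values exactly as computed in testfoo2(): product of the counts to the right
mbase <- c(rev(cumprod(rev(y))), 1)[-1]
print(mbase)  # 6 2 1

# Hypothetical helper: combine 0-based digits (one per variable) into one number
encode <- function(digits, mbase) sum(digits * mbase)

# Hypothetical helper: split a number back into its 0-based digits
decode <- function(n, y, mbase) (n %/% mbase) %% y

n <- encode(c(2, 1, 1), mbase)  # 2*6 + 1*2 + 1*1 = 15
digits <- decode(n, y, mbase)   # 2 1 1

# Every number in 0..prod(y)-1 survives a decode/encode round trip
round_trip_ok <- all(vapply(0:(prod(y) - 1),
                            function(k) encode(decode(k, y, mbase), mbase) == k,
                            logical(1)))
```

Because each weight equals the product of all radices to its right, the encoding is one-to-one over 0..prod(y)-1, which is what allows a whole combination of levels to be stored as a single number.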

> system.time(result2 <- testfoo2(numbers, nofl))
   user  system elapsed
  4.691   1.487   6.091

That is a decrease of about 40% (from the initial 10.148 seconds), which is very nice indeed.
Using match(), however, did not reduce the time dramatically:

testfoo3 <- function(x, y) {
    mbase <- c(rev(cumprod(rev(y))), 1)[-1]
    index <- 0
    while((index <- index + 1) < length(x)) {
        # keep only the values of x that fS did not return
        x <- x[is.na(match(x, .Call("fS", x, y, mbase, max(x))))]
    }
    return(x)
}

> system.time(result3 <- testfoo3(numbers, nofl))
   user  system elapsed
  4.304   1.359   5.621

Even so, taken together your suggestions cut the total time almost in half
(from 10.148 to 5.621 seconds), which is fantastic.
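A likely reason match() saved so little: as far as I know, base R's setdiff() is itself built on match(x, y, 0L) == 0L followed by unique(), so switching to match() only skips the unique() pass. The sketch below, on small made-up vectors, shows that the common idioms agree when x has no duplicates and differ only when it does.

```r
x <- c(5L, 1L, 9L, 3L, 7L)
s <- c(9L, 2L, 5L)

a <- setdiff(x, s)             # 1 3 7
b <- x[is.na(match(x, s))]     # 1 3 7, the form used in testfoo3()
d <- x[match(x, s, 0L) == 0L]  # 1 3 7, nomatch = 0L avoids the is.na() call
e <- x[!(x %in% s)]            # 1 3 7, %in% is a thin wrapper around match()

# The idioms differ only when x contains duplicates: setdiff() also removes them
dup <- c(1L, 1L, 4L)
u <- setdiff(dup, 4L)           # 1
v <- dup[is.na(match(dup, 4L))] # 1 1
```

If the values in x are already unique (which seems likely for encoded combinations), the match() form gives the same result without the extra unique() step.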

My last question concerns the while() loop. Everything I know about R
tells me that loops tend to be slow, so I wonder whether the while()
loop can be avoided somehow in this example.
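One way to see why this while() loop is hard to remove: each pass filters the vector produced by the previous pass, so pass k cannot start before pass k-1 has finished. The sieve of Eratosthenes below is only an analogy with the same shape as testfoo2(). It is not fS (whose code isn't shown here), and sieve_filter() is a hypothetical name. Each pass is a single vectorised expression and the loop runs only over the elements that survive, so the interpreted loop overhead grows with the number of passes, not with the length of the vector each pass handles.

```r
# Hypothetical analogy (not fS): same while() shape as testfoo2()
sieve_filter <- function(x) {
    index <- 0
    passes <- 0
    while ((index <- index + 1) < length(x)) {
        passes <- passes + 1
        current <- x[index]
        kept <- x[seq_len(index)]      # elements up to and including current
        later <- x[-seq_len(index)]    # elements still to be examined
        # one vectorised step: drop every later element divisible by current
        x <- c(kept, later[later %% current != 0])
    }
    list(result = x, passes = passes)
}

out <- sieve_filter(2:50)
print(out$result)  # the 15 primes below 50
print(out$passes)  # 14 passes for 49 starting values
```

For this shape of problem, the practical gains tend to come from making each pass cheaper (as the C version of findSubsets did) rather than from removing the loop itself.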

Anyway, thanks a lot!
Adrian

-- 
Adrian Dusa
Romanian Social Data Archive
1, Schitu Magureanu Bd.
050025 Bucharest sector 5
Romania
Tel.:+40 21 3126618 \
       +40 21 3120210 / int.101
Fax: +40 21 3158391