[R] a problem of approach

Petr Savicky savicky at cs.cas.cz
Wed Jun 27 20:03:02 CEST 2012


On Wed, Jun 27, 2012 at 05:36:08PM +0300, Adrian Duşa wrote:
> Dear R-help list,
> 
> Part of a program I wrote seem to take a significant amount of time,
> therefore I am looking for an alternative approach.
> In order to explain what is does:
> 
> - the input is a sorted vector of integer numbers
> - some higher numbers may be derived (using a mathematical formula)
> from lower numbers, therefore they should be eliminated
> - at the end, the vector should contain only uniquely defined numbers
> 
> Pet hypothetical example, input vector:
> - 2 3 4 5 6 7 8 9 10
> - number 2 generates 4, 7, 10
> - 2 3 5 6 8 9 (surviving vector)
> - number 3 generates 5 and 9
> - 2 3 6 8 (surviving vector)
> - number 6 generates 8
> - final surviving vector 2 3 6
> 
> Function foo(x, ...) generates the numbers, my current approach being:
> ####
> index <- 0
> while ((index <- index + 1) < length(numbers)) {
>     numbers <- setdiff(numbers, foo(numbers[index]))
> }
> ####
> 
> This seem to take quite some time (but I don't know any other way of
> doing it), hence my question(s):
> - would there be another (quicker) implementation in R?
> - alternatively, should I go for a C implementation?
> 
> (actually, I did create a C implementation, but it doesn't bring any
> more speed... it is actually a bit slower).
> 
> A real-life pet example, using the function findSubsets() from the QCA
> package (our foo function above):
> 
> ####
> library(QCA)
> testfoo <- function(x, y) {
>     index <- 0
>     while((index <- index + 1) < length(x)) {
>         x <- setdiff(x, findSubsets(y, x[index], max(x)))
>     }
>     return(x)
> }
> 
> nofl <- rep(3, 14)
> set.seed(12345)
> numbers <- sort(sample(seq(prod(nofl)), 1000000))

Hi.

How large the numbers are? If the bound 3^14 = 4782969 used
above apply also to the real situation, then it is possible
to represent the set using a logical vector of this length,
which has TRUE for the numbers present in the set and FALSE
for the other. Removing a number means to set its bit from
TRUE to FALSE, which is a very simple operation.

  # prepare an example
  numbers <- c(2, 3, 6, 8, 12, 17)
  x <- rep(FALSE, times=max(numbers))
  x[numbers] <- TRUE
  which(x)

  [1]  2  3  6  8 12 17

  #remove 3, 8, 12 from the set
  x[c(3, 8, 12)] <- FALSE

  which(x)

  [1]  2  6 17
  
Hope this helps.

Petr Savicky.



More information about the R-help mailing list