[R] Reducing execution time

Wed Jul 27 17:32:17 CEST 2016

Hi,

It's really a good idea to use dput() or some other reproducible way
to provide data. I had to guess as to what your data looked like.
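As a minimal illustration, dput() prints an exact R representation of an object that someone else can paste straight into their session, and dget() reads that representation back. The name myCombos and its three rows are taken from the question below; the temporary file exists only to show the round trip:

```r
# A small sample of the triplet data, typed in from rows shown in the question
myCombos <- matrix(c(65, 23, 77,
                     77, 34, 65,
                     55, 34, 23), ncol = 3, byrow = TRUE)

# dput() prints text that recreates the object exactly; paste that into your post
dput(myCombos)

# Round trip through a temporary file to show nothing is lost
tf <- tempfile(fileext = ".R")
dput(myCombos, file = tf)
restored <- dget(tf)
identical(restored, myCombos)
```

For a large object, dput(head(x)) or a small subset is usually enough to show the structure.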

It appears that order doesn't matter?
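If order really doesn't matter, %in% is a natural test, because it only asks whether each code is present somewhere in the list element, not where it is. A quick check using rows from the question. Note that a code repeated within an element (65 appears twice in the second element) does not count twice:

```r
# %in% tests membership code by code, so positions within the element are ignored
triplet <- c(65, 23, 77)

all(triplet %in% c(77, 65, 34, 23, 55))       # TRUE: all three present, different order
all(triplet %in% c(77, 34, 65))               # FALSE: 23 is missing
all(rev(triplet) %in% c(77, 65, 34, 23, 55))  # TRUE: reordering the triplet changes nothing
```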

Given that, here's one approach:

combs <- structure(list(V1 = c(65L, 77L, 55L, 23L, 34L), V2 = c(23L, 34L,
34L, 77L, 65L), V3 = c(77L, 65L, 23L, 34L, 55L)), .Names = c("V1",
"V2", "V3"), class = "data.frame", row.names = c(NA, -5L))

dat <- list(
c(77,65,34,23,55),
c(65,23,77,65,55,34),
c(77,34,65),
c(55,78,56),
c(98,23,77,65,34))

# for each row i of combs, count the list elements j that contain all three codes
sapply(seq_len(nrow(combs)), function(i) sum(sapply(dat,
    function(j) all(combs[i,] %in% j))))
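Following Bert's point (quoted below) that match(), and therefore %in%, is already vectorized, one possible restructuring is to turn the loops inside out: loop once over the list and test every triplet against each element in a single %in% call. The R-level loop then runs once per list element (about 8000 times) instead of once per triplet per element. This is a sketch that I have not timed on data of the real size. It assumes the triplets are held as a plain numeric matrix rather than a data.frame, and the function name countTriplets is made up:

```r
# Example data from above, with the triplets as a numeric matrix
combs <- cbind(V1 = c(65, 77, 55, 23, 34),
               V2 = c(23, 34, 34, 77, 65),
               V3 = c(77, 65, 23, 34, 55))
dat <- list(c(77, 65, 34, 23, 55), c(65, 23, 77, 65, 55, 34),
            c(77, 34, 65), c(55, 78, 56), c(98, 23, 77, 65, 34))

countTriplets <- function(combs, dat) {
  counts <- integer(nrow(combs))
  for (j in dat) {
    # one vectorized %in% call tests every code of every triplet against j;
    # it drops the dim, so rebuild the n x 3 layout (column-major, like combs)
    hit <- matrix(combs %in% j, ncol = 3)
    # a triplet occurs in j only if all three of its codes do
    counts <- counts + (rowSums(hit) == 3)
  }
  counts
}

countTriplets(combs, dat)                     # 3 4 2 3 2 for the five example triplets
cbind(combs, Freq = countTriplets(combs, dat))  # the V1 V2 V3 Freq layout asked for
```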

On a larger test dataset (100 triplets against a list of 10,000 elements), it takes me under a minute and a half.

> combs <- combs[rep(1:nrow(combs), length=100), ]
> dat <- dat[rep(1:length(dat), length=10000)]
>
> dim(combs)
[1] 100   3
> length(dat)
[1] 10000
>
> system.time(test <- sapply(seq_len(nrow(combs)), function(i)sum(sapply(dat, function(j)all(combs[i,] %in% j)))))
   user  system elapsed
 86.380   0.006  86.391
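A different technique, not one proposed in this thread, avoids comparing triplets one at a time. First build a 0/1 incidence matrix (list elements by codes). Then, for each code a, crossprod() of the rows that contain a gives, for every pair (b, c), the number of list elements containing a, b and c. Stacking these fills a codes x codes x codes array, and every triplet is then looked up with a single matrix index. This sketch assumes the number of distinct codes stays small (the question mentions 79 codes, so the array would have 79^3 = 493,039 cells). The names incidence and tripleCounts are made up:

```r
combs <- cbind(V1 = c(65, 77, 55, 23, 34),
               V2 = c(23, 34, 34, 77, 65),
               V3 = c(77, 65, 23, 34, 55))
dat <- list(c(77, 65, 34, 23, 55), c(65, 23, 77, 65, 55, 34),
            c(77, 34, 65), c(55, 78, 56), c(98, 23, 77, 65, 34))

# every distinct code gets a column index
codes <- sort(unique(c(unlist(dat), combs)))
k <- length(codes)

# incidence[r, s] is 1 if list element r contains codes[s];
# a code repeated within an element just sets the same cell to 1 again
incidence <- matrix(0, nrow = length(dat), ncol = k)
for (r in seq_along(dat)) incidence[r, match(dat[[r]], codes)] <- 1

# tripleCounts[a, b, c] = number of list elements containing codes a, b and c
tripleCounts <- array(0, dim = c(k, k, k))
for (a in seq_len(k)) {
  rows <- incidence[incidence[, a] == 1, , drop = FALSE]
  tripleCounts[a, , ] <- crossprod(rows)  # pair counts among elements holding code a
}

# look up all triplets at once by matrix indexing into the array
idx <- matrix(match(combs, codes), ncol = 3)
Freq <- tripleCounts[idx]
cbind(combs, Freq = Freq)
```

Because %in% and the incidence matrix both ignore order and repetition, the two approaches should agree. The array lookup also works unchanged if a triplet is listed in a different order.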

On Wed, Jul 27, 2016 at 10:47 AM, sri vathsan <srivibish at gmail.com> wrote:
> Hi,
>
> Apologies for the lack of information.
>
> Basically, myCombos is a matrix with 3 columns; each row is a triplet
> drawn from 79 codes. There are around 3 lakh (300,000) such
> combinations, and it looks like this:
>
> V1 V2 V3
> 65 23 77
> 77 34 65
> 55 34 23
> 23 77 34
> 34 65 55
>
> Each triplet is compared against a list (mylist) of 8177 elements, which
> looks like this:
>
> 77,65,34,23,55
> 65,23,77,65,55,34
> 77,34,65
> 55,78,56
> 98,23,77,65,34
>
> Now I want to count the number of occurrences of each triplet in the
> above list; e.g., the triplet 65 23 77 is seen 3 times in the list. So
> my output should look like this:
>
> V1 V2 V3 Freq
> 65 23 77  3
> 77 34 65  4
> 55 34 23  2
>
> I hope I made it clear this time.
>
>
> On Wed, Jul 27, 2016 at 7:00 PM, Bert Gunter <bgunter.4567 at gmail.com> wrote:
>
>> Not entirely sure I understand, but match() is already vectorized, so you
>> should be able to lose the sapply(). This would speed things up a lot.
>> Please re-read ?match *carefully* .
>>
>> Bert
>>
>> On Jul 27, 2016 6:15 AM, "sri vathsan" <srivibish at gmail.com> wrote:
>>
>> Hi,
>>
>> I created a list of 3-number combinations (mycombos, around 3 lakh
>> combinations) and am counting the occurrences of those combinations in
>> another list. This comparison list (mylist) has around 8000 records. I am
>> using the following code.
>>
>> myCounts <- sapply(1:nrow(myCombos), FUN=function(i) {
>>   sum(sapply(myList, function(j) {
>>     sum(!is.na(match(c(myCombos[i,]), j)))})==3)})
>>
>> The above code takes a very long time to execute. Is there a more
>> efficient method that would reduce the time?
>> --
>>
>> Regards,
>> Srivathsan.K
>>