[R] counting the occurrences of vectors

Spencer Graves spencer.graves at pdf.com
Mon Jul 5 02:28:35 CEST 2004


      I see a case where "f1" gives the wrong answer: 

      b <- array(c("a:b", "a", "c", "b:c"), dim=c(2,2))
      a <- b[c(1,1),]

      For these two matrices, f1(a,b) == c(2,2), while f2(a,b) == 
c(2,0).  Both rows of b paste to the same key, "a:b:c", so "f1" cannot 
tell them apart.  If b cannot contain ":", e.g., if it is numeric, then 
this pathology cannot occur.  However, if "f1" is used with character 
data that could contain the "collapse" character, it could give an 
incorrect answer without warning. 
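
      A minimal sketch that reproduces the collision, followed by one 
possible workaround.  "f1" and "f2" are copied from Gabor's post; "f3" 
is a hypothetical name, not something from the thread.  It assumes that 
a and b are matrices with the same number of columns, and it replaces 
every value by an integer code before pasting, so the separator can 
never appear inside a pasted field.

```r
# f1 and f2 as posted by Gabor Grothendieck
f1 <- function(a, b) {
  a2 <- apply(a, 1, paste, collapse = ":")
  b2 <- apply(b, 1, paste, collapse = ":")
  c(table(c(a2, unique(b2)))[b2] - 1)
}
f2 <- function(a, b) {
  ta <- t(a)
  apply(b, 1, function(x) sum(apply(ta == x, 2, all)))
}

# f3 is a hypothetical variant (an assumption, not from the thread):
# encode each column as integer codes shared by a and b, then paste
# the codes, so no field can contain the separator.
f3 <- function(a, b) {
  ab <- rbind(a, b)
  codes <- apply(ab, 2, function(col) match(col, unique(col)))
  keys <- apply(codes, 1, paste, collapse = ":")
  ka <- keys[seq_len(nrow(a))]            # keys for the rows of a
  kb <- keys[nrow(a) + seq_len(nrow(b))]  # keys for the rows of b
  ukb <- unique(kb)
  m <- match(ka, ukb)                     # NA for rows of a not in b
  counts <- tabulate(m[!is.na(m)], nbins = length(ukb))
  counts[match(kb, ukb)]
}

b <- array(c("a:b", "a", "c", "b:c"), dim = c(2, 2))
a <- b[c(1, 1), ]

print(f1(a, b))  # both rows of b collapse to "a:b:c": 2 2
print(f2(a, b))  # element-wise comparison: 2 0
print(f3(a, b))  # agrees with f2: 2 0
```

      The encoding step adds some overhead, so for numeric data where 
the separator cannot occur, "f1" as posted may still be preferable.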

      Hope this helps.  Spencer Graves

Ravi Varadhan wrote:

>Thanks to Gabor, Marc, and Spencer for their elegant solutions.  Gabor's first solution worked the best for me.
> 
>Best,
>Ravi.
>
>________________________________
>
>From: r-help-bounces at stat.math.ethz.ch on behalf of Gabor Grothendieck
>Sent: Sat 7/3/2004 12:12 PM
>To: r-help at stat.math.ethz.ch
>Subject: Re: [R] counting the occurrences of vectors
>
>
>
>Ravi Varadhan <rvaradha <at> jhsph.edu> writes:
>
>
>>Hi:
>>
>>I have two matrices, A and B, where A is n x k, and B is m x k, where
>>n >> m >> k.  Is there a computationally fast way to count the number
>>of times each row (a k-vector) of B occurs in A?  Thanks for any
>>suggestions.
>>
>>Best,
>>Ravi.
>
>Here are two approaches.  The first one is an order of magnitude faster
>than the second.
>
>R> # test matrices
>R> set.seed(1)
>R> a <- matrix(sample(3,1000,replace=TRUE),ncol=5)
>R> b <- matrix(sample(3,100,replace=TRUE),ncol=5)
>
>R> f1 <- function(a,b) {
>+ a2 <- apply(a, 1, paste, collapse=":")
>+ b2 <- apply(b, 1, paste, collapse=":")
>+ c(table(c(a2,unique(b2)))[b2] - 1)
>+ }
>
>R> f2 <- function(a,b) {
>+ ta <- t(a)
>+ apply(b,1,function(x)sum(apply(ta == x,2,all)))
>+ }
>
>R> gc(); system.time(ans1 <- f1(a,b))
>         used (Mb) gc trigger (Mb)
>Ncells 458311 12.3     818163 21.9
>Vcells 124264  1.0     786432  6.0
>[1] 0.03 0.00 0.03   NA   NA
>
>R> gc(); system.time(ans2 <- f2(a,b))
>         used (Mb) gc trigger (Mb)
>Ncells 458312 12.3     818163 21.9
>Vcells 124270  1.0     786432  6.0
>[1] 0.1 0.0 0.1  NA  NA
>
>R> all.equal(ans1, ans2)
>[1] TRUE
>
>______________________________________________
>R-help at stat.math.ethz.ch mailing list
>https://www.stat.math.ethz.ch/mailman/listinfo/r-help
>PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html



