[R] Frequency count of Boolean pattern in 4 vectors.

William Dunlap wdunlap at tibco.com
Sun Jun 2 06:37:09 CEST 2013


For 10 million data points,
   table(interaction(vec_D, vec_C, vec_B, vec_A))
took 11.45 seconds on my laptop, while the following function took 0.18 seconds:
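   f0 <- function (vec_A, vec_B, vec_C, vec_D)
   {
       # Encode each (D, C, B, A) pattern as an integer in 1..16:
       # 1 + A + 2*B + 4*C + 8*D
       x <- 1 + vec_A + 2 * (vec_B + 2 * (vec_C + 2 * vec_D))
       # Count each code; nbins = 16 keeps patterns that never occur
       tab <- tabulate(x, nbins = 16)
       # Label the bins "0000", "0001", ..., "1111" in DCBA order
       names(tab) <- do.call(paste0, rev(expand.grid(0:1, 0:1, 0:1, 0:1)))
       tab
   }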
Aside from the order and labelling of the entries in the output tables (interaction() separates the digits with "." by default), they gave the same results.
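
One way to check that claim on a small example is sketched below. The seed, the sample size of 1000, and the helper as01() are made up for illustration and are not from the timing run above. It assumes that passing sep = "" and lex.order = TRUE to interaction() puts its levels in the same "0000" ... "1111" order that f0 uses, so the two tables can be compared entry by entry.

```r
# f0 as posted above, repeated so this snippet runs on its own
f0 <- function(vec_A, vec_B, vec_C, vec_D) {
    x <- 1 + vec_A + 2 * (vec_B + 2 * (vec_C + 2 * vec_D))
    tab <- tabulate(x, nbins = 16)
    names(tab) <- do.call(paste0, rev(expand.grid(0:1, 0:1, 0:1, 0:1)))
    tab
}

set.seed(1)    # arbitrary seed, only for reproducibility
n <- 1000L     # small made-up size, not the 10 million used for timing
vec_A <- sample(0:1, n, replace = TRUE)
vec_B <- sample(0:1, n, replace = TRUE)
vec_C <- sample(0:1, n, replace = TRUE)
vec_D <- sample(0:1, n, replace = TRUE)

# Hypothetical helper: force both levels so all 16 combinations appear
as01 <- function(v) factor(v, levels = 0:1)

# Reference table with levels ordered and labelled like f0's output
tab_ref <- table(interaction(as01(vec_D), as01(vec_C), as01(vec_B),
                             as01(vec_A), sep = "", lex.order = TRUE))
tab_fast <- f0(vec_A, vec_B, vec_C, vec_D)

same_labels <- identical(names(tab_fast), names(tab_ref))
same_counts <- all(tab_fast == as.vector(tab_ref))
```

If both same_labels and same_counts come out TRUE, the two approaches agree on that sample. Timings can be compared by wrapping each call in system.time(), though the numbers will depend on the machine.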

Bill Dunlap
Spotfire, TIBCO Software
wdunlap tibco.com


> -----Original Message-----
> From: r-help-bounces at r-project.org [mailto:r-help-bounces at r-project.org] On Behalf
> Of Sridhar Iyer
> Sent: Saturday, June 01, 2013 2:57 PM
> To: r-help at r-project.org
> Subject: [R] Frequency count of Boolean pattern in 4 vectors.
> 
> I need to do this on very large datasets (more than a few million data
> points), so I am seeking help in figuring out an implementation of the task.
> 
> Input: 4 vectors that contain the values 0 or 1 (as integers, not boolean
> bits).
> vec_A = (0, 1, 0, 0, ...... 1, 0, 1, 0)   etc
> vec_B = (0, 0, 1, 1, .....)
> vec_C, vec_D  (similar to above)
> All four vectors are the same length.
> 
> I need to compute the frequency count of each Boolean pattern of DCBA:
> DCBA
> 0000
> 0001
> 0010
> 0011
> ..
> ..
> 1111
> 
> Questions:
> a) Is there a mechanism for combining the 4 vectors (in integer format)
> into 4 bits of a new vector or some other type (or for treating them as
> Boolean values true/false instead of 0 or 1 integers)?
> b) What is the most efficient mechanism for obtaining the frequency count
> of each of the sixteen Boolean combinations?
> 
> I need to do this frequently on large datasets, so I am trying to get an
> efficient implementation (instead of a quick and dirty scheme). Thank you
> very much in advance.
> 


