[Rd] BitSet equivalent? Java code usable?

Joachim Harloff joachim.harloff at joachimharloff.de
Mon Oct 10 21:24:53 CEST 2011


> If you don't have NAs in your bits, you might be able to use the raw
> vector type. A raw value is one byte (8 bits), doesn't support NAs and
> supports bit operations on all the bits at once:
> 
>   &, |, !, xor
> 
> You'd probably want to wrap it into a class that allows a more
> BitSet-like view of it though. Maybe there's a package out there that
> does that already?
> 
> ....and if you DO have NAs, you could use two raw vectors (or two
> consecutive raw-vector bits per BitSet bit).
> 
> Good luck!
> 
> /Tommy
> 
Hi Tommy and Uwe,

Thank you. I had thought about similar things. Moreover, dynamically sized BitSet equivalents are available for C++ (e.g. from Boost), and these could form the basis for a new type in R. It should also be possible to write similar code on one's own, although unfortunately I know nothing about assembler programming.
Yes, a two-bit value would still be four times as memory-efficient as a byte. This would probably pay off more than fourfold in CPU time, because of much less cache-related activity. But would it also be more efficient in the number of processing steps involved? As you wrote, it would require creating a new class in which each of the &, |, !, xor operations is redefined and involves (at least) one additional bitmask operation.
To answer this question I would need to know more about R's basic types and memory allocation. Are they implemented in C? Fortran? Compiled with the GNU compiler? How could I serialize such an object? Sorry, I never looked into that. Maybe there is a good resource about it on the R web site.
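
The additional bitmask operation mentioned above can be made concrete with a minimal C++ sketch. It assumes four two-bit fields packed into each byte; the class name PackedTwoBit and its member functions are hypothetical and are not part of R or Boost.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical container: four 2-bit fields packed into each byte.
class PackedTwoBit {
public:
    explicit PackedTwoBit(std::size_t n)
        : size_(n), bytes_((n + 3) / 4, 0) {}

    std::size_t size() const { return size_; }
    std::size_t storage_bytes() const { return bytes_.size(); }

    // Read field i: shift it down to the low bits, then mask off the rest.
    unsigned get(std::size_t i) const {
        const unsigned shift = static_cast<unsigned>(i % 4) * 2;
        return (bytes_[i / 4] >> shift) & 0x3u;
    }

    // Write field i: clear its two bits with a mask, then OR in the new value.
    void set(std::size_t i, unsigned value) {
        const unsigned shift = static_cast<unsigned>(i % 4) * 2;
        const unsigned cleared = bytes_[i / 4] & ~(0x3u << shift);
        bytes_[i / 4] =
            static_cast<std::uint8_t>(cleared | ((value & 0x3u) << shift));
    }

private:
    std::size_t size_;
    std::vector<std::uint8_t> bytes_;
};
```

With this layout, ten fields occupy three bytes instead of ten, and every single-field read or write costs one shift and one mask on top of the memory access.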
Even more computation steps, in the form of a choice between different bitmasks, will arise from the question: what do we do about NAs?
NA + 0 -> NA or NA + 0 -> 0 ?
NA + NA -> NA or NA + NA -> 0?
Does one NA in one bit field influence the values of other bit fields of the same BitSet(s)?
And so on. The truth table is going to have many interesting variants besides the obvious solution. 
What does this mean for the efficiency gains expected? Do they vanish?
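
One of these truth-table variants can be sketched in C++ to show where the extra steps come from. The sketch uses two bit planes, in the spirit of the two-raw-vectors suggestion quoted above, and assumes the convention that R's own logical operators follow (NA & FALSE is FALSE, NA | TRUE is TRUE, everything else involving NA stays NA). The names TriBits, tri_and and tri_or are hypothetical, and nothing here has been benchmarked.

```cpp
#include <cstdint>

// Eight three-valued bits held in two bytes (hypothetical layout):
// a bit set in `na` marks NA; the matching bit in `value` is then kept at 0.
struct TriBits {
    std::uint8_t value;
    std::uint8_t na;
};

// AND on all eight bits at once: a definite 0 on either side wins over NA.
inline TriBits tri_and(TriBits a, TriBits b) {
    const unsigned av = a.value, an = a.na, bv = b.value, bn = b.na;
    const unsigned known_false = (~av & ~an) | (~bv & ~bn);
    const unsigned known_true = (av & ~an) & (bv & ~bn);
    TriBits r;
    r.value = static_cast<std::uint8_t>(known_true & 0xFFu);
    r.na = static_cast<std::uint8_t>(~(known_true | known_false) & 0xFFu);
    return r;
}

// OR on all eight bits at once: a definite 1 on either side wins over NA.
inline TriBits tri_or(TriBits a, TriBits b) {
    const unsigned av = a.value, an = a.na, bv = b.value, bn = b.na;
    const unsigned known_true = (av & ~an) | (bv & ~bn);
    const unsigned known_false = (~av & ~an) & (~bv & ~bn);
    TriBits r;
    r.value = static_cast<std::uint8_t>(known_true & 0xFFu);
    r.na = static_cast<std::uint8_t>(~(known_true | known_false) & 0xFFu);
    return r;
}
```

Each operation needs roughly a dozen bitwise instructions where a plain byte needs one, but it still handles eight fields per step and an NA in one bit position never affects its neighbours. Whether the memory savings survive that overhead would have to be measured.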

Well, at least now I know which questions to tackle next. Thank you. It's just a hobby, so it will take considerable time to hatch.

Joachim


