[R] bitwise addition

Uwe Ligges ligges at statistik.uni-dortmund.de
Tue May 16 09:45:35 CEST 2006


Nameeta Lobo wrote:

> Hello all
> 
> thank you very much for all your suggestions. I actually need binary
> representations. I tried all the methods that Marc,Jim and Charles have
> suggested and they ran fine(thanks a lot). I tried doing it then with 26 and 13
> and that's when the computer gave way. I just got a message with all three
> methods that a vector of .....Kb cannot be allocated. guess I will have to
> change the environment to allow for huge vector size allocation. How do I do that?


You should have *at least* 512Mb in your machine for the solution given 
by Charles C. Berry with the numbers given above, better a machine with 1Gb.

Uwe Ligges


> Thanks once again all of you.
> 
> Nameeta
> 
> 
> Quoting "Charles C. Berry" <cberry at tajo.ucsd.edu>:
> 
> 
>>Here is a solution that finds the 2042975 25-bit words with 9 bits 'on'
>>in under 5 seconds on my PC. It finds the 5200300 25-bit words with 12 
>>bits 'on' in 16 seconds.
>>
>>Perhaps, this is the solution for a combinatorics homework problem.
>>
>>Finding words with 9 bits on:
>>
>>
>>>hi.index <- 25
>>>n.index <- 9
>>>breaks <- lapply(1:n.index,function(x) choose(-1+1:hi.index,x))
>>>index <- seq(0,length=choose(hi.index,n.index))
>>>dec.index <- rep(as.integer(0),choose(hi.index,n.index))
>>>system.time(
>>
>>+             for (i in n.index:1){
>>+               pos <- findInterval(index,breaks[[i]])
>>+               index <- index - breaks[[i]][pos]
>>+               dec.index <- dec.index + 2^(pos-1)
>>+             }
>>+             )
>>[1] 4.30 0.48 4.78   NA   NA
>>
>>>dec.index[1:10] # decimal representation of first ten
>>
>>  [1]  511  767  895  959  991 1007 1015 1019 1021 1022
>>
>>>gc()
>>
>>           used (Mb) gc trigger  (Mb) max used  (Mb)
>>Ncells  180950  4.9     407500  10.9   350000   9.4
>>Vcells 5180889 39.6   14895582 113.7 15399445 117.5
>>
>>And finding words with 12 bits on:
>>
>>
>>>hi.index <- 25
>>>n.index <- 12
>>>breaks <- lapply(1:n.index,function(x) choose(-1+1:hi.index,x))
>>>index <- seq(0,length=choose(hi.index,n.index))
>>
>>dec.index <- rep(as.integer(0),choose(hi.index,n.index))
>>
>>>>system.time(
>>
>>+             for (i in n.index:1){
>>+               pos <- findInterval(index,breaks[[i]])
>>+               index <- index - breaks[[i]][pos]
>>+               dec.index <- dec.index + 2^(pos-1)
>>+             }
>>+             )
>>
>>[1] 13.97  2.36 16.33    NA    NA
>>dec.index[1:10]  # decimal representation of first ten
>>
>>> [1] 4095 6143 7167 7679 7935 8063 8127 8159 8175 8183
>>>gc()
>>
>>            used (Mb) gc trigger  (Mb) max used  (Mb)
>>Ncells   180953  4.9     407500  10.9   350000   9.4
>>Vcells 13074276 99.8   29645052 226.2 28675324 218.8
>>
>>
>>Chuck Berry
>>
>>
>>On Sun, 14 May 2006, Charles C. Berry wrote:
>>
>>
>>>On Sat, 13 May 2006, jim holtman wrote:
>>>
>>>
>>>>Using the algorithm in the reference
>>>>www.everything2.com/index.pl?node_id=449702 and the 'bitops' package,
>>
>>here
>>
>>>>is some code that will select the numeric values with a specified number
>>
>>of
>>
>>>>bits.  It takes less than 12 seconds to select from 21-bit numbers. If I
>>
>>had
>>
>>>>more memory, I would expect that for 25-bit numbers it would take about
>>
>>200
>>
>>>>seconds to compute the indices.
>>>>
>>>
>>>The approach on the 'everything2' page is decidely cool.
>>>
>>>Since a fixed number of bits is sought one can operate on the indices of
>>>the bits, rather than the words and bits per se.
>>>
>>>For example, selecting 2 bits from 4 bit words:
>>>
>>>res <- rep(as.double(0),choose(4,2))
>>>k <- 0
>>>for ( index.1 in (1):(3) )
>>>  for ( index.2 in (index.1+1):(4) ){
>>>    index <- c( index.1,index.2 )
>>>    k <- k + 1
>>>    res[k] <- sum( 2^(index-1) ) }
>>>
>>>gives
>>>
>>>>res
>>>
>>>[1]  3  5  9  6 10 12
>>>
>>>Of course, one could turn 'index' into a character representation of the
>>>bits instead of a decimal representation, if that was desired.
>>>
>>>
>>>To generalize this a little computing on the language helps. For 25 bit
>>>words choosing those with 9 bits turned on:
>>>
>>>##-----
>>>hi.index <- 25
>>>n.index <- 9
>>>for.template <- "for ( index.#X# in (#FROM#):(#TO#) )"
>>>for.text <- function(x)
>>>  { from.text <- if (x==1) "1" else paste("index.",x-1,"+1",sep="")
>>>    to.text <- as.character(hi.index-n.index+x)
>>>    sub("#X#",x,sub("#TO#",to.text,sub("#FROM#",from.text,for.template)))
>>>  }
>>>
>>>
>>>all.text <- paste( "k <- 0;",
>>>                  paste( sapply(1:n.index,for.text),collapse=" "),
>>>                  "{index <- c(",
>>>                  paste(sapply(1:n.index,function(x)
>>>paste("index",x,sep='.')),collapse=","),
>>>                  ");k <- k + 1; res[k] <- sum( 2^(index-1) ) }")
>>>res <- rep(as.double(0),choose(hi.index,n.index))
>>>
>>>system.time(eval( parse(text=all.text ) ))
>>>
>>>##-----
>>>
>>>On my 2GHz AMD 64 running on Windows XP, I get
>>>
>>>
>>>>system.time(eval( parse(text=all.text ) ))
>>>
>>>[1] 35.17  0.08 35.41    NA    NA
>>>
>>>>res[1:10]
>>>
>>> [1]    511    767   1279   2303   4351   8447  16639  33023  65791
>>
>>131327
>>
>>>Note that if ordered binary numbers are needed, an additional sort()
>>>is required - adding about 1/10 seconds:
>>>
>>>
>>>>system.time(print(sort(res)[1:10]))
>>>
>>> [1]  511  767  895  959  991 1007 1015 1019 1021 1022
>>>[1] 0.11 0.00 0.11   NA   NA
>>>
>>>The memory needed is modest, as you would expect:
>>>
>>>
>>>>gc()
>>>
>>>          used (Mb) gc trigger (Mb) max used (Mb)
>>>Ncells  178596  4.8     407500 10.9   407500 10.9
>>>Vcells 2115922 16.2    5621217 42.9  4397614 33.6
>>>
>>>Running this using
>>>
>>>
>>>text.bits <- function(x) {
>>>	tmp <- rep("0",hi.index)
>>>	tmp[hi.index+1-x] <- "1";paste(tmp,collapse="")}
>>>
>>>res <- rep("0000000000000000111111111",choose(hi.index,n.index))
>>>
>>>and substituting
>>>
>>>	res[k] <- text.bits(index)
>>>
>>>in place of "res[k] <- sum( 2^(index-1) )" gives:
>>>
>>>
>>>>system.time(eval( parse(text=all.text ) ))
>>>
>>>[1] 201.31   0.12 204.44     NA     NA
>>>
>>>>res[1:10]
>>>
>>> [1] "0000000000000000111111111" "0000000000000001011111111"
>>> [3] "0000000000000010011111111" "0000000000000100011111111"
>>> [5] "0000000000001000011111111" "0000000000010000011111111"
>>> [7] "0000000000100000011111111" "0000000001000000011111111"
>>> [9] "0000000010000000011111111" "0000000100000000011111111"
>>>
>>>>gc()
>>>
>>>          used (Mb) gc trigger (Mb) max used (Mb)
>>>Ncells 2221730 59.4    3307833 88.4  3307833 88.4
>>>Vcells 9266414 70.7   11894938 90.8 12374438 94.5
>>>
>>>And running this with the decimal representation for 'res' for 25 bit
>>>words choosing those with 12 bits turned on:
>>>
>>>
>>>>hi.index <- 25
>>>>n.index <- 12
>>>
>>>  .
>>>  .
>>>  .
>>>
>>>>system.time(eval( parse(text=all.text ) ))
>>>
>>>[1] 101.78   0.08 104.14     NA     NA
>>>
>>>
>>>Chuck
>>>
>>>[rest deleted]
>>>
>>>
>>>Charles C. Berry                        (858) 534-2098
>>>                                         Dept of Family/Preventive
>>
>>Medicine
>>
>>>E mailto:cberry at tajo.ucsd.edu	         UC San Diego
>>>http://biostat.ucsd.edu/~cberry/         La Jolla, San Diego 92093-0717
>>>
>>>
>>>
>>>   [ Part 3.15: "Included Message" ]
>>>
>>
>>Charles C. Berry                        (858) 534-2098
>>                                          Dept of Family/Preventive
>>Medicine
>>E mailto:cberry at tajo.ucsd.edu	         UC San Diego
>>http://biostat.ucsd.edu/~cberry/         La Jolla, San Diego 92093-0717
>>
>>
>>
>>
>>
> 
> 
> 
> 
> 
> 
> -------------------------------------------------
> This email is intended only for the use of the individual or...{{dropped}}
> 
> ______________________________________________
> R-help at stat.math.ethz.ch mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html




More information about the R-help mailing list