[Rd] binary string conversion to a vector (PR#14120)

Franc Brglez brglez at ncsu.edu
Thu Dec 10 23:11:52 CET 2009

Please accept my sincere apologies for annoying the R development team with my post this week. If I were required to register as "a developer" before submission, this would not have happened. To rehabilitate myself, please find at the bottom of this mail two R-functions, 'string2vector' and 'vector2string', with "comments and tests". Both functions may go a long way towards assisting a number of R-users to make their R-programming more productive. I am a novice R-programmer: I started dabbling in R less than two months ago, heavily influenced by examples of code I see, including within the R.org documents (monkey does what monkey sees). Before posting the two functions, I would really appreciate constructive edits where they may be needed, as well as their posting by someone-in-the-know so that they will be conveniently accessible for R users.

I am very impressed with the potential of R and the community supporting it. I just wish I had come to R sooner: I am looking to R to better support my work in "designed experiments to assess the statistically significant performance of combinatorial optimization algorithms on instance isomorphs of NP-hard problems" -- for better context of this mouthful, see the few postings under
I am working on a tutorial paper where I expect R to play a significant role in better explaining and illustrating, code-wise and graphically, the concepts discussed in the publications above. I would welcome a co-author with experience in R-programming as well as statistics and interests in the experimental methods addressed in these publications.

As I elaborate in the notes that follow, I was looking at a variety of "R-documents" before my "bug" submission. I would very much appreciate it if some of you could take the time to scan through these notes and respond briefly with useful pointers. Here are the headlines:

    (1) why I still think there may be a bug with 'noquote' vs 'as.integer'

    (2) search on "split string" and "join string"; the missing package "stringr"

    (3) a take on "Tcl" commands 'split', 'join', 'string', 'append', 'foreach'

    (4) a take on "R" functions 'string2vector' and 'vector2string'

    (5) code and comments for "R" functions 'string2vector' and 'vector2string'

(1) why I still think there may be a bug with 'noquote' vs 'as.integer'
> # MacOSX 10.6.2, R 2.9.1 GUI 1.28 Tiger build 32-bit (5444)
> qvector
[1] "0" "0" "0" "1" "1" "0" "1"
> qvector[1]
[1] "0"
> tmp = noquote(qvector[1])
> tmp
[1] 0
> tmp = as.integer(qvector[1])
> tmp
[1] 0
When embedded in the function as per my "bug" report, 'noquote' and 'as.integer' are no longer equivalent, whereas in the example above they appear to be equivalent!! I submitted the "function" with print/cat statements for the sake of illustration.
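
One possible explanation, offered only as a sketch since the function from the original report is not reproduced in this mail: 'noquote' does not convert anything. It tags a character vector with the class "noquote" so that it prints without quotes, while 'as.integer' produces a genuine integer. The two print alike at the prompt but differ in type. The assumption here is that the reported function accumulated its result with c(), as the code in section (5) does.

```r
qvector <- c("0", "0", "0", "1", "1", "0", "1")

a <- noquote(qvector[1])     # still a character string, only printed without quotes
b <- as.integer(qvector[1])  # a genuine integer

class(a)         # "noquote"
is.character(a)  # TRUE: the underlying type has not changed
is.integer(b)    # TRUE

# Assumed accumulation pattern: growing a result with c().
x <- c(NULL, a)  # the result is of type character
y <- c(NULL, b)  # the result is of type integer
is.character(x)
is.integer(y)
```

Under that assumption, a vector built from 'noquote' values stays a character vector, which would account for the two calls behaving differently inside a function.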

(2) search on "split string" and "join string"; the missing package "stringr"
http://search.r-project.org/ reveals
   on the order of 850 messages for search on "split string"
   on the order of 160 messages for search on "join string"

http://finzi.psych.upenn.edu/search.html reveals
    for search on "split string"
        • Rhelp08:   [ split: 890 ] [ string: 1676 ] [ TOTAL: 77 ]
        • functions: [ split: 954 ] [ string: 6453 ] [ TOTAL: 204 ]
    for search on "join string"
        • Rhelp08:   [ join: 176 ] [ string: 1676 ] [ TOTAL: 8 ]
        • functions: [ join: 192 ] [ string: 6453 ] [ TOTAL: 36 ]
    This site also provides a link to the package "stringr"
However, the download does not deliver ...
> install.packages("stringr")
   package ‘stringr’ is not available

There are a lot of hard-to-understand and not-so-relevant code snippets in all these thousands of postings. I would argue that had robust functions such as 'string2vector' and 'vector2string' been included in the base R distribution, many R-programmers could take longer vacations, spend their time more productively,
and significantly reduce duplication of coding efforts on basically the same

Since the vector is such an important "primitive" in R, I argue that functions such as 'string2vector' and 'vector2string' should be made to play a role similar to the commands 'split', 'join', 'string', and 'append' that support programmers in Tcl. See my take on Tcl in the section below.

(3) a take on "Tcl" commands 'split', 'join', 'string', 'append', 'foreach'
I have been using Tcl to "wrap" a number of combinatorial solvers and automate workflows that implement and execute a number of my experiments on instance isomorphs. I even used Tcl to prototype a few combinatorial optimization algorithms and write code for statistical analysis -- a task for which I now find R much better suited.

I intend to alert my Tcl colleagues in-the-know about the wonderful infrastructure provided in R: the R-shell (at least under MacOSX), the ability to name and initialize function variable defaults explicitly, and the ability to install new packages so transparently. Before coming across R, I had already taken the trouble to create Tcl wrapper programs with command lines that feature the same order-independent syntax as the syntax used in R. This being said, what I miss about R is gathering all commands on a single page such as
Note that once you click on any of the commands, a number of classes that extend each command become visible, including the example section(s). 

Here I illustrate my use of just five Tcl commands that subsequently guided my "design" of the functions 'string2vector' and 'vector2string' in "R"

# few "Tcl" examples before designing the function 'string2vector' in "R"
% set binS "10011"
% join [split $binS ""] ", "
1, 0, 0, 1, 1
% set strS "I \t am\tdone" 
% foreach item [split $strS "\t"] {append strSQ \"$item\",}
% set strSQ [string trimright $strSQ ,]
"I "," am","done"
# few "Tcl" examples before designing the function 'vector2string' in "R"
% set strV "1,0,0,1"
% split $strV ","
1 0 0 1
% join [split $strV ","] ":"
1:0:0:1
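
For comparison, the same operations can be sketched with the base R functions 'strsplit' and 'paste'. This is only one way to write them; the variable names mirror the Tcl session above and are otherwise arbitrary.

```r
# Tcl: join [split $binS ""] ", "
binS   <- "10011"
chars  <- strsplit(binS, "")[[1]]            # split into single characters
joined <- paste(chars, collapse = ", ")      # "1, 0, 0, 1, 1"

# Tcl: split on tab, wrap each item in double quotes, join with commas
strS  <- "I \t am\tdone"
items <- strsplit(strS, "\t", fixed = TRUE)[[1]]
strSQ <- paste('"', items, '"', sep = "", collapse = ",")

# Tcl: join [split $strV ","] ":"
strV    <- "1,0,0,1"
recoded <- paste(strsplit(strV, ",", fixed = TRUE)[[1]], collapse = ":")

cat(joined, strSQ, recoded, sep = "\n")
```

Note that 'strsplit' returns a list with one element per input string, hence the [[1]] after each call.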

(4) a take on "R" functions 'string2vector' and 'vector2string'
> # few tests of the function 'string2vector' in "R"
> binS = "10011"
> binV = string2vector(binS, SS="", type="int")
> binV[2] ; binV[5]
[1] 0
[1] 1
> strS = "I am done" 
> vecS = string2vector(strS, SS=" ", type="char")
> vecS[1] ; vecS[3]
[1] "I"
[1] "done"
> # few tests of the function 'vector2string' in "R"
> binV = c(1,0,0,1) 
> vector2string(binV, type="int")
[1] "1001"
> vector2string(binV, SS=" ", type="char")
[1] "1 0 0 1"
> subsV = c("I", "am", "done")  
> vector2string(subsV, SS=":", type="char")
[1] "I:am:done"

(5) code and comments for "R" functions 'string2vector' and 'vector2string'

string2vector = function(string="ch-2 \t sec-7\tex-5", SS="\t", type="char")
{
# This procedure splits a string and assigns substrings to an R-vector.
# The split is controlled by the string separator SS (default value:  SS="\t").
# Here we convert  a binary string into a binary vector:
#   let  binS = "10011"  
#   then binV = string2vector(binS, SS="", type="int")
# Here we convert a string into a vector of substrings:
#   let  strS = "I am done" 
#   then vecS = string2vector(strS, SS=" ", type="char")
# LIMITATION: The function interprets all substrings either as of type 
#             "int" or "char".  A function that interprets the type of each
#             substring dynamically may one day be written by an R-guru.
# Franc Brglez, Wed Dec  9 14:19:16 EST 2009
    qlist   = strsplit(string, SS) ; qvector = qlist[[1]]
    xvector = NULL
    # seq_along() gives an empty loop when the split yields no substrings
    for (i in seq_along(qvector)) {
        if (type == "int") {
            tmp = as.integer(qvector[i])
        } else {
            tmp = qvector[i]
        }
        xvector = c(xvector, tmp)
    }
    return(xvector)
} # string2vector
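
A shorter vectorized variant is sketched below for comparison. The name 'split_string' is hypothetical; it is not part of base R. It relies on the fact that 'as.integer' already works on whole vectors, and it passes fixed = TRUE because 'strsplit' otherwise reads the separator as a regular expression, so a separator such as "." would match every character.

```r
# Hypothetical helper, not part of base R.
split_string <- function(string, SS = "\t", type = c("char", "int")) {
    type <- match.arg(type)
    # fixed = TRUE: SS is taken literally, not as a regular expression
    pieces <- strsplit(string, SS, fixed = TRUE)[[1]]
    # as.integer() converts the whole vector at once, so no loop is needed
    if (type == "int") as.integer(pieces) else pieces
}

binV <- split_string("10011", SS = "", type = "int")
vecS <- split_string("I am done", SS = " ")
dotS <- split_string("a.b.c", SS = ".")
print(binV); print(vecS); print(dotS)
```

Using match.arg() also means a misspelled type raises an error rather than silently falling through to the "char" branch.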

vector2string = function(vector=c("ch-2", "sec-7", "ex-5"), SS="_", type="char") 
{
# This procedure converts values from a vector to a concatenation of substrings 
# separated by user-specified string separator SS (default value:  SS="_").
# Each substring represents a vector component value, either as a numerical 
# value or as an alphanumeric string. 
# Here we convert a binary vector to a binary string representing an integer:
#   let  binV = c(1,0,0,1)  
#   then strS = vector2string(binV, type="int")
# Here we convert a binary vector to string representing a binary sequence:
#   let  binV = c(1,0,0,1)  
#   then seqS = vector2string(binV, SS=" ", type="char")
# Here we convert a vector of substrings to colon-separated string:
#   let subsV = c("I", "am", "done")  
#   then strS = vector2string(subsV, SS=":", type="char")
# LIMITATION: The function interprets all substrings in the vector either as of 
#             type "int" or "char".  A function that interprets the type of each
#             substring dynamically may one day be written by an R-guru.
# Franc Brglez, Wed Dec  9 15:43:59 EST 2009
    if (type == "int") {
        # concatenate the components with no separator
        string = paste(vector, collapse="")
    } else {
        n = length(vector) ; string = ""
        # seq_len(n - 1) gives an empty loop when the vector has one component
        for (i in seq_len(n - 1)) {
            tmp    = noquote(vector[i])
            string = paste(string, tmp, SS, sep="")
        }
        tmp    = noquote(vector[n])
        string = paste(string, tmp, sep="")     
    }
    return(string)
} # vector2string
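
The join direction can likewise be sketched in one line, since 'paste' with the collapse argument already joins the components of a vector. The name 'join_vector' is hypothetical, and this is a sketch rather than a replacement for the function above.

```r
# Hypothetical helper, not part of base R.
join_vector <- function(vector, SS = "_") {
    # collapse = SS places the separator between components only
    paste(vector, collapse = SS)
}

s1 <- join_vector(c(1, 0, 0, 1), SS = "")
s2 <- join_vector(c("I", "am", "done"), SS = ":")

# Round trip: joining and then splitting on the same separator
# should give back the original vector.
v    <- c("ch-2", "sec-7", "ex-5")
back <- strsplit(join_vector(v, SS = "\t"), "\t", fixed = TRUE)[[1]]
print(s1); print(s2); print(identical(back, v))
```

With an empty separator the 'type' argument becomes unnecessary: SS = "" yields the "int" behaviour and any other separator yields the "char" behaviour.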

