[R] numbers as part of long character

Marc Schwartz marc_schwartz at comcast.net
Thu Jun 12 23:06:41 CEST 2008

on 06/12/2008 03:46 PM Hua Li wrote:
> Hi,
> I'm looking for some way to pick up the numbers which are contained and buried in a long character. 
> For example,
> outtree.new="(((B:1204.25,E:1204.25):7581.11,F:8785.36):8353.85,C:17139.21);"
> num.char = unlist(strsplit(unlist(strsplit(unlist(strsplit(unlist(strsplit(unlist(strsplit(outtree.new,")",fixed=TRUE)),"(",fixed=TRUE)),":",fixed=TRUE)),",",fixed=TRUE)),";",fixed=TRUE))
> num.vec=as.numeric(num.char[1:(length(num.char)-1)])
> num.char
> #  "B"        "1204.25"  "E"        "1204.25"  "7581.11"  "F"        "8785.36"  "8353.85"  "C"        "17139.21" "" 
> num.vec
> # NA  1204.25       NA  1204.25  7581.11       NA  8785.36  8353.85       NA 17139.21
> would help me get the numbers such as 1204.25, 7581.11, etc, but with a warning message which reads:
> "Warning message:
> NAs introduced by coercion "
> Is there a way to get around this? Thanks!
> Hua

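Your code above is needlessly complicated, which makes it difficult to 
debug. The warning arises because as.numeric() is applied to the labels 
("B", "E", etc.) as well as to the numbers; the labels cannot be 
converted and become NA.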

I would take an approach whereby you use gsub() to strip non-numeric 
characters from the input character vector and then use scan() to read 
the remaining numbers:

 > Vec <- scan(textConnection(gsub("[^0-9\\.]+", " ", outtree.new)))
Read 6 items

 > Vec
[1]  1204.25  1204.25  7581.11  8785.36  8353.85 17139.21

 > str(Vec)
  num [1:6] 1204 1204 7581 8785 8354 ...

The result of using gsub() above is:

 > gsub("[^0-9\\.]+", " ", outtree.new)
[1] " 1204.25 1204.25 7581.11 8785.36 8353.85 17139.21 "

That gives you a length-one character vector of space-separated numbers, 
which can then be passed to scan() via textConnection().
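
As a minimal sketch of the same idea, current versions of R let scan() 
read directly from a string through its 'text' argument, so no 
connection is left open. The name 'cleaned' is only illustrative, and 
quiet = TRUE just suppresses the "Read 6 items" message:

```r
outtree.new <- "(((B:1204.25,E:1204.25):7581.11,F:8785.36):8353.85,C:17139.21);"

# Replace every run of characters that is not a digit or a dot with a space.
# Inside a character class the dot is literal, so it needs no escaping.
cleaned <- gsub("[^0-9.]+", " ", outtree.new)

# scan() reads whitespace-separated fields as doubles by default.
Vec <- scan(text = cleaned, quiet = TRUE)
Vec
```

No coercion of labels takes place here, so no NA warning is produced.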

See ?gsub, ?regex, ?textConnection and ?scan for more information.
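
One caveat with stripping non-numeric characters: if the tip labels 
themselves contain digits or dots, those would be read as numbers too. 
A possible alternative, sketched here under the assumption that every 
number of interest directly follows a colon, is to extract the matches 
with gregexpr() and regmatches(). The tree string 'tree' below is a 
made-up example, not the original poster's data:

```r
# Hypothetical tree whose labels contain digits (t1, t2, t3)
tree <- "((t1:0.5,t2:0.5):1.25,t3:1.75);"

# Match a number only when it is immediately preceded by a colon.
# The lookbehind (?<=:) requires perl = TRUE.
m <- gregexpr("(?<=:)[0-9]+(\\.[0-9]+)?", tree, perl = TRUE)

# regmatches() returns a list with one element per input string.
num.vec <- as.numeric(regmatches(tree, m)[[1]])
num.vec
```

The digits in the labels are skipped because they are not preceded by 
a colon.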


Marc Schwartz
