[R] numbers as part of long character

Charilaos Skiadas cskiadas at gmail.com
Fri Jun 13 00:03:22 CEST 2008


On Jun 12, 2008, at 5:06 PM, Marc Schwartz wrote:

> on 06/12/2008 03:46 PM Hua Li wrote:
>> Hi,
>> I'm looking for a way to pick out the numbers buried in a long
>> character string. For example,
>> outtree.new="(((B:1204.25,E:1204.25):7581.11,F:8785.36):8353.85,C:17139.21);"
>> num.char = unlist(strsplit(unlist(strsplit(unlist(strsplit(unlist(strsplit(
>>     unlist(strsplit(outtree.new,")",fixed=TRUE)),
>>     "(",fixed=TRUE)),":",fixed=TRUE)),",",fixed=TRUE)),";",fixed=TRUE))
>> num.vec=as.numeric(num.char[1:(length(num.char)-1)])
>> num.char
>> #  "B"        "1204.25"  "E"        "1204.25"  "7581.11"  "F"        "8785.36"  "8353.85"  "C"        "17139.21" ""
>> num.vec
>> # NA  1204.25       NA  1204.25  7581.11       NA  8785.36  8353.85       NA 17139.21
>> would help me get the numbers such as 1204.25, 7581.11, etc, but  
>> with a warning message which reads:
>> "Warning message:
>> NAs introduced by coercion "
>> Is there a way to get around this? Thanks!
>> Hua
>
> Your code above is overly and needlessly complicated, which makes  
> it difficult to debug.
>
> I would take an approach whereby you use gsub() to strip non-numeric
> characters from the input character vector and then use scan() to
> read the remaining numbers:
>
> > Vec <- scan(textConnection(gsub("[^0-9\\.]+", " ", outtree.new)))
> Read 6 items
>
> > Vec
> [1]  1204.25  1204.25  7581.11  8785.36  8353.85 17139.21
>
> > str(Vec)
>  num [1:6] 1204 1204 7581 8785 8354 ...
>
>
> The result of using gsub() above is:
>
> > gsub("[^0-9\\.]+", " ", outtree.new)
> [1] " 1204.25 1204.25 7581.11 8785.36 8353.85 17139.21 "
>
>
> That gives you a character vector which can then be passed to scan()
> as a textConnection().

Another approach would be to split on runs of characters that are neither digits nor dots:

as.numeric( strsplit(outtree.new, "[^\\d.]+", perl=TRUE)[[1]] )


Use "[^\\d.+-]+" if your numbers might be signed (the hyphen goes last in
the class so that it is taken literally). This does assume that dots occur
only as decimal points and +/- only as signs.
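
One caveat with strsplit() here: the string starts with separator characters, so the first field is empty and as.numeric() turns it into NA. A minimal sketch of an alternative that extracts the matches directly instead of splitting, assuming regmatches() and gregexpr() are available in your version of R, and that the numbers are plain decimals with an optional sign (no exponent notation, no digits inside the labels):

```r
outtree.new <- "(((B:1204.25,E:1204.25):7581.11,F:8785.36):8353.85,C:17139.21);"

# Optional sign, digits, optional fractional part.
# Assumption: no exponent notation such as 1e-5 in the input.
num.pattern <- "[-+]?\\d+(\\.\\d+)?"

# gregexpr() locates every match; regmatches() returns the matched text.
num.char <- regmatches(outtree.new,
                       gregexpr(num.pattern, outtree.new, perl = TRUE))[[1]]
num.vec <- as.numeric(num.char)
num.vec
```

Since only the matched pieces are kept, there is no empty field to drop and nothing is coerced to NA.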

Hua, did you want to keep the information of which number is B, which  
is C etc?
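
If the labels do matter, here is a rough sketch of one way to keep them, assuming regmatches() and gregexpr() are available and that every tip label is a run of letters immediately followed by a colon and its number. The internal branch lengths, which follow a closing parenthesis rather than a label, are skipped.

```r
outtree.new <- "(((B:1204.25,E:1204.25):7581.11,F:8785.36):8353.85,C:17139.21);"

# Find each "label:number" pair, e.g. "B:1204.25".
# Assumption: labels consist of letters only.
pairs <- regmatches(outtree.new,
                    gregexpr("[A-Za-z]+:[0-9.]+", outtree.new))[[1]]

# Split each pair at the colon and build a named numeric vector.
parts <- strsplit(pairs, ":", fixed = TRUE)
tip.len <- setNames(as.numeric(sapply(parts, `[`, 2)),
                    sapply(parts, `[`, 1))
tip.len
```

The result is a numeric vector whose names are the labels, so tip.len["F"] picks out the value that belongs to F.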

> See ?gsub, ?regex, ?textConnection and ?scan for more information.
>
> HTH,
>
> Marc Schwartz
>

Haris Skiadas
Department of Mathematics and Computer Science
Hanover College


