[R] numbers as part of long character

Hua Li hualihua at yahoo.com
Fri Jun 13 00:34:25 CEST 2008


Thanks, Marc and Haris! 

I didn't know the values of the numbers beforehand, so the scan method won't work, but "[^+-\\d.]+" will do! 
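
For the archive, here is a minimal sketch of the split-based approach with two small adjustments. Both are assumptions on my part, not something Marc or Haris wrote: the hyphen is moved to the end of the character class so that it cannot be read as a range, and the empty first piece (the string begins with separators, so strsplit() returns "" first) is dropped before conversion. The names pieces and num.vec are only for illustration.

```r
outtree.new <- "(((B:1204.25,E:1204.25):7581.11,F:8785.36):8353.85,C:17139.21);"

# Split on runs of characters that are not digits, dots, or signs.
# The hyphen is placed last in the class so it is taken literally.
pieces <- strsplit(outtree.new, "[^0-9.+-]+")[[1]]

# The string starts with "(((B:", which matches the separator pattern,
# so the first piece is an empty string; drop empty pieces.
pieces <- pieces[nzchar(pieces)]

# Convert what is left to numeric.
num.vec <- as.numeric(pieces)
num.vec
```

This still assumes that dots and signs appear only inside numbers, as Haris pointed out.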

And Haris, I didn't intend to keep the information of which number is B, which is C etc. when asking the question, as I had a tedious way to do it (use strsplit and unlist over and over again, after I get the numbers). But if you have an easier way to do it, I'd like to know!
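
One possible way to keep the labels, offered only as a sketch: pull out every "label:number" pair with gregexpr() and regmatches() (the latter needs a version of base R that provides it), then split each pair on the colon. It assumes that labels consist of letters only and that the numbers are unsigned, which holds for the example string but may not hold in general. The names pairs, parts and tip.len are made up for illustration.

```r
outtree.new <- "(((B:1204.25,E:1204.25):7581.11,F:8785.36):8353.85,C:17139.21);"

# Find every "label:number" pair, e.g. "B:1204.25".
# Numbers that follow ")" have no label and are not matched.
pairs <- regmatches(outtree.new,
                    gregexpr("[A-Za-z]+:[0-9.]+", outtree.new))[[1]]

# Split each pair on the colon: column 1 is the label, column 2 the number.
parts <- do.call(rbind, strsplit(pairs, ":", fixed = TRUE))

# Named numeric vector: names are the labels, values are the numbers.
tip.len <- setNames(as.numeric(parts[, 2]), parts[, 1])
tip.len
```

The value for a given label can then be looked up by name, for example tip.len["C"].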
 
Hua


--- On Thu, 6/12/08, Charilaos Skiadas <cskiadas at gmail.com> wrote:

> From: Charilaos Skiadas <cskiadas at gmail.com>
> Subject: Re: [R] numbers as part of long character
> To: marc_schwartz at comcast.net
> Cc: hualihua at yahoo.com, r-help at r-project.org
> Date: Thursday, June 12, 2008, 6:03 PM
> On Jun 12, 2008, at 5:06 PM, Marc Schwartz wrote:
> 
> > on 06/12/2008 03:46 PM Hua Li wrote:
> >> Hi,
> >> I'm looking for some way to pick up the numbers which are
> >> contained and buried in a long character. For example,
> >>
> >> outtree.new="(((B:1204.25,E:1204.25):7581.11,F:8785.36):8353.85,C:17139.21);"
> >> num.char = unlist(strsplit(unlist(strsplit(unlist(strsplit(unlist(
> >>   strsplit(unlist(strsplit(outtree.new,")",fixed=TRUE)),
> >>   "(",fixed=TRUE)),":",fixed=TRUE)),",",fixed=TRUE)),";",fixed=TRUE))
> >> num.vec=as.numeric(num.char[1:(length(num.char)-1)])
> >> num.char
> >> #  "B"        "1204.25"  "E"        "1204.25"  "7581.11"
> >> #  "F"        "8785.36"  "8353.85"  "C"        "17139.21" ""
> >> num.vec
> >> # NA  1204.25       NA  1204.25  7581.11       NA  8785.36
> >> # 8353.85       NA 17139.21
> >> would help me get the numbers such as 1204.25, 7581.11, etc, but
> >> with a warning message which reads:
> >> "Warning message:
> >> NAs introduced by coercion "
> >> Is there a way to get around this? Thanks!
> >> Hua
> >
> > Your code above is overly and needlessly complicated, which makes
> > it difficult to debug.
> >
> > I would take an approach whereby you use gsub() to strip non-numeric
> > characters from the input character vector and then use scan() to
> > read the remaining numbers:
> >
> > > Vec <- scan(textConnection(gsub("[^0-9\\.]+", " ", outtree.new)))
> > Read 6 items
> >
> > > Vec
> > [1]  1204.25  1204.25  7581.11  8785.36  8353.85 17139.21
> >
> > > str(Vec)
> >  num [1:6] 1204 1204 7581 8785 8354 ...
> >
> >
> > The result of using gsub() above is:
> >
> > > gsub("[^0-9\\.]+", " ", outtree.new)
> > [1] " 1204.25 1204.25 7581.11 8785.36 8353.85 17139.21 "
> >
> >
> > That gives you a character vector which can then be passed to
> > scan() as a textConnection().
> 
> Another approach would be to split on sequences of characters that
> are neither digits nor dots:
> 
> as.numeric( strsplit(outtree.new, "[^\\d.]+", perl=TRUE)[[1]] )
> 
> 
> Use "[^+-\\d.]+" if your numbers might be signed. This does assume
> that dots and +/- occur only as parts of numbers.
> 
> Hua, did you want to keep the information of which number is B, which
> is C etc?
> 
> > See ?gsub, ?regex, ?textConnection and ?scan for more information.
> >
> > HTH,
> >
> > Marc Schwartz
> >
> 
> Haris Skiadas
> Department of Mathematics and Computer Science
> Hanover College


