[R] Numbers in a string

Petr Savicky savicky at cs.cas.cz
Thu Dec 16 17:42:29 CET 2010


On Thu, Dec 16, 2010 at 06:17:45AM -0800, Dieter Menne wrote:
> Petr Savicky wrote:
> > 
> > One of the suggestions in this thread was to use an external program.
> > A possible solution without negation in Perl is
> > 
> >   @a = ("AB15E9SDF654VKBN?dvb.65" =~ m/[0-9]/g);
> >   print @a, "\n";
> >   15965465
> > 
> > 
> 
> Which is
> 
>  gsub("[^0-9]", "", "AB15E9SDF654VKBN?dvb.65")
> 
> as Henrique suggested.
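For comparison, here is a minimal base-R sketch that applies both descriptions to the same string. gsub() deletes what is not wanted. gregexpr() plus substring() keeps what is wanted, much like Perl's m/[0-9]/g. The variable names are only illustrative.

```r
x <- "AB15E9SDF654VKBN?dvb.65"

# Negative description: delete every character that is not a digit.
neg <- gsub("[^0-9]", "", x)

# Positive description: keep the matches of [0-9].
# gregexpr() returns the start position of each match; every match here
# is a single character, so substring(x, m, m) extracts it.
m <- gregexpr("[0-9]", x)[[1]]
pos <- substring(x, m, m)

neg                          # "15965465"
paste(pos, collapse = "")    # "15965465"
```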

I agree. The Perl code was a reply to the question of whether the same result
can be obtained by describing the required elements rather than the ones to
be removed. This could be useful if we want to extract elements described
by a more complex regular expression. A more accurate (although incomplete
and certainly not optimal) extraction of nonnegative numbers in Perl may be
done as follows:

  @a = ("abcde. 11 abc 5.31e+34, (1.45)" =~ m/[0-9]+\.[0-9]+e[+-][0-9]+|[0-9]+\.[0-9]+|[0-9]+/g);
  print join(" ", @a), "\n";
  11 5.31e+34 1.45

Can something similar be done in R either specifically for numbers or
for a general regular expression?
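For a general regular expression, this seems possible in base R with gregexpr(), which reports the start and length of every match, combined with substring(). The sketch below reuses the regular expression from the Perl example. It passes perl = TRUE so that the matching semantics follow Perl. The helper name extract_all() is hypothetical and is not part of R.

```r
# Sketch: return every substring of the single string `x` that matches
# `pattern`, similar to Perl's m/.../g in list context.
# extract_all() is a hypothetical helper, not a base R function.
extract_all <- function(x, pattern) {
  # gregexpr() gives the start of each match; the match lengths are in
  # the "match.length" attribute. A start of -1 means "no match".
  m <- gregexpr(pattern, x, perl = TRUE)[[1]]
  if (m[1] == -1) return(character(0))
  substring(x, m, m + attr(m, "match.length") - 1)
}

s <- "abcde. 11 abc 5.31e+34, (1.45)"
# The same alternation as in the Perl example above.
num_re <- "[0-9]+\\.[0-9]+e[+-][0-9]+|[0-9]+\\.[0-9]+|[0-9]+"

found <- extract_all(s, num_re)
writeLines(paste(found, collapse = " "))  # prints: 11 5.31e+34 1.45
nums <- as.numeric(found)                 # the values 11, 5.31e+34, 1.45
```

In R 2.14.0 and later, regmatches(s, gregexpr(num_re, s, perl = TRUE))[[1]] should give the same character vector in a single call.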

Going back to the original question, the answer depends on how complex the
extraction of numbers is in the concrete situation. Where possible, using
functions within R (gsub(), strsplit(), ...) is recommended. On the other
hand, there are cases where an external tool can be helpful. See also
Chapter 7, "Reading data from files", of R-intro, which says

  There is a clear presumption by the designers of R that you will be
  able to modify your input files using other tools, such as file editors
  or Perl to fit in with the requirements of R.
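As one illustration of the strsplit() route mentioned above, here is a minimal sketch on the original string. It splits on runs of non-digits. It treats every run of digits as a separate integer, so a decimal such as 1.45 would come apart into 1 and 45. A separator at the very start of the string produces a leading empty string, which is dropped before conversion.

```r
x <- "AB15E9SDF654VKBN?dvb.65"

# Split on runs of non-digit characters. The match "AB" at the start
# produces a leading "" in the result.
parts <- strsplit(x, "[^0-9]+")[[1]]
parts    # c("", "15", "9", "654", "65")

# Drop empty pieces and convert the rest to numbers.
ints <- as.numeric(parts[nzchar(parts)])
ints     # c(15, 9, 654, 65)
```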

Petr Savicky.


