[R] Best way to test for numeric digits?

Ivan Krylov kry|ov@r00t @end|ng |rom gm@||@com
Wed Oct 18 17:26:27 CEST 2023


On Wed, 18 Oct 2023 17:59:01 +0300,
Leonard Mada via R-help <r-help using r-project.org> wrote:

> What is the best way to test for numeric digits?
> 
> suppressWarnings(as.double(c("Li", "Na", "K",  "2", "Rb", "Ca", "3")))
> # [1] NA NA NA  2 NA NA  3
> The above requires the use of the suppressWarnings function. Are
> there any better ways?

This test also has the downside of accepting things like "1.2" and
"+1e-100". Since you need digits only, why not use a regular expression
to test for '^[0-9]+$'?
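
A minimal sketch of that check in base R. The name is_digits is made up
for this example; grepl() returns FALSE for anything that is not one or
more ASCII digits from start to end, so no warnings need suppressing:

```r
# Hypothetical helper: TRUE only for strings made up entirely of digits.
is_digits <- function(x) grepl("^[0-9]+$", x)

x <- c("Li", "Na", "K", "2", "Rb", "Ca", "3", "1.2", "+1e-100")
is_digits(x)
# [1] FALSE FALSE FALSE  TRUE FALSE FALSE  TRUE FALSE FALSE
```

Unlike the as.double() test, this rejects "1.2" and "+1e-100".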

> I was working to extract chemical elements from a formula, something 
> like this:

> split.symbol.character(c("CCl3F", "Li4Al4H16", "CCl2CO2AlPO4SiO4Cl"))

Perhaps the following function could be made to work in your cases?

function(x) regmatches(x, gregexec('([A-Z][a-z]*)([0-9]*)', x))

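The result is a list with one character matrix per input string. If
retval is one of those matrices, retval[2,] is the element and
retval[3,] is the coefficient (an empty string when no number follows
the symbol). gregexec() needs R >= 4.1.0. Do you need brackets?
Charges? Non-stoichiometric compounds? (SMILES?)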
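
A sketch of how that function might be wrapped to return one data frame
per formula. The name parse_formula and the rule "no number means a
count of 1" are assumptions made for this example, and it assumes every
input contains at least one element symbol:

```r
# Hypothetical helper name; gregexec() requires R >= 4.1.0.
parse_formula <- function(x) {
  pattern <- "([A-Z][a-z]*)([0-9]*)"
  matches <- regmatches(x, gregexec(pattern, x))
  lapply(matches, function(m) {
    # Row 1 is the full match, row 2 the symbol, row 3 the coefficient.
    count <- unname(m[3, ])
    count[!nzchar(count)] <- "1"   # assumption: no number means one atom
    data.frame(element = unname(m[2, ]), count = as.integer(count))
  })
}

parse_formula(c("CCl3F", "Li4Al4H16"))
```

For "CCl3F" this gives the elements C, Cl, F with counts 1, 3, 1. It
does not handle brackets, charges or fractional coefficients.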

> # broken in R 4.3.1
> # only slightly "erroneous" with stringi::stri_split
> regex = "(?<=[A-Z])(?![a-z]|$)|(?=[A-Z])|(?<=[a-z])(?=[^a-z])";
> strsplit(c("CCl3F", "Li4Al4H16", "CCl2CO2AlPO4SiO4Cl"), regex, perl = T)

strsplit() has special historical behaviour about empty matches:
https://bugs.r-project.org/show_bug.cgi?id=16745

It's unfortunate that it doesn't split on empty matches the way you
would intuitively expect it to, but changing the behaviour at this
point is hard. Even adding a flag may be complicated to implement. Do
you want such a flag?
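
In the meantime, one possible workaround is to take the positions of the
empty matches from gregexpr() and cut the string there with substring().
This is only a sketch: split_at is a made-up name, and it assumes the
pattern produces zero-width matches only (lookarounds, as in your
regex), since the matched text itself is never removed:

```r
# Hypothetical helper: cut each string before every zero-width match.
split_at <- function(x, pattern) {
  lapply(x, function(s) {
    pos <- gregexpr(pattern, s, perl = TRUE)[[1]]
    # Drop -1 (no match) and a match at the very start of the string.
    pos <- pos[pos > 1]
    substring(s, c(1L, pos), c(pos - 1L, nchar(s)))
  })
}

split_at(c("CCl3F", "Li4Al4H16"), "(?=[A-Z])")
```

With the lookahead "(?=[A-Z])", "CCl3F" is cut into "C", "Cl3" and "F".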

-- 
Best regards,
Ivan


