[R] Parsing a Simple Chemical Formula

Bryan Hanson hanson at depauw.edu
Mon Dec 27 00:29:52 CET 2010


Hello R Folks...

I've been looking around the 'net and I see many complex solutions in  
various languages to this question, but I have a pretty simple need  
(and I'm not much good at regex).  I want to use a chemical formula as  
a function argument.  The formula would be in "Hill order" which is to  
list C, then H, then all other elements in alphabetical order.  My  
example will have only a limited number of elements, few enough that  
one can search directly for each element.  Some examples would be  
C5H12, C5H12O or C5H11BrO (note that O and Br have no number after  
them, which means a count of 1 is implied).

Let's say

 > form <- "C5H11BrO"

I'd like to get the count of each element, so in this case I need to  
extract C and 5, H and 11, Br and 1, O and 1 (I want to calculate the  
molecular weight by multiplying).  Sounds pretty simple, but my  
experiments with grep and strsplit don't immediately clue me into an  
obvious solution.  As I said, I don't need a general solution to the  
problem of calculating molecular weight from an arbitrary formula,  
that seems quite challenging, just a way to convert "form" into a list  
or data frame which I can then do the math on.
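
To make the target concrete, here is a minimal sketch of one possible approach, not a general formula parser. It assumes a flat formula with no parentheses, charges or hydrates, where each element is an uppercase letter, an optional lowercase letter and an optional count. The function name parse_formula and the small atomic weight table are hypothetical and only for illustration; regmatches() needs R 2.14.0 or later.

```r
# Hypothetical helper: split a simple formula such as "C5H11BrO" into
# element symbols and counts. Assumes no parentheses, charges or hydrates.
parse_formula <- function(form) {
  # One token = uppercase letter, optional lowercase letter, optional digits
  tokens <- regmatches(form, gregexpr("[A-Z][a-z]?[0-9]*", form))[[1]]
  element <- sub("[0-9]+$", "", tokens)     # drop the trailing count
  count   <- sub("^[A-Za-z]+", "", tokens)  # drop the element symbol
  count[count == ""] <- "1"                 # a missing number means 1
  data.frame(element = element, count = as.integer(count),
             stringsAsFactors = FALSE)
}

form <- "C5H11BrO"
res <- parse_formula(form)

# Illustrative atomic weights (g/mol), covering only this example's elements
atomic_wt <- c(C = 12.011, H = 1.008, Br = 79.904, O = 15.999)

# Molecular weight = sum over elements of (atomic weight * count)
mol_wt <- sum(atomic_wt[res$element] * res$count)
```

Here res is a data frame with one row per element, and mol_wt is the sum of weight times count.  An element missing from atomic_wt would give NA, so the lookup table has to cover every symbol that can appear.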

Here's hoping this is a simple issue for more experienced R users!   
TIA,  Bryan
***********
Bryan Hanson
Professor of Chemistry & Biochemistry


