[R] substituting dots in the names of the columns (sub, gsub, regexpr)

8rino-Luca Pantani ottorino-luca.pantani at unifi.it
Thu Jul 26 15:40:40 CEST 2007


Dear R users,
I have the following two problems, related to the functions sub, grep, 
regexpr and similar.

The header of the file(s) I have to import is like this.

c("y (m)", "BD (g/cm3)", "PR (Mpa)", "Ks (m/s)", "SP g./g.",
"P (m3/m3)", "theta1 (g/g)", "theta2 (g/g)", "AWC (g/g)")

To get rid of spaces and symbols in the names of the columns,
I use read.table(... check.names=TRUE) and I get:
str <- c("y..m.", "BD..g.cm3.", "PR..Mpa.", "Ks..m.s.", "SP.g..g.", 
"P..m3.m3.", "theta1..g.g.", "theta2..g.g.", "AWC..g.g.")
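
For reference, the renaming above can be reproduced without reading any file. This is only a sketch: it relies on the documented behaviour that check.names=TRUE adjusts the names through make.names(), and the variable names hdr and expected are made up for illustration.

```r
# Original header strings, as they appear in the file(s).
hdr <- c("y (m)", "BD (g/cm3)", "PR (Mpa)", "Ks (m/s)", "SP g./g.",
         "P (m3/m3)", "theta1 (g/g)", "theta2 (g/g)", "AWC (g/g)")

# make.names() replaces every character that is not valid in a
# syntactic R name (spaces, parentheses, slashes) with a dot.
checked <- make.names(hdr, unique = TRUE)

# The names reported by read.table(..., check.names = TRUE) in this post.
expected <- c("y..m.", "BD..g.cm3.", "PR..Mpa.", "Ks..m.s.", "SP.g..g.",
              "P..m3.m3.", "theta1..g.g.", "theta2..g.g.", "AWC..g.g.")

print(identical(checked, expected))
```

Each space, bracket or slash becomes one dot, which is where the double and trailing dots come from.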

Now, my problem is to remove the trailing dots, as well as the double 
dots, in order to get names like the following:
c("y.m", "BD.g.cm3", "PR.Mpa", "Ks.m.s", "SP.g.g", "P.m3.m3", 
"theta1.g.g", "theta2.g.g", "AWC.g.g")

I've searched the help pages for sub, regexpr and similar functions, and 
also searched the help archives.
I understand that the dot is a special character, since
sub("..", ".", str)
[1] "..m."        "...g.cm3."   "...Mpa."     "...m.s."     "..g..g."   
[6] "..m3.m3."    ".eta1..g.g." ".eta2..g.g." ".C..g.g."  

Therefore I tried
sub("\\..", ".", str)
[1] "y.m."        "BD.g.cm3."   "PR.Mpa."     "Ks.m.s."     "SP...g."   
[6] "P.m3.m3."    "theta1.g.g." "theta2.g.g." "AWC.g.g."  
and I was surprised by the (to me) strange behaviour: "SP.g..g." 
is changed into "SP...g."
And this is the first problem I cannot solve.
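
A possible reading of this result, offered as a sketch rather than a definitive answer: in the pattern "\\.." only the first dot is escaped, so it means "a literal dot followed by any character", and sub replaces only the first match. In "SP.g..g." the first match is ".g", not "..". Escaping the dot and adding a quantifier, together with gsub, seems to collapse every run of dots. The names nms, first_match and collapsed are made up for illustration; nms holds the same values as str above.

```r
nms <- c("y..m.", "BD..g.cm3.", "PR..Mpa.", "Ks..m.s.", "SP.g..g.",
         "P..m3.m3.", "theta1..g.g.", "theta2..g.g.", "AWC..g.g.")

# What "\\.." actually matches first in the surprising case:
# a literal dot followed by ANY character.
first_match <- regmatches("SP.g..g.", regexpr("\\..", "SP.g..g."))
print(first_match)   # ".g"

# "\\.+" means "one or more literal dots"; gsub replaces every such run
# with a single dot, wherever it occurs in the string.
collapsed <- gsub("\\.+", ".", nms)
print(collapsed)
```

This leaves the trailing dot in place, so it addresses only the double dots.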

Then there's the problem of trailing dot removal.
In
http://tolstoy.newcastle.edu.au/R/e2/help/07/01/8665.html
I've found a somewhat similar problem, but it does not work in this case, 
since:
gsub("[.].*", "", str)
[1] "y"      "BD"     "PR"     "Ks"     "SP"     "P"      "theta1" "theta2"
[9] "AWC"   
And this is the second problem.
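
A tentative explanation and sketch: "[.].*" matches the first dot and everything after it, so the whole tail of each name is deleted. Anchoring the pattern to the end of the string with "$" should restrict the match to trailing dots only. The names nms, no_trailing and cleaned are made up for illustration; nms holds the same values as str above.

```r
nms <- c("y..m.", "BD..g.cm3.", "PR..Mpa.", "Ks..m.s.", "SP.g..g.",
         "P..m3.m3.", "theta1..g.g.", "theta2..g.g.", "AWC..g.g.")

# "\\.+$" matches one or more literal dots only at the end of the string.
no_trailing <- sub("\\.+$", "", nms)

# Then collapse any remaining run of dots into a single dot.
cleaned <- gsub("\\.+", ".", no_trailing)
print(cleaned)
```

Under these assumptions the two steps together give the names wanted above, "y.m", "BD.g.cm3" and so on.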

Apart from these particular problems, I would like to learn more about 
regexp, sub and so on, since each time
I have strings to manipulate, I must face my ignorance of 
regular expressions and their syntax.

Is there any page with examples, where I can improve my knowledge and 
stop being frustrated each time I have to manipulate strings?

8rino

-- 
Ottorino-Luca Pantani, Università di Firenze
Dip. Scienza del Suolo e Nutrizione della Pianta
P.zle Cascine 28 50144 Firenze Italia
Tel 39 055 3288 202 (348 lab) Fax 39 055 333 273 
OLPantani at unifi.it


