[R] substituting dots in the names of the columns (sub, gsub, regexpr)

Felix Andrews felix at nfrac.org
Thu Jul 26 16:15:56 CEST 2007


Hi,

A dot in a regular expression matches any character, so to match a
literal dot you have to escape it with a backslash. Inside an R string
the backslash must itself be escaped, so you write "\\." (to confuse
things...).
A plus symbol matches one or more repetitions of the preceding item.
A dollar symbol matches the end of the string.
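
As a small illustration of those three points, here is a sketch on a made-up vector (x and the result names are hypothetical, not from the original post):

```r
# Hypothetical example vector, used only for illustration
x <- c("a..b.", "ab.c")

# An unescaped dot matches any character, so the first character is replaced
any_char <- sub(".", "_", x)         # "_..b." "_b.c"

# An escaped dot matches a literal dot only
literal_dot <- sub("\\.", "_", x)    # "a_.b." "ab_c"

# "+" repeats the preceding item, so a run of dots is replaced as one unit
dot_runs <- gsub("\\.+", "_", x)     # "a_b_" "ab_c"

# "$" anchors the match to the end of the string: only a trailing dot goes
trailing <- sub("\\.$", "", x)       # "a..b" "ab.c"
```

Writing the dot as "[.]" or passing fixed = TRUE are other ways to make it literal.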

So:

gsub("\\.$", "", gsub("\\.+", ".", str))
[1] "y.m"        "BD.g.cm3"   "PR.Mpa"     "Ks.m.s"     "SP.g.g"     "P.m3.m3"    "theta1.g.g"
[8] "theta2.g.g" "AWC.g.g"
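
The nested call may be easier to follow when split into two steps. This is only a sketch: str is the vector from the question, while collapsed, cleaned and clean_dots are hypothetical names introduced here.

```r
# Column names as produced by read.table(..., check.names = TRUE) in the question
str <- c("y..m.", "BD..g.cm3.", "PR..Mpa.", "Ks..m.s.", "SP.g..g.",
         "P..m3.m3.", "theta1..g.g.", "theta2..g.g.", "AWC..g.g.")

# Step 1: collapse every run of one or more dots into a single dot
collapsed <- gsub("\\.+", ".", str)

# Step 2: remove the single dot left at the end of a name, if there is one
cleaned <- sub("\\.$", "", collapsed)

# The same two steps wrapped in a small helper (hypothetical name)
clean_dots <- function(x) sub("\\.$", "", gsub("\\.+", ".", x))

cleaned
```

With a data frame called, say, dat (an assumed name), the helper could be applied as names(dat) <- clean_dots(names(dat)).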

Learn more at ?regexp
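
When a pattern gives a surprising result, it can help to look at what it actually matches. The sketch below uses regmatches() with regexpr() and gregexpr() on one of the names from the question; the variable names are made up for illustration.

```r
x <- "SP.g..g."

# First match of "\\..": a literal dot followed by ANY character
first_match <- regmatches(x, regexpr("\\..", x))        # ".g"

# sub() replaces only that first match, hence the result seen in the question
after_sub <- sub("\\..", ".", x)                        # "SP...g."

# All matches of "\\.+": each run of literal dots
all_runs <- regmatches(x, gregexpr("\\.+", x))[[1]]     # "." ".." "."

# "[.].*" matches from the first dot to the end of the string
greedy <- regmatches(x, regexpr("[.].*", x))            # ".g..g."
```

So in "SP.g..g." the first literal dot is followed by "g", and it is ".g" that gets replaced by ".", while "[.].*" swallows everything after the first dot.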

Felix


On 7/26/07, 8rino-Luca Pantani <ottorino-luca.pantani at unifi.it> wrote:
> Dear R users,
> I have the following two problems, related to the functions sub, grep,
> regexpr and the like.
>
> The header of the file(s) I have to import is like this.
>
> c("y (m)", "BD (g/cm3)", "PR (Mpa)", "Ks (m/s)", "SP g./g.",
> "P (m3/m3)", "theta1 (g/g)", "theta2 (g/g)", "AWC (g/g)")
>
> To get rid of spaces and symbols in the names of the columns,
> I use read.table(... check.names=TRUE) and I get:
> str <- c("y..m.", "BD..g.cm3.", "PR..Mpa.", "Ks..m.s.", "SP.g..g.",
> "P..m3.m3.", "theta1..g.g.", "theta2..g.g.", "AWC..g.g.")
>
> Now, my problem is to remove the trailing dots, as well as the double
> dots, in order to get the names like the following
> c("y.m", "BD.g.cm3", "PR.Mpa", "Ks.m.s", "SP.g.g", "P.m3.m3",
> "theta1.g.g", "theta2.g.g", "AWC.g.g")
>
> I've searched the help pages for sub, regexpr and the like, and also
> searched the help archives.
> I understand that the dot is a special character, since
> sub("..", ".", str)
> [1] "..m."        "...g.cm3."   "...Mpa."     "...m.s."     "..g..g."
> [6] "..m3.m3."    ".eta1..g.g." ".eta2..g.g." ".C..g.g."
>
> Therefore I tried
> sub("\\..", ".", str)
> [1] "y.m."        "BD.g.cm3."   "PR.Mpa."     "Ks.m.s."     "SP...g."
> [6] "P.m3.m3."    "theta1.g.g." "theta2.g.g." "AWC.g.g."
> and I was surprised by the (to me) strange behaviour: "SP.g..g."
> was changed into "SP...g."
> And this is the first problem I cannot solve.
>
> Then there's the problem of trailing dot removal.
> In
> http://tolstoy.newcastle.edu.au/R/e2/help/07/01/8665.html
> I've found a somewhat similar problem, but the solution does not work in
> this case, since:
> gsub("[.].*", "", str)
> [1] "y"      "BD"     "PR"     "Ks"     "SP"     "P"      "theta1" "theta2"
> [9] "AWC"
> And this is the second problem.
>
> Apart from these particular problems, I would like to know more about
> regexp, sub and so on, since each time
> I have strings to manipulate, I must face my ignorance of
> regular expressions and their syntax.
>
> Is there any page with examples, where I can improve my knowledge and
> stop being frustrated each time I have to manipulate strings?
>
> 8rino
>
> --
> Ottorino-Luca Pantani, Università di Firenze
> Dip. Scienza del Suolo e Nutrizione della Pianta
> P.zle Cascine 28 50144 Firenze Italia
> Tel 39 055 3288 202 (348 lab) Fax 39 055 333 273
> OLPantani at unifi.it
>
> ______________________________________________
> R-help at stat.math.ethz.ch mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>


-- 
Felix Andrews / 安福立
PhD candidate
Integrated Catchment Assessment and Management Centre
The Fenner School of Environment and Society
The Australian National University (Building 48A), ACT 0200
Beijing Bag, Locked Bag 40, Kingston ACT 2604
http://www.neurofractal.org/felix/
voice:+86_1051404394 (in China)
mobile:+86_13522529265 (in China)
mobile:+61_410400963 (in Australia)
xmpp:foolish.android at gmail.com
3358 543D AAC6 22C2 D336  80D9 360B 72DD 3E4C F5D8


