[R] regular expression for selection

Petr PIKAL petr.pikal at precheza.cz
Mon Nov 14 11:27:44 CET 2011


Hi

Thank you. It is pure magic, something taught at Unseen University.

This is what I got as help for selecting only the letters from the
elements of a character vector.

> vzor
 [1] "61A"     "62C/27"  "65A/27"  "66C/29"  "69A/29"  "70C/31"  "73A/31"
 [8] "74C/33"  "77A/33"  "81A/35"  "82C/37"  "85A/37"  "86C/39"  "89A/39"
[15] "90C/41"  "93A/41"  "94C/43"  "97A/43"  "98C/45"  "101A/45" "102C/47"
[22] "105A/47" "106C/49" "109A/49" "110C/51" "113A/51"

> gsub("[^A-z]", "", vzor)
 [1] "A" "C" "A" "C" "A" "C" "A" "C" "A" "A" "C" "A" "C" "A" "C" "A" "C"
[18] "A" "C" "A" "C" "A" "C" "A" "C" "A"
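
A side note on the character class, as a minimal sketch: in ASCII order the range A-z also covers the punctuation characters between "Z" and "a" (such as "_" and "^"), so "[^A-z]" may keep more than letters. The vector x below is a hypothetical sample, not the original vzor, and the exact behaviour of ranges can depend on the locale and the regex engine.

```r
# Hypothetical sample: two values like those in vzor, plus one with "_"
x <- c("61A", "62C/27", "a_b")

# The range A-z spans "[", "\", "]", "^", "_" and "`" in ASCII order,
# so the underscore in "a_b" may survive this call
loose <- gsub("[^A-z]", "", x)

# Listing both letter ranges explicitly removes everything but letters
strict <- gsub("[^A-Za-z]", "", x)

# The POSIX class [:alpha:] is an alternative that avoids ranges altogether
posix <- gsub("[^[:alpha:]]", "", x)
```

For the values in vzor both forms give the same result, because those strings contain only digits, letters and "/".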

Therefore I expected that

sub("m5.", "\\1", mena) or sub("m5.", "", mena)

would select what I wanted. But that was not the case.
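
A minimal sketch of the difference, using one of the file names from this thread: sub() replaces the part that matched and keeps the rest, and since the pattern "m5." has no parenthesised group, "\\1" has nothing to refer to. If the goal is to get the matched part itself, base R's regmatches() together with regexpr() is one possible alternative (it is not the solution proposed in the thread).

```r
# One of the file names from the thread
x <- "138516_10g_50ml_50c_250utes1_m55.00_s1.imp"

# sub() replaces the first match of "m5." and keeps everything else
removed <- sub("m5.", "", x)

# regexpr() locates the first match; regmatches() returns the matched text
extracted <- regmatches(x, regexpr("m5.", x))

print(removed)    # the string without "m55"
print(extracted)  # only "m55"
```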

Could you please correct me as I try to interpret your solution?

gsub(".*_(m5.).*", "\\1", mena)

or

gsub(".*(m5.).*", "\\1", mena)

.* matches any sequence of characters
() negation? Or a group that captures the match for the backreference?

Finally, the expression matches the whole string and captures the part
matched by the parenthesised sub-pattern. The backreference returns this
captured part.

Is this interpretation correct?
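
As a hedged sketch of how the pattern can be read: parentheses in a regular expression group and capture, they do not negate (negation is written as "^" inside square brackets, as in "[^A-z]" above). The vector x is a two-element subset of mena, used here only for illustration.

```r
# Two of the file names from the thread
x <- c("138516_10g_50ml_50c_250utes1_m53.00-_s1.imp",
       "138516_10g_50ml_50c_250utes1_m58.00_s1.imp")

# .*_    any characters up to an underscore (greedy, but it backtracks
#        until the underscore is followed by "m5")
# (m5.)  capture group 1: "m5" plus exactly one more character
# .*     the rest of the string
# The pattern consumes the whole string, so replacing the match with "\\1"
# leaves only the text captured by group 1
with_underscore <- gsub(".*_(m5.).*", "\\1", x)

# The variant without the underscore gives the same result for these strings
without_underscore <- gsub(".*(m5.).*", "\\1", x)

print(with_underscore)
```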

Regards
Petr

> 
> On 14.11.2011 10:22, Petr PIKAL wrote:
> > Hi
> >
> >> On 11/14/2011 07:45 PM, Petr PIKAL wrote:
> >>> Dear all
> >>>
> >>> I am again (as usual) lost in regular expression use for selection.
> >>> Here are my data:
> >>>
> >>>> dput(mena)
> >>> c("138516_10g_50ml_50c_250utes1_m53.00-_s1.imp",
> >>> "138516_10g_50ml_50c_250utes1_m54.00_s1.imp",
> >>> "138516_10g_50ml_50c_250utes1_m55.00_s1.imp",
> >>> "138516_10g_50ml_50c_250utes1_m56.00_s1.imp",
> >>> "138516_10g_50ml_50c_250utes1_m57.00_s1.imp",
> >>> "138516_10g_50ml_50c_250utes1_m58.00_s1.imp",
> >>> "138516_10g_50ml_50c_250utes1_m59.00_s1.imp")
> >>>
> >>> I want to select only values "m" followed by numbers from 53 to 59.
> >>>
> >>> I used
> >>>
> >>> sub("m5.", "", mena)
> >>>
> >>> which correctly selects those m53 - m59 values but, contrary to my
> >>> expectation, it replaced the selected values with the specified
> >>> replacement - in that case an empty string.
> >>>
> >>> What should I use if I want to get rid of all but m53-m59 from those
> >>> strings?
> >>>
> >> Hi Petr,
> >> How about:
> >>
> >> grep("m5",mena)
> >
> > It gives numeric values which tell me that there is a match in each
> > string, but as a result I need only the
> >
> > m53-m59 substrings.
> 
> 
> gsub(".*_(m5.).*", "\\1", mena)
> 
> Uwe Ligges
> 
> 
> 
> > Regards
> > Petr
> >
> >
> >
> >>
> >> Jim
> >>
> >