[R] String Matching

Berend Hasselman bhh at xs4all.nl
Mon Feb 1 09:00:09 CET 2016


> On 1 Feb 2016, at 08:03, PIKAL Petr <petr.pikal at precheza.cz> wrote:
> 
> Hi
> 
> Maybe I am completely wrong but do you really need regular expressions?
> 
> You say you want to compare first nine characters of id?
> 
>> substr(id, 1,9)==cusip
> [1] TRUE
>> 
> 
> or the last six?
> 
>> substr(id, nchar(id)-6, nchar(id))=="432.rds"
> [1] TRUE
>> 
> 
> Cheers
> Petr
> 
> 
>> -----Original Message-----
>> From: R-help [mailto:r-help-bounces at r-project.org] On Behalf Of Glenn
>> Schultz
>> Sent: Friday, January 29, 2016 6:02 PM
>> To: R Help R
>> Subject: [R] String Matching
>> 
>> All,
>> 
>> I have a file named like so: 313929BL4FNMA2432.rds. The user may pass
>> either the first nine characters or the last six characters.  I need to
>> match the remainder of the file name using either the first nine or
>> the last six.  I have read the help files for regular expressions as
>> used in R, and I think what I want to use is glob2rx.
>> 
>> I have worked a minimal example to test my code:
>> 
>> id <- "313929BL4FNMA2432.rds"
>> cusip <- "313929BL4"
>> poolnumm <- "FNMA2432"
>> paste(cusip, ".*", ".rds")
>> glob2rx(paste(cusip, ".*", ".rds"), trim.head = TRUE, trim.tail = TRUE)
>> 
>> This returns false which leads me to believe that it is not working
>> glob2rx(paste(cusip, ".*", ".rds"), trim.head = TRUE, trim.tail = TRUE)
>> == id
>> 
>> I am going to use as follows in the function below - which returns the
>> error file not found
>> 
>> MBS_Test <- function(MBS.id = "character"){
>> MBS <- glob2rx(paste(MBS.id, ".*", "//.rds", sep = ""), trim.tail = TRUE)
>> MBS.Conn <- gzfile(description = paste(system.file(package = "BondLab"),
>> "/BondData/", MBS, sep = ""), open = "rb")
>> MBS <- readRDS(MBS.Conn)
>> on.exit(close.connection(MBS.Conn))
>> return(MBS)
>> }
>> 


I don't think you are using glob wildcards correctly: where you write .* you most likely need just *. In a glob pattern the dot is a literal character, not "any character" as in a regular expression.
In addition, why not use paste0 instead of paste? paste0 does not insert a space between its arguments.
Finally, your poolnumm variable consists of 8 characters, not 6.
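
To make the first two points concrete, here is a small sketch using the values from the original post. The variable names with_paste, with_paste0 and rx_dot_star are only illustrative and do not come from the original code.

```r
cusip <- "313929BL4"
id <- "313929BL4FNMA2432.rds"

# paste() puts a space between its arguments by default; paste0() does not.
with_paste  <- paste(cusip, ".*", ".rds")
with_paste0 <- paste0(cusip, ".*", ".rds")
print(with_paste)   # "313929BL4 .* .rds"
print(with_paste0)  # "313929BL4.*.rds"

# In a glob the "." is literal, so ".*" means "a dot, then anything".
# The file name has no dot right after the cusip, so nothing matches.
rx_dot_star <- glob2rx(with_paste0)
print(rx_dot_star)
print(grepl(rx_dot_star, id))  # FALSE
```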

If you change your minimal example to this:

paste0(cusip, "*", ".rds")
glob2rx(paste0(cusip, "*", ".rds"))
grepl(glob2rx(paste0(cusip, "*", ".rds")), id)
grepl(glob2rx(paste0("*", poolnumm, ".rds")), id)

you get TRUE twice.
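
Note that glob2rx returns a regular expression, not a file name, so pasting its result into a path (as in the MBS_Test function) cannot open a file. One possible way around that, sketched here under assumptions, is to hand the pattern to list.files and open whatever it finds. The helper find_rds and the temporary demo directory are hypothetical and are not part of BondLab.

```r
# Hypothetical helper: find the one .rds file in `dir` whose name starts
# with `key`, or ends with `key` just before the extension.
find_rds <- function(key, dir) {
  starts <- list.files(dir, pattern = glob2rx(paste0(key, "*.rds")))
  ends   <- list.files(dir, pattern = glob2rx(paste0("*", key, ".rds")))
  hits <- unique(c(starts, ends))
  if (length(hits) != 1L) {
    stop("expected exactly one matching file, found ", length(hits))
  }
  file.path(dir, hits)
}

# Demo in a temporary directory standing in for the real data directory.
demo_dir <- file.path(tempdir(), "BondData_demo")
dir.create(demo_dir, showWarnings = FALSE)
saveRDS(list(name = "example"), file.path(demo_dir, "313929BL4FNMA2432.rds"))

by_cusip <- find_rds("313929BL4", demo_dir)
by_pool  <- find_rds("FNMA2432", demo_dir)
print(basename(by_cusip))
print(basename(by_pool))

# readRDS() accepts the path directly.
obj <- readRDS(by_cusip)
```

Stopping when the number of matches is not exactly one is a design choice of this sketch; a real function might prefer to return all matches or give a more specific message.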

But Petr's solution for the first 9 characters is much simpler.
For matching the last 6 (actually 8) characters, you'll have to remove the extension first and then use substr (if I understand your problem correctly).
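
A minimal sketch of that substr approach, assuming the extension is always a plain ".rds" so that tools::file_path_sans_ext can strip it. The names stem, first_match and last_match are only illustrative.

```r
id <- "313929BL4FNMA2432.rds"
cusip <- "313929BL4"
poolnumm <- "FNMA2432"

# Drop the extension so the tail of the name is the pool number.
stem <- tools::file_path_sans_ext(id)
print(stem)  # "313929BL4FNMA2432"

# Compare the leading characters with the cusip.
first_match <- substr(stem, 1, nchar(cusip)) == cusip

# Compare the trailing characters with the pool number.
n <- nchar(stem)
last_match <- substr(stem, n - nchar(poolnumm) + 1, n) == poolnumm

print(first_match)  # TRUE
print(last_match)   # TRUE
```

Using nchar(cusip) and nchar(poolnumm) instead of fixed numbers avoids the 6 versus 8 confusion.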

Berend


