[R] regex - optional part isn't considered in replacement with gsub

Jeff Newmiller jdnewmil at dcn.davis.ca.us
Sun Aug 27 18:54:30 CEST 2017


Clearly you are being too specific about the structure of the SKU. In the absence of better information about its format, you need to focus on identifying the delimiters and the position of the SKU instead. One way might be:

ecommerce$sku  <- sub( "^(.*)[ \n]+([^ \n]+)$", "\\2", ecommerce$producto )
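
As a rough, self-contained sketch of the idea (not a definitive solution), the code below rebuilds the test data with each product on a single line, which is an assumption about what the HTML-mangled original was meant to be, and applies the same sub() call. It also shows why the optional prefix in the question gets dropped: the leading greedy (.*) takes as many characters as it can, so the optional letters end up in group 1 rather than group 2. The {0,2} spelling of the optional prefix and the name greedy_pattern are mine, used only for illustration.

```r
# Test data from the question, with each product on one line
# (assumption: the line breaks in the posted code came from HTML wrapping).
ecommerce <- data.frame(
  a = c(1, 2),
  producto = c("SMART TV UHD 49'' CURVO 49MU6300",
               "SMART TV HD 32'' LE32S5970"),
  stringsAsFactors = FALSE
)

# Delimiter-based approach: keep the last run of characters that are
# neither a space nor a newline.
ecommerce$sku <- sub("^(.*)[ \n]+([^ \n]+)$", "\\2", ecommerce$producto)
print(ecommerce$sku)
# "49MU6300"  "LE32S5970"

# Structure-based pattern with an optional two-letter prefix
# (hypothetical rewrite of attempt 3, using {0,2} for the optional part).
greedy_pattern <- "(.*)([a-zA-Z]{0,2}[0-9]{2}[a-zA-Z]{1,2}[0-9]{2,4})(.*)"

# The first (.*) is greedy: it swallows "LE" because group 2 can still
# match without it, so the prefix never reaches the replacement.
greedy_sku <- gsub(greedy_pattern, "\\2", ecommerce$producto)
print(greedy_sku)
# "49MU6300"  "32S5970"
```

Anchoring on the delimiter means nothing before the final space can leak into, or be stolen from, the captured group, so the result does not depend on guessing the internal layout of the SKU.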

Please learn to post using plain text format, as HTML email gets corrupted on this mailing list. The option exists in your email client (including the GMail Web interface if that is what you use).
-- 
Sent from my phone. Please excuse my brevity.

On August 27, 2017 9:18:52 AM PDT, "Omar André Gonzáles Díaz" <oma.gonzales at gmail.com> wrote:
>Hello, I need some help with regex.
>
>I have these two sentences. I need to extract both "49MU6300" and
>"LE32S5970"
>and put them in a new column "SKU".
>
>A) SMART TV UHD 49'' CURVO 49MU6300
>B) SMART TV HD 32'' LE32S5970
>
>DataFrame for testing:
>
>ecommerce <- data.frame(a = c(1,2), producto = c("SMART TV UHD 49''
>CURVO
>49MU6300",
>                             "SMART TV HD 32'' LE32S5970"))
>
>
>I'm using gsub like this:
>
>1.- This would capture A as intended but only "32S5970" from B (missing
>"LE").
>
>ecommerce$sku <- gsub("(.*)([0-9]{2}[a-zA-Z]{1,2}[0-9]{2,4})(.*)",
>"\\2",
>ecommerce$producto)
>
>
>2.- This would capture "LE32S5970" but not "49MU6300".
>
>ecommerce$sku <-
>gsub("(.*)([a-zA-Z]{2}[0-9]{2}[a-zA-Z]{1,2}[0-9]{2,4})(.*)", "\\2",
>ecommerce$producto)
>
>
>3.- If I make the first 2 letters optional with:
>
>ecommerce$sku <-
>gsub("(.*)([a-zA-Z]?{2}[0-9]{2}[a-zA-Z]{1,2}[0-9]{2,4})(.*)", "\\2",
>ecommerce$producto)
>
>
>"49MU6300" is captured, but again only "32S5970" from B (missing "LE").
>
>
>What should I do? How would you approach it?
>
>	[[alternative HTML version deleted]]
>


