[R] regex for "[2440810] / www.tinyurl.com/hgaco4fha3"

Ulrik Stervbo ulrik.stervbo at gmail.com
Wed Feb 21 07:15:27 CET 2018


Hi Omar,

you are almost there... but! Your first substitution looks for 'www' at the
start of the line, followed by any single character. The line starts with
'[', so nothing matches and nothing is replaced. Your second substitution
then removes everything from the first '.' it finds (which is the one
after www).
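
To see this concretely, here is a small check of the two original calls
(a sketch only; the names anchored_hit, after_first, unchanged and
after_second are made up for illustration):

```r
x <- "[2440810] / www.tinyurl.com/hgaco4fha3"

# '^' anchors the match at the start of the string, but x starts with '['
anchored_hit <- grepl('^www.', x)

# No match, so sub() hands back x unchanged
after_first <- sub('^www.', '', x)
unchanged <- identical(after_first, x)

# The first literal '.' is the one right after 'www', so everything
# from there to the end is cut
after_second <- sub('[.].*', '', after_first)
```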

What you want to do is
x <- "[2440810] / www.tinyurl.com/hgaco4fha3"

y <- sub('www\\.', '', x) # Note the escape of '.'
y <- sub('\\..*', '', y)
y
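
Why the escape matters, as a rough illustration (the string z is a made-up
input, not from your data): an unescaped '.' matches any character, while
'\\.' or fixed = TRUE only matches a literal dot.

```r
# Hypothetical input where 'www' is followed by something other than a dot
z <- "wwwXtinyurl.com"

unescaped <- sub('www.', '', z)    # '.' matches the 'X', so "wwwX" is removed
escaped   <- sub('www\\.', '', z)  # needs a literal dot: no match, z unchanged

# fixed = TRUE treats the whole pattern as a literal string, no escaping needed
fixed_version <- sub('www.', '', "www.tinyurl.com", fixed = TRUE)
```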

Alternatively, all in one (if all addresses are .com)
gsub("(www\\.|\\.com.*)", "", x)
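
If not all addresses end in .com, one possible variant is to capture the
part between 'www.' and the next dot with a back-reference. This is only a
sketch: it assumes every address starts with 'www.', and the second string
and the names any_tld and other are invented for the example.

```r
x <- "[2440810] / www.tinyurl.com/hgaco4fha3"

# ([^.]+) captures everything after 'www.' up to the next dot;
# the rest of the match is dropped and replaced by the capture
any_tld <- sub("www\\.([^.]+)\\..*", "\\1", x)

# Same pattern on a made-up address with a different ending
other <- sub("www\\.([^.]+)\\..*", "\\1", "[1] / www.example.org/abc")
```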

And the same using stringr
library(stringr)
x %>% str_replace_all("(www\\.|\\.com.*)", "")

HTH
Ulrik


On Wed, 21 Feb 2018 at 06:20 Omar André Gonzáles Díaz <
oma.gonzales at gmail.com> wrote:

> Hi, I need help for cleaning this:
>
> "[2440810] / www.tinyurl.com/hgaco4fha3"
>
> My desired output is:
>
> "[2440810] / tinyurl".
>
> My attempts:
>
> stringa <- "[2440810] / www.tinyurl.com/hgaco4fha3"
>
> b <- sub('^www.', '', stringa) #wanted  to get rid of "www." part. Until
> first dot.
>
> b <- sub('[.].*', '', b) #clean from ".com" until the end.
>
> b #returns "[2440810] / www"
>
> Thank you.
