[R] regex not working for some entries in for loop

S Ellison S.Ellison at LGCGroup.com
Mon Oct 26 13:29:56 CET 2015



> From: Omar André Gonzáles Díaz
> Subject: [R] regex not working for some entries in for loop
> 
> I'm using some regex in a for loop to check for some values in column "source",
> and put a result in column "fuente".

Your regexes are split across multiple lines, so the pattern strings themselves contain whitespace and linefeeds. For example, you are not testing for 
" .*forum.*|.*buy.*"; you are testing for 
" .*forum.*|
                      .*buy.*"
(which, among other things, includes a \n).
Don’t do that. Keep each pattern on one line with no white space.
If you must have line breaks in the code, form the pattern using paste, as in
pat1 <- paste(c("site.*", ".*event.*", ".*free.*", ".*theguardlan.*", 
	".*guardlink.*", ".*torture.*", ".*forum.*", ".*buy.*", 
	".*share.*", ".*buttons.*", ".*pyme\\.lavoztx\\.com\\.*", 
	".*amezon.*", "computrabajo.com.pe", ".*porn.*", "quality"),
	collapse="|")

spam <- grepl(pat1, sf$source, ignore.case = TRUE)
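
To see the effect of the stray linefeed for yourself, here is a minimal sketch. The sample strings and the names src, pat_multiline and pat_pasted are made up for illustration and are not taken from your data:

```r
# Hypothetical sample values standing in for sf$source
src <- c("forum.example.com", "buy-now.example.com", "news.example.org")

# Pattern typed across two lines: the string itself contains "\n" plus indentation
pat_multiline <- ".*forum.*|
        .*buy.*"

# Pattern assembled from pieces with paste: no stray whitespace
pat_pasted <- paste(c(".*forum.*", ".*buy.*"), collapse = "|")

# Confirm the newline really is part of the multi-line pattern
has_newline <- grepl("\n", pat_multiline, fixed = TRUE)

# The second alternative of pat_multiline can only match text that
# contains a newline followed by spaces, so "buy-now.example.com" is missed
hit_multiline <- grepl(pat_multiline, src, ignore.case = TRUE)
hit_pasted    <- grepl(pat_pasted, src, ignore.case = TRUE)

print(hit_multiline)  # TRUE FALSE FALSE
print(hit_pasted)     # TRUE  TRUE FALSE
```

Only the alternatives that come after a line break are affected, which would explain why the regex works for some entries and not for others.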

Also, it's not immediately clear why you’re looping. grepl is vectorised: given a vector of character strings, it returns a vector of logicals of the same length. Consider replacing the 'if' constructs with 'ifelse' - albeit a complicated ifelse() - and doing the whole thing without a loop.
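
A loop-free version might look roughly like the sketch below. I don't know your actual categories, so the data frame contents, the second pattern and the labels "spam", "search" and "other" are assumptions for illustration only:

```r
# Hypothetical data frame standing in for 'sf'
sf <- data.frame(source = c("forum.example.com", "google", "(direct)",
                            "share-buttons.example"),
                 stringsAsFactors = FALSE)

# Patterns built with paste so they contain no stray whitespace
pat_spam   <- paste(c(".*forum.*", ".*share.*", ".*buttons.*"), collapse = "|")
pat_search <- paste(c("google", "bing"), collapse = "|")  # assumed second category

# grepl tests every row at once and returns one logical per row
is_spam   <- grepl(pat_spam, sf$source, ignore.case = TRUE)
is_search <- grepl(pat_search, sf$source, ignore.case = TRUE)

# Nested ifelse assigns a label per row; the labels are made up
sf$fuente <- ifelse(is_spam, "spam",
             ifelse(is_search, "search", "other"))

print(sf)
```

Each extra category adds one more level of nesting, which is why the ifelse() gets complicated, but it still avoids indexing row by row.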

S Ellison

