[R] regex not working for some entries in for loop

Omar André Gonzáles Díaz oma.gonzales at gmail.com
Sun Nov 8 05:42:58 CET 2015


Thanks S. Ellison.

I finally had some time to test it. Thanks for your clarification.

Just one more question:

You say:

Your regexes are on multiple lines and include whitespace and linefeeds.
For example you are not testing for
" .*forum.*|.*buy.*"; you are testing for
" .*forum.*|
                      .*buy.*"


But ".*", as far as I understand, means any character, zero or more
times, so it should cover the blanks and line breaks. Could you explain this
further? It is not clicking in my head.
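
To make the question concrete, here is a minimal sketch of what I am seeing. The sample strings are made up for illustration and are not from my real data. The second pattern is written with "\n" and four spaces, which should be equivalent to typing a line break and indentation inside the quotes. One possible reading is that the whitespace sits before ".*" in the second alternative, so it has to be matched literally before ".*" gets a chance to match anything.

```r
# Hypothetical sample values (assumption: not taken from the real 'source' column)
src <- c("buy-cheap.example", "my forum post", "news site")

# Pattern kept on one line
one_line <- ".*forum.*|.*buy.*"

# Same pattern, but the second alternative starts with a newline and
# four spaces, as happens when the string is broken across lines
two_lines <- ".*forum.*|\n    .*buy.*"

res_one <- grepl(one_line, src)   # TRUE TRUE FALSE
res_two <- grepl(two_lines, src)  # FALSE TRUE FALSE: "buy" is no longer found

# The second alternative only matches text that really contains
# a newline followed by four spaces before "buy"
res_nl <- grepl(two_lines, "x\n    buy now")  # TRUE
```

So the ".*" after the line break still matches anything, but only once the newline and spaces in front of it have been found in the string.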




2015-10-26 7:29 GMT-05:00 S Ellison <S.Ellison at lgcgroup.com>:

>
>
> > From: Omar André Gonzáles Díaz
> > Subject: [R] regex not working for some entries in for loop
> >
> > I'm using some regex in a for loop to check for some values in column
> > "source", and put a result in column "fuente".
>
> Your regexes are on multiple lines and include whitespace and linefeeds.
> For example you are not testing for
> " .*forum.*|.*buy.*"; you are testing for
> " .*forum.*|
>                       .*buy.*"
> (which among other things includes a \n)
> Don’t do that. Keep it to one line with no white space.
> If you must have line breaks in the code, form the pattern using paste, as
> in
> pat1 <- paste(c("site.*", ".*event.*", ".*free.*", ".*theguardlan.*",
>         ".*guardlink.*", ".*torture.*", ".*forum.*", ".*buy.*",
>         ".*share.*", ".*buttons.*", ".*pyme\\.lavoztx\\.com\\.*",
>         ".*amezon.*", "computrabajo.com.pe", ".*porn.*", "quality"),
>         collapse="|")
>
> spam <- grepl(pat1, sf$source, ignore.case = TRUE)
>
> Also, it's not immediately clear why you’re looping. grepl returns a
> vector of logicals; you have a vector of character strings. Consider
> replacing 'if' constructs with 'ifelse' - albeit a complicated ifelse() -
> and doing the whole thing without a loop.
>
> S Ellison
>
>

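For reference, the loop-free approach suggested in the quoted reply could look roughly like the sketch below. The data frame contents, the shortened pattern list, and the "spam"/"other" labels are assumptions for illustration only; the real "sf" has more rows and more categories.

```r
# Hypothetical stand-in for 'sf' (assumption: the values are made up)
sf <- data.frame(
  source = c("free-share-buttons.example", "google", "my forum"),
  stringsAsFactors = FALSE
)

# Build the pattern from separate pieces so no whitespace sneaks in
pat1 <- paste(c(".*forum.*", ".*buy.*", ".*share.*"), collapse = "|")

# grepl() is vectorised: one call tests every element of sf$source
spam <- grepl(pat1, sf$source, ignore.case = TRUE)

# ifelse() assigns a label per row without a for loop
# (the label names are hypothetical)
sf$fuente <- ifelse(spam, "spam", "other")
```

With more categories, the ifelse() calls can be nested, one per pattern, still without looping over rows.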

