[R] regexpr: R takes very long with non-existent pattern

Andrew Simmons
Thu May 19 02:26:17 CEST 2022


Hello,


I tried this myself, something like:


dat <- utils::read.csv(
    "https://raw.githubusercontent.com/discoleo/R/master/TextMining/Pubmed/Example_Abstracts_Title_Pubmed.csv",
    check.names = FALSE
)


regexpr(patt, dat$Abstract, perl = TRUE)
regexpr(patt, dat$Title, perl = TRUE)


and I can't reproduce your issue. On my machine the error (object
'patt' not found) is raised within a second or less. I'm using R 4.2.0
on Windows 11, though that shouldn't make a difference: Sys.info()
still reports Windows 10, with a build number of 22000. I don't really
know what else to say. Have you tried it again since?
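
If it helps, one way to measure how long the error takes is to wrap the
call in system.time() and catch the error. This is only a rough sketch:
x below is a made-up placeholder vector, not your abstracts, and the
names result and timing are just mine. Swap in dat$Abstract to test
against the real data.

```r
# Placeholder input (an assumption): 1200 identical strings standing in
# for the Pubmed abstracts. Replace with dat$Abstract to use the real data.
x <- rep("A placeholder abstract about regular expressions.", 1200)

# Make sure 'patt' really is undefined in the current environment.
if (exists("patt", inherits = FALSE)) rm(patt)

# Time the failing call; tryCatch() turns the error into its message
# so that system.time() can finish and report the elapsed time.
timing <- system.time(
    result <- tryCatch(
        regexpr(patt, x, perl = TRUE),
        error = function(e) conditionMessage(e)
    )
)

print(result)               # the error message that was caught
print(timing[["elapsed"]])  # elapsed wall-clock time in seconds
```

If the elapsed time is small here but the console still hangs for you,
that would suggest the delay comes from somewhere other than regexpr()
itself, though I can't say where from this alone.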


Regards,
    Andrew Simmons

On Wed, May 18, 2022 at 5:09 PM Leonard Mada via R-help
<r-help@r-project.org> wrote:
>
> Dear R Users,
>
>
> I have run the following command in R:
>
> # x = larger vector of strings (1200 Pubmed abstracts);
> # patt = not defined;
> npos = regexpr(patt, x, perl=TRUE);
> # Error in regexpr(patt, x, perl = TRUE) : object 'patt' not found
>
>
> The problem:
>
> R becomes unresponsive and it takes 1-2 minutes to return the error. The
> operation completes almost instantaneously with a valid pattern.
>
> Is there a reason for this behavior?
>
> Tested with R 4.2.0 on MS Windows 10.
>
>
> I have uploaded a set with 1200 Pubmed abstracts on Github, if anyone
> wants to check:
>
> - see file: Example_Abstracts_Title_Pubmed.csv;
>
> https://github.com/discoleo/R/tree/master/TextMining/Pubmed
>
> The variable patt was not defined due to an error: but it took very long
> to exit the operation and report the error.
>
>
> Many thanks,
>
>
> Leonard
>


