[R] & and |

Ivan Calandra calandra using rgzm.de
Fri Aug 21 08:50:29 CEST 2020


Thank you Bert, this is wonderful!

Best wishes,
Ivan

--
Dr. Ivan Calandra
TraCEr, laboratory for Traceology and Controlled Experiments
MONREPOS Archaeological Research Centre and
Museum for Human Behavioural Evolution
Schloss Monrepos
56567 Neuwied, Germany
+49 (0) 2631 9772-243
https://www.researchgate.net/profile/Ivan_Calandra

On 21/08/2020 0:37, Bert Gunter wrote:
> The single grep regex solutions offered to Ivan's problem were fine,
> but do not readily generalize to the conjunction of multiple (>2, say)
> regex patterns that can appear anywhere in a string and in any order.
> However, note that this can easily be done using the Perl zero width
> lookahead construction,  "(?=...)" .
> e.g.
> > test <- c("xyCz", "xAyCz", "xAyBzC", "xCByAz", "xACyB", "BAyyC", "CBxBAy")
>
> ## to search for strings containing "A", "B", & "C" in any order
> > grep("(?=.*A)(?=.*B)(?=.*C)", test, perl = TRUE)
> [1] 3 4 5 6 7
>
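A minimal sketch of how the lookahead pattern above could be assembled from a vector of patterns instead of being typed by hand. The helper name all_of is hypothetical (it is not from the thread or from base R), and each pattern is wrapped in a non-capturing group on the assumption that it might itself contain "|".

```r
# Hypothetical helper: one zero-width lookahead per pattern, pasted together.
# The (?: ) group keeps an alternation inside a pattern from leaking out.
all_of <- function(patterns) {
  paste0("(?=.*(?:", patterns, "))", collapse = "")
}

test <- c("xyCz", "xAyCz", "xAyBzC", "xCByAz", "xACyB", "BAyyC", "CBxBAy")
mydata <- c("SSFA-ConfoMap_GuineaPigs_NMPfilled.csv",
            "SSFA-ConfoMap_Lithics_NMPfilled.csv",
            "SSFA-ConfoMap_Sheeps_NMPfilled.csv",
            "SSFA-Toothfrax_GuineaPigs.xlsx",
            "SSFA-Toothfrax_Lithics.xlsx",
            "SSFA-Toothfrax_Sheeps.xlsx")

rx <- all_of(c("A", "B", "C"))
abc_hits <- grep(rx, test, perl = TRUE)        # same indices as above: 3 4 5 6 7

# The same helper applied to the file names from the original question.
both <- grep(all_of(c("ConfoMap", "GuineaPigs")), mydata,
             perl = TRUE, value = TRUE)
```

Note that perl = TRUE is required, since lookaheads are not part of the default (TRE) regex syntax used by grep.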
> Note that this matches on one or multiple instances of the patterns.
> If one wants only exactly one instance of each conjunct,  then
> something like this should do:
>
> > lookfor <- c("A","B","C")
> > notme <- paste0("[^",lookfor,"]*")
> > z <- paste0("(?=", notme, lookfor, notme, "$)",collapse = "")
> > grep(z, test, perl = TRUE)
> [1] 3 4 5 6
>
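One caveat with the "exactly one instance" pattern, sketched below under the assumption that each conjunct is a single character (the [^...] character class only makes sense in that case). Because the pattern is not anchored at the start, grep may succeed at a later position in the string, so a string such as "AABC" can still match on its suffix "ABC". Prepending "^" appears to close that gap; the names z_anchored and extra are introduced here for illustration only.

```r
lookfor <- c("A", "B", "C")
notme <- paste0("[^", lookfor, "]*")
z <- paste0("(?=", notme, lookfor, notme, "$)", collapse = "")

# Assumption: anchoring at the start forces every lookahead to
# inspect the whole string rather than a suffix of it.
z_anchored <- paste0("^", z)

extra <- c("AABC", "xAyBzC")
unanchored <- grepl(z, extra, perl = TRUE)           # TRUE TRUE ("AABC" matches via "ABC")
anchored   <- grepl(z_anchored, extra, perl = TRUE)  # FALSE TRUE

test <- c("xyCz", "xAyCz", "xAyBzC", "xCByAz", "xACyB", "BAyyC", "CBxBAy")
anchored_idx <- grep(z_anchored, test, perl = TRUE)  # 3 4 5 6, as in the thread
```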
> Cheers,
> Bert
>
>
>
>
> On Wed, Aug 19, 2020 at 11:38 PM Ivan Calandra <calandra using rgzm.de> wrote:
>
>     Thank you all for all the very helpful answers!
>
>     Best,
>     Ivan
>
>     On 20/08/2020 3:28, Richard O'Keefe wrote:
>     > There are & and | operators in the R language.
>     > There is an | operator in regular expressions.
>     > There is NOT any & operator in regular expressions.
>     > grep("ConfoMap&GuineaPigs", mydata, value=TRUE)
>     > looks for elements of mydata containing the literal
>     > string 'ConfoMap&GuineaPigs'.
>     >
>     > > foo <- c("a","b","cab","back")
>     > > foo[grepl("a",foo) & grepl("b",foo)]
>     > [1] "cab"  "back"
>     >
>     > grepl returns a TRUE/FALSE vector.
>     >
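The grepl approach above extends to any number of patterns by combining the logical vectors with "&". A minimal sketch using the mydata vector from the original question; the variable names patterns and hits are introduced here for illustration, and fixed = TRUE is an assumption that the search terms are literal strings rather than regular expressions.

```r
mydata <- c("SSFA-ConfoMap_GuineaPigs_NMPfilled.csv",
            "SSFA-ConfoMap_Lithics_NMPfilled.csv",
            "SSFA-ConfoMap_Sheeps_NMPfilled.csv",
            "SSFA-Toothfrax_GuineaPigs.xlsx",
            "SSFA-Toothfrax_Lithics.xlsx",
            "SSFA-Toothfrax_Sheeps.xlsx")

patterns <- c("ConfoMap", "GuineaPigs")

# One TRUE/FALSE vector per pattern, then an element-wise AND across them.
hits <- Reduce(`&`, lapply(patterns, grepl, x = mydata, fixed = TRUE))

mydata[hits]  # "SSFA-ConfoMap_GuineaPigs_NMPfilled.csv"
```

Replacing `&` with `|` in the Reduce call gives the "either pattern" behaviour instead.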
>     > On Thu, 20 Aug 2020 at 02:53, Ivan Calandra <calandra using rgzm.de> wrote:
>     >
>     >     Dear useRs,
>     >
>     >     I feel really stupid, but I cannot understand why "&" doesn't
>     >     work as I expect, while "|" does.
>     >
>     >     I have the following vector:
>     >     mydata <- c("SSFA-ConfoMap_GuineaPigs_NMPfilled.csv",
>     >     "SSFA-ConfoMap_Lithics_NMPfilled.csv", 
>     >     "SSFA-ConfoMap_Sheeps_NMPfilled.csv",
>     >     "SSFA-Toothfrax_GuineaPigs.xlsx",
>     >     "SSFA-Toothfrax_Lithics.xlsx", "SSFA-Toothfrax_Sheeps.xlsx")
>     >     and I want to find the values that include both "ConfoMap" and
>     >     "GuineaPigs".
>     >
>     >     If I do:
>     >     grep("ConfoMap&GuineaPigs", mydata, value=TRUE)
>     >     it returns an empty vector, character(0).
>     >
>     >     But if I do:
>     >     grep("ConfoMap|GuineaPigs", mydata, value=TRUE)
>     >     it returns all the elements that include either "ConfoMap" or
>     >     "GuineaPigs", as I would expect.
>     >
>     >     So what is wrong with my "&" construct? How can I return the
>     >     elements that include both parts?
>     >
>     >     Thank you for your help!
>     >     Ivan
>     >
>     >     ______________________________________________
>     >     R-help using r-project.org mailing list --
>     >     To UNSUBSCRIBE and more, see
>     >     https://stat.ethz.ch/mailman/listinfo/r-help
>     >     PLEASE do read the posting guide
>     >     http://www.R-project.org/posting-guide.html
>     >     and provide commented, minimal, self-contained, reproducible code.
>     >
>


