[R] Split String in regex while Keeping Delimiter

@vi@e@gross m@iii@g oii gm@ii@com
Thu Apr 13 01:12:28 CEST 2023


Sometimes it is better NOT to use a regular expression and to do things more simply. Your example is fairly simple: it does not need great power, and it may be a pain to solve with a very powerful technique, especially if you want to play with look-ahead and look-behind.

Assuming you have a line with repeated runs of non-plus text, each followed by a run of one to three contiguous plus signs, you can write a short function that scans along until it finds a plus sign, then looks ahead until it sees a space or the end of the line. It then wraps up all the text up to and including the last plus and adds a copy to a growing list or other structure. It then continues past the space(s), which it ignores, and repeats.

When done, you have what you want in the format you want.
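
The scan described above could be sketched in R roughly as follows. This is only an illustration, not a polished solution: the function name split_keep_delim and its delims argument are made up for this example, and I am assuming the minus sign should be treated like the plus, as in the sample data.

```r
# Hypothetical helper (name and arguments invented for illustration):
# split a line into pieces that each end in a run of "+" or "-".
split_keep_delim <- function(line, delims = c("+", "-")) {
  chars <- strsplit(line, "")[[1]]   # one element per character
  n <- length(chars)
  pieces <- character(0)
  start <- 1L                        # where the current piece begins
  i <- 1L
  while (i <= n) {
    if (chars[i] %in% delims) {
      # Look ahead to the end of this run of delimiter characters.
      j <- i
      while (j < n && chars[j + 1L] %in% delims) j <- j + 1L
      # The run only ends a piece if a space or the end of the line follows.
      if (j == n || chars[j + 1L] == " ") {
        pieces <- c(pieces, paste(chars[start:j], collapse = ""))
        # Skip the space(s) before starting the next piece.
        k <- j + 1L
        while (k <= n && chars[k] == " ") k <- k + 1L
        start <- k
      }
      i <- j + 1L
    } else {
      i <- i + 1L
    }
  }
  # Keep any trailing text that has no delimiter after it.
  if (start <= n) pieces <- c(pieces, paste(chars[start:n], collapse = ""))
  pieces
}

split_keep_delim("leucocyten + gramnegatieve staven +++ grampositieve staven ++")
# [1] "leucocyten +"  "gramnegatieve staven +++"  "grampositieve staven ++"
```

A hyphen inside a word is left alone here, because the run is only accepted when a space or the end of the line follows it.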

A variant on this is to start from the end, scan backwards, and stop at any plus sign. Keep what follows it, but strip any whitespace on its left. The result is the list in reverse order unless you use a stack to hold it.
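
A rough sketch of that backwards scan, under the assumption that only the plus sign acts as a delimiter and that it never occurs inside a word. The name split_from_end is invented for this example. Instead of a stack, each piece is simply prepended, which restores the left-to-right order.

```r
# Hypothetical helper (name invented for illustration): scan from the end.
split_from_end <- function(line, delim = "+") {
  chars <- strsplit(trimws(line), "")[[1]]
  pieces <- character(0)
  end <- length(chars)               # last character of the current piece
  while (end >= 1L) {
    i <- end
    # Step back over the run of delimiters that closes this piece.
    while (i >= 1L && chars[i] == delim) i <- i - 1L
    # Step back over the text until the previous piece's delimiter.
    while (i >= 1L && chars[i] != delim) i <- i - 1L
    # Keep what follows that delimiter, minus the whitespace on its left.
    piece <- trimws(paste(chars[(i + 1L):end], collapse = ""), which = "left")
    pieces <- c(piece, pieces)       # prepend to undo the reverse order
    end <- i
  }
  pieces
}

split_from_end("leucocyten + gramnegatieve staven +++ grampositieve staven ++")
```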

There are quite a few variants that might apply, perhaps using functions from add-on packages. A dumb example might be to preprocess the string, replacing every instance of one to three plus signs and an optional space with the plus signs themselves followed by a marker character such as ":". A second pass that splits on the marker then becomes trivial, and the colons disappear.
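
As a minimal sketch of that two-pass idea, something like the following might do. It assumes that ":" never appears in the data, that "-" should be handled like "+", and that neither sign occurs inside a word; the variable names are only for illustration.

```r
dat <- c("leucocyten + gramnegatieve staven +++ grampositieve staven ++",
         "leucocyten - grampositieve coccen +")

# Pass 1: after each run of 1 to 3 "+" or "-", drop any following spaces
# and add a ":" marker instead.
marked <- gsub("([-+]{1,3}) *", "\\1:", dat)

# Pass 2: split on the marker; the colons disappear.
pieces <- strsplit(marked, ":", fixed = TRUE)

pieces[[1]]
# [1] "leucocyten +"  "gramnegatieve staven +++"  "grampositieve staven ++"
pieces[[2]]
# [1] "leucocyten -"  "grampositieve coccen +"
```

The marker left at the very end of each string does no harm, since strsplit() does not return an empty final element for a match at the end of the input.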

-----Original Message-----
From: R-help <r-help-bounces using r-project.org> On Behalf Of David Winsemius
Sent: Wednesday, April 12, 2023 6:03 PM
To: Emily Bakker <emilybakker using outlook.com>
Cc: r-help using r-project.org
Subject: Re: [R] Split String in regex while Keeping Delimiter

I thought replacing the spaces following instances of +++,++,+,- with "\n" and then reading with scan should succeed. Like Ivan Krylov, I was fairly sure that you meant the minus sign to be "-" rather than "–", but perhaps you were using MS Word as an editor, which is inconsistent with effective use of R. If so, learn to use a proper programming editor, and in any case learn to post to R-help in plain text.

-- 
David

# dat is assumed to be a character vector holding the input strings
scan(text=gsub("([-+]){1}\\s", "\\1\n", dat), what="", sep="\n")



> On Apr 12, 2023, at 2:29 AM, Emily Bakker <emilybakker using outlook.com> wrote:
> 
> Hello List,
>  
> I have a dataset consisting of strings that I want to split while saving the delimiter.
>  
> Some example data:
> “leucocyten + gramnegatieve staven +++ grampositieve staven ++”
> “leucocyten – grampositieve coccen +”
>  
> I want to split the strings such that I get the following result:
> c(“leucocyten +”,  “gramnegatieve staven +++”,  “grampositieve staven ++”)
> c(“leucocyten –“, “grampositieve coccen +”)
>  
> I have tried strsplit with a regular expression with a positive lookahead, but I am not able to achieve the results that I want.
>  
> I have tried:
> as.list(strsplit(x, split = “(?=[\\+-]{1,3}\\s)+, perl=TRUE)
>  
> Which results in:
> c(“leucocyten “, “+”,  “gramnegatieve staven “, “+”, “+”, “+”,  “grampositieve staven ++”)
> c(“leucocyten “, “–“, “grampositieve coccen +”)
>  
>  
> Is there a function or regular expression that will make this possible?
>  
> Kind regards,
> Emily 
>  
> ______________________________________________
> R-help using r-project.org mailing list -- To UNSUBSCRIBE and more, see
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
