[R] Regexp pattern but fixed replacement?

Duncan Murdoch
Thu Apr 11 19:08:41 CEST 2024


On 11/04/2024 12:57 p.m., Dave Dixon wrote:
> Backslashes in regular expressions in R are maddening, but they make sense.
> 
> R string handling interprets your replacement string "\\" as just one
> backslash. Your string is received by gsub as "\" - that is, just the
> control backslash, NOT the character backslash. gsub is expecting to see
> \0, \1, \2, or some other control starting with backslash.
> 
> If you want gsub to replace with a backslash character, you have to send
> it as "\\". In order to get two backslash characters in an R string, you
> have to double them ALL: "\\\\".

You can use "\\" if the pattern is declared as "fixed", via

   sub("a", "\\", "abcdef", fixed = TRUE)

or

   stringr::str_replace("abcdef", stringr::fixed("a"), "\\")
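
As a quick check of the two levels of escaping, here is a small sketch in base R. The variable names are mine and are only for illustration.

```r
# "\\" in R source code is a string holding ONE backslash character.
one <- "\\"
nchar(one)                      # 1

# With fixed = TRUE the replacement is taken literally,
# so a single backslash replaces the "a".
out_fixed <- sub("a", one, "abcdef", fixed = TRUE)
nchar(out_fixed)                # 6: one backslash plus "bcdef"

# With a regexp pattern the lone backslash is read as an escape
# character and dropped, so "a" and "b" are simply deleted.
out_regex <- gsub("a|b", one, "abcdef")

# Doubling it makes gsub() insert one literal backslash per match.
out_doubled <- gsub("a|b", "\\\\", "abcdef")

# writeLines() shows the actual characters rather than the escaped
# form that print() displays.
writeLines(out_doubled)
```

Using writeLines() rather than print() helps here, because print() re-escapes each backslash and makes the result look longer than it is.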

My first question was whether there is a sub-like function with a way to 
declare the pattern as a regexp, but the replacement as fixed.  Thanks 
for your answer to my second question.
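
In the meantime, one possible workaround is a thin wrapper that escapes the replacement before handing it to gsub(). This is only a sketch: gsub_literal is a name I made up (it is not a function in base R or stringr), and it assumes that backslash is the only character treated specially in the replacement.

```r
# Hypothetical helper: the pattern is a regexp, the replacement is literal.
gsub_literal <- function(pattern, replacement, x, ...) {
  # Double every backslash so that gsub() emits it as a literal character.
  # fixed = TRUE keeps this inner call free of any regexp processing.
  escaped <- gsub("\\", "\\\\", replacement, fixed = TRUE)
  gsub(pattern, escaped, x, ...)
}

# Each of "a" and "b" is replaced by a single backslash.
res <- gsub_literal("a|b", "\\", "abcdef")
writeLines(res)

# A replacement that looks like a backreference is inserted as-is.
res_ref <- gsub_literal("(a)", "\\1", "abc")
writeLines(res_ref)
```

Extra arguments such as perl = TRUE are passed through to the outer gsub() call. Passing fixed = TRUE through would not make sense here, since the escaped replacement would then be inserted with its backslashes doubled.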

Duncan Murdoch

> 
> The string that is output is an R string: the backslashes are escaped
> with a backslash, so "\\\\" really means two backslashes.
> 
> There are lots of special characters in the search string, but only one
> in the replacement string: backslash.
> 
> My favorite resource on this topic is
> https://www.regular-expressions.info/replacecharacters.html
> 
> 
> On 4/11/24 10:35, Duncan Murdoch wrote:
>> I noticed this issue in stringr::str_replace, but it also affects
>> sub() in base R.
>>
>> If the pattern in a call to one of these needs to be a regular
>> expression, then backslashes in the replacement text are treated
>> specially.
>>
>> For example,
>>
>>    gsub("a|b", "\\", "abcdef")
>>
>> gives "cdef", not "\\\\cdef" as I wanted.  To get the latter, I need to
>> escape the replacement backslashes, e.g.
>>
>>    gsub("a|b", "\\\\", "abcdef")
>>
>> which gives "\\\\cdef".
>>
>> I have two questions:
>>
>> 1.  Is there a variant on sub or str_replace which allows the pattern
>> to be declared as a regular expression, but the replacement to be
>> declared as fixed?
>>
>> 2.  To get what I want, I can double the backslashes in the
>> replacement text.  This would do that:
>>
>>     replacement <- gsub("\\\\", "\\\\\\\\", replacement)
>>
>> Are there any other special characters to worry about besides
>> backslashes?
>>
>> Duncan Murdoch
>>
>> ______________________________________________
>> R-help using r-project.org mailing list -- To UNSUBSCRIBE and more, see
>> https://stat.ethz.ch/mailman/listinfo/r-help
>> PLEASE do read the posting guide
>> http://www.R-project.org/posting-guide.html
>> and provide commented, minimal, self-contained, reproducible code.
> 