[R] puzzle using gsub (and encodings maybe)

Duncan Murdoch murdoch at stats.uwo.ca
Wed Oct 14 20:29:13 CEST 2009


On 10/14/2009 2:16 PM, Adrian Dragulescu wrote:
> I get the same results (not working) using R 2.9.2 and R 2.10.0 beta.

But it is working: the dash in x is the byte 0xad, not 0x2d (the ASCII
hyphen-minus). You need to substitute for the 0xad character instead, e.g. by

# a space followed by byte 0xad (a soft hyphen in Latin-1/Windows-1252)
spacelongdash <- rawToChar(as.raw(c(0x20, 0xad)))
gsub(spacelongdash, "-", x)

Duncan Murdoch
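A minimal, self-contained sketch of the fix above, rebuilding x from the bytes Adrian posted with charToRaw(x) so it can be run without the original PDF. The `useBytes = TRUE` and `fixed = TRUE` arguments are additions not in the original reply; the assumption is that you may be running in a UTF-8 locale, where a lone 0xad byte is not valid UTF-8 and character-wise matching may raise an error about invalid input. Byte-wise, literal matching sidesteps that.

```r
# Rebuild x from the bytes posted with charToRaw(x):
# "NEW YORK " + byte 0xad + "NEW ENGLAND"
x <- rawToChar(c(charToRaw("NEW YORK "), as.raw(0xad), charToRaw("NEW ENGLAND")))

# " -" contains byte 0x2d, which does not occur in x, so nothing is replaced
unchanged <- gsub(" -", "-", x, useBytes = TRUE)

# " " followed by byte 0xad matches what is actually in x
spacelongdash <- rawToChar(as.raw(c(0x20, 0xad)))
fixed_x <- gsub(spacelongdash, "-", x, fixed = TRUE, useBytes = TRUE)

print(fixed_x)
# [1] "NEW YORK-NEW ENGLAND"
```

Because the result is pure ASCII, it carries no declared encoding and compares equal to a typed-in string such as "NEW YORK-NEW ENGLAND".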

> 
> Thank you for looking at this.
> 
> On Wed, 14 Oct 2009, Duncan Murdoch wrote:
> 
>> On 10/14/2009 1:41 PM, Adrian Dragulescu wrote:
>>> 
>>>> charToRaw(x)
>>>   [1] 4e 45 57 20 59 4f 52 4b 20 ad 4e 45 57 20 45 4e 47 4c 41 4e 44
>>>> charToRaw(y)
>>>   [1] 4e 45 57 20 59 4f 52 4b 20 2d 4e 45 57 20 45 4e 47 4c 41 4e 44
>>>> 
>>> 
>>> So they are different.
>>> 
>>> Adrian
>>> 
>>> I use R 2.8.1 on WinXP
>>
>> But that's ancient.  Please try again with the beta of 2.10.0, and let us 
>> know if you still see a problem.
>>
>> Duncan Murdoch
>>
>>> 
>>> 
>>> On Wed, 14 Oct 2009, Duncan Murdoch wrote:
>>> 
>>>> On 10/14/2009 1:30 PM, Adrian Dragulescu wrote:
>>>>> Hello,
>>>>> 
>>>>> Below is some output that shows my issue.
>>>>> 
>>>>> I have a variable x that I read from a file (more on this below)
>>>>> 
>>>>>> x
>>>>> [1] "NEW YORK NEW ENGLAND"
>>>>>> gsub(" -", "-", x)            # this does not work!
>>>>> [1] "NEW YORK NEW ENGLAND"
>>>> 
>>>> It looks as though it worked, presumably because something got lost in 
>>>> your email.
>>>> 
>>>> Could you post charToRaw(x) so we can see what's in x?
>>>> 
>>>> Duncan Murdoch
>>>> 
>>>>>> Encoding(x)                   # is x in a special encoding? no
>>>>> [1] "unknown"
>>>>>> y = "NEW YORK -NEW ENGLAND"   # I type in variable y
>>>>>> gsub(" -", "-", y)            # and gsub works as expected
>>>>> [1] "NEW YORK-NEW ENGLAND"
>>>>>> 
>>>>> 
>>>>> I'm sure the problem has to do with the way I read the variable x.  But 
>>>>> even if I change the encoding for x to ASCII, I still cannot do the
>>>>> substitution. I get x by reading a PDF file with pdftotext so you will
>>>>> not be able to replicate my issue.
>>>>> 
>>>>> Thanks for any suggestions,
>>>>> Adrian
>>>>> 
>>>>> ______________________________________________
>>>>> R-help at r-project.org mailing list
>>>>> https://stat.ethz.ch/mailman/listinfo/r-help
>>>>> PLEASE do read the posting guide 
>>>>> http://www.R-project.org/posting-guide.html
>>>>> and provide commented, minimal, self-contained, reproducible code.
>>>> 
>>>> 
>>
>>
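The thread was resolved by inspecting charToRaw(x). As a sketch of that diagnostic step, a small helper can list the bytes outside ASCII and the positions where two strings that print identically actually differ. `non_ascii_bytes()` is a hypothetical name for illustration, not a base R function; x and y are rebuilt from the bytes shown in the thread.

```r
# Hypothetical helper (not part of base R): report the position and hex
# value of every byte above 0x7f in a single string.
non_ascii_bytes <- function(s) {
  b <- charToRaw(s)
  pos <- which(as.integer(b) > 0x7f)
  data.frame(position = pos, byte = as.character(b[pos]),
             stringsAsFactors = FALSE)
}

x <- rawToChar(c(charToRaw("NEW YORK "), as.raw(0xad), charToRaw("NEW ENGLAND")))
y <- "NEW YORK -NEW ENGLAND"

non_ascii_bytes(x)   # one row: position 10, byte ad
non_ascii_bytes(y)   # zero rows: y is pure ASCII

# Positions where x and y differ byte by byte (both are 21 bytes long)
which(charToRaw(x) != charToRaw(y))
# [1] 10
```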



