[R] End of line marker?

David Winsemius dwinsemius at comcast.net
Fri Mar 5 05:40:08 CET 2010


On Mar 4, 2010, at 10:58 PM, Duncan Murdoch wrote:

> On 04/03/2010 10:32 PM, David Winsemius wrote:
>> On Mar 4, 2010, at 9:47 PM, jonas garcia wrote:
>>> When I opened the file with a hex editor, the problematic character
>>> turned out to be “1a”.
>>> I am attaching a sample DAT file with 3 lines (the second line is
>>> the one with the undesirable character).
>>>
>>> The furthest I could get was through readBin:
>>>
>>>> tmp<- readBin("new.dat", what = "raw", n=100000000)
>>>   [1] 30 32 3a 33 35 3a 33 32 2c 20 34 34 30 33 2c 20 33 37 2e 31 31 34 2c 2d 32 30 2e 38 33 36 2c 31
>>>  [33] 35 35 2e 39 2c 30 30 2e 37 36 2c 31 31 35 36 0d 0a 30 32 3a 33 35 3a 33 35 2c 20 34 34 33 32 2c
>>>  [65] 20 33 37 2e 31 31 34 2c 2d 32 30 2e 38 33 36 2c 31 35 35 2e 38 2c 1a 30 2e 38 31 2c 31 31 35 37
>>>  [97] 0d 0a 30 32 3a 33 35 3a 33 39 2c 20 34 34 36 37 2c 20 33 37 2e 31 31 34 2c 2d 32 30 2e 38 33 36
>>> [129] 2c 31 35 35 2e 38 2c 30 30 2e 38 31 2c 31 31 35 38
>>>
>>>
>>>> tmp[87]
>>> [1] 1a
>> I got a different "interpretation" of that character when I let R
>> look at it, and I cannot figure out why \032 should be causing
>> problems:
>
> Hex 1a and octal 032 both correspond to Ctrl-Z, which is the MSDOS  
> EOF marker.  I forget whether R's text reading routines pay  
> attention to that, or whether it's the C runtime, but it makes sense  
> that it would cause problems on Windows.
>
> Duncan Murdoch
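One way to sidestep any text-mode handling of Ctrl-Z is to finish what the readBin() attempt above started: read the file as raw bytes, swap the 0x1A byte for something harmless, and only then parse the text. The sketch below is self-contained, so instead of new.dat it writes a sample file reconstructed from the hex dump above to a temporary path. Replacing the byte with a space is an assumption (it mirrors the gsub() approach used later in the thread); read.table(text = ...) needs R 2.14.0 or later.

```r
# Sample file mirroring the hex dump: CRLF line endings, a Ctrl-Z byte
# in the second line, and no trailing newline.
sample_lines <- c("02:35:32, 4403, 37.114,-20.836,155.9,00.76,1156",
                  "02:35:35, 4432, 37.114,-20.836,155.8,\0320.81,1157",
                  "02:35:39, 4467, 37.114,-20.836,155.8,00.81,1158")
path <- tempfile(fileext = ".dat")
writeBin(charToRaw(paste(sample_lines, collapse = "\r\n")), path)

# readBin() opens the file in binary mode, so 0x1A comes back as an
# ordinary byte rather than being treated as end-of-file.
bytes <- readBin(path, what = "raw", n = file.info(path)$size)

# Locate the offending byte(s) and replace them with a space (0x20).
ctrl_z <- which(bytes == as.raw(0x1a))
bytes[ctrl_z] <- as.raw(0x20)

# Back to text, split on CRLF (or LF), then parse the comma-separated fields.
lines <- strsplit(rawToChar(bytes), "\r?\n")[[1]]
dat <- read.table(text = lines, sep = ",", strip.white = TRUE)
print(ctrl_z)
print(dat)
```

With this sample the byte is found at position 87, the same offset that tmp[87] reported above.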

Thanks. I was interpreting \032 as decimal, so couldn't figure out why  
it should equal 0x1A. You've explained the basis (or base) of my  
confusion.
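For anyone else tripped up by the same thing: octal 032, hex 1A and decimal 26 are all the same byte. A quick check in base R:

```r
# The same byte written three ways: octal 032, hex 1a, decimal 26.
oct  <- strtoi("032", base = 8L)          # 3 * 8 + 2  = 26
hex  <- strtoi("1a", base = 16L)          # 1 * 16 + 10 = 26
code <- as.integer(charToRaw("\032"))     # code of the Ctrl-Z character
print(c(oct, hex, code))                  # 26 26 26

# And the other direction: show decimal 26 in octal and in hex.
print(format(as.octmode(26L)))            # "32"
print(format(as.hexmode(26L)))            # "1a"
```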

-- 
David
>
>> > tmporg <- readLines(con="/Users/davidwinsemius/Library/Mail Downloads/new.dat")
>> Warning message:
>> In readLines(con = "/Users/davidwinsemius/Library/Mail Downloads/new.dat") :
>>   incomplete final line found on '/Users/davidwinsemius/Library/Mail Downloads/new.dat'
>> > tmporg
>> [1] "02:35:32, 4403, 37.114,-20.836,155.9,00.76,1156"
>> [2] "02:35:35, 4432, 37.114,-20.836,155.8,\0320.81,1157"
>> [3] "02:35:39, 4467, 37.114,-20.836,155.8,00.81,1158"
>> > gsub("\\\032", ' ', tmporg)
>> [1] "02:35:32, 4403, 37.114,-20.836,155.9,00.76,1156" "02:35:35, 4432, 37.114,-20.836,155.8, 0.81,1157"
>> [3] "02:35:39, 4467, 37.114,-20.836,155.8,00.81,1158"
>> > read.table(textConnection(gsub("\\\032", ' ', tmporg) ) ,sep=",")
>>         V1   V2     V3      V4    V5   V6   V7
>> 1 02:35:32 4403 37.114 -20.836 155.9 0.76 1156
>> 2 02:35:35 4432 37.114 -20.836 155.8 0.81 1157
>> 3 02:35:39 4467 37.114 -20.836 155.8 0.81 1158
>> Looks like gsub might work well, as long as you can get agreement
>> on what the character really is.
>> > sessionInfo()
>> R version 2.10.1 RC (2009-12-09 r50695)
>> x86_64-apple-darwin9.8.0
>> locale:
>> [1] en_US.UTF-8/en_US.UTF-8/C/C/en_US.UTF-8/en_US.UTF-8
>> attached base packages:
>> [1] splines   stats     graphics  grDevices utils     datasets  methods   base
>> other attached packages:
>> [1] Design_2.3-0    Hmisc_3.7-0     survival_2.35-7
>> loaded via a namespace (and not attached):
>> [1] cluster_1.12.1  grid_2.10.1     lattice_0.17-26 tools_2.10.1
>
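On getting agreement about what the character is: in R the strings "\032", "\x1a" and rawToChar(as.raw(0x1a)) are the same one-character string, so any of them can be used as the pattern. The sketch below is an alternative to the "\\\032" regex above, not what was run in the thread; it uses gsub(..., fixed = TRUE) so the character is matched literally and no regex escaping is involved. The input line is copied from the readLines() output above.

```r
# One line as returned by readLines() above, with the embedded Ctrl-Z.
x <- "02:35:35, 4432, 37.114,-20.836,155.8,\0320.81,1157"

# Three spellings of the same character (octal escape, hex escape, raw byte).
ctrl_z <- "\032"
stopifnot(identical(ctrl_z, "\x1a"),
          identical(ctrl_z, rawToChar(as.raw(0x1a))))

# fixed = TRUE treats the pattern as a literal string, not a regex.
cleaned <- gsub(ctrl_z, " ", x, fixed = TRUE)
print(cleaned)
```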

David Winsemius, MD
Heritage Laboratories
West Hartford, CT


