[R] End of line marker?

jim holtman jholtman at gmail.com
Fri Mar 5 04:24:47 CET 2010


This should work for you:


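# Open both files in binary mode so the 0x1a byte is read as plain data
input <- file('/recv/new.dat', 'rb')
output <- file('/recv/newV2.dat', 'wb')
repeat {
    # read the next chunk of up to 10000 bytes
    x <- readBin(input, what='raw', n=10000)
    if (length(x) == 0) break  # nothing left to read
    # replace every 0x1a byte with a blank
    x[x == as.raw(0x1a)] <- charToRaw(' ')
    writeBin(x, output)
}
close(input)
close(output)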
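
On your gsub question: gsub works on character strings, so you would have to convert with rawToChar first; I think it is simpler to stay in raw and use plain indexing. As far as I know, 0x1a is Ctrl-Z, which old DOS programs used as an end-of-file marker, and that is presumably why a text-mode read stops there on Windows. If a file is small enough to read in one go, something like the sketch below might do. The sample bytes are copied from the second line of your dump, and going through textConnection is just one way (my choice, not the only one) to avoid writing a temporary file.

```r
# Sample record taken from the posted hex dump, with the stray 0x1a byte
tmp <- c(charToRaw("02:35:35, 4432, 37.114,-20.836,155.8,"),
         as.raw(0x1a),
         charToRaw("0.81,1157"))

# Replace every 0x1a byte with 0x20 (a space) by logical indexing
tmp[tmp == as.raw(0x1a)] <- as.raw(0x20)

# Convert back to text and parse it without writing a file
con <- textConnection(rawToChar(tmp))
dat <- read.csv(con, header = FALSE)
close(con)
```

For a real file you would fill tmp with readBin as you already did, then do the same replacement.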


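Since you mentioned thousands of files, the same loop could be wrapped in a function and called once per file. A rough sketch follows; the function name replace_ctrl_z, the chunk argument and the choice to write a cleaned copy instead of overwriting the original are my own assumptions for illustration.

```r
# Hypothetical helper: copy 'infile' to 'outfile', replacing each 0x1a
# byte with a space. Works in chunks so large files need not fit in memory.
replace_ctrl_z <- function(infile, outfile, chunk = 10000L) {
    input <- file(infile, "rb")
    on.exit(close(input))
    output <- file(outfile, "wb")
    on.exit(close(output), add = TRUE)
    repeat {
        x <- readBin(input, what = "raw", n = chunk)
        if (length(x) == 0) break            # end of file reached
        x[x == as.raw(0x1a)] <- as.raw(0x20) # 0x20 is a space
        writeBin(x, output)
    }
    invisible(outfile)
}
```

You could then get the file names with list.files(..., full.names = TRUE) and call replace_ctrl_z on each one, writing the cleaned copies to a separate directory so the originals stay untouched.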
On Thu, Mar 4, 2010 at 9:47 PM, jonas garcia
<garcia.jonas80 at googlemail.com> wrote:
> When I opened the file with a hex-editor, the problematic character turned
> out to be “1a”
>
>  I am attaching a sample DAT file with 3 lines (the second line is the one
> with the undesirable character).
>
>
>
> The furthest I could get was through readBin:
>
>
>
>> tmp<- readBin("new.dat", what = "raw", n=100000000)
>
>   [1] 30 32 3a 33 35 3a 33 32 2c 20 34 34 30 33 2c 20 33 37 2e 31 31 34 2c 2d 32 30 2e 38 33 36 2c 31
>  [33] 35 35 2e 39 2c 30 30 2e 37 36 2c 31 31 35 36 0d 0a 30 32 3a 33 35 3a 33 35 2c 20 34 34 33 32 2c
>  [65] 20 33 37 2e 31 31 34 2c 2d 32 30 2e 38 33 36 2c 31 35 35 2e 38 2c 1a 30 2e 38 31 2c 31 31 35 37
>  [97] 0d 0a 30 32 3a 33 35 3a 33 39 2c 20 34 34 36 37 2c 20 33 37 2e 31 31 34 2c 2d 32 30 2e 38 33 36
> [129] 2c 31 35 35 2e 38 2c 30 30 2e 38 31 2c 31 31 35 38
>
>
>
>
>
>> tmp[87]
>
> [1] 1a
>
>
>
> The idea now is as Jim suggested, replace “1a” by (for example) “20” in the
> raw format and write the file back with
>
> writeBin(tmp, "new2.dat")
>
>
>
> Can I use gsub? How can I perform this operation without messing around with
> the raw format?
>
>
>
> Thanks
>
> J
>
>
>
>
> On Thu, Mar 4, 2010 at 8:35 PM, jim holtman <jholtman at gmail.com> wrote:
>>
>> Have you considered reading the file in as binary/raw, finding the
>> offending character, replacing it with a blank (or whatever), and
>> then writing the file back out?  You can then probably process it
>> using read.table.
>>
>> On Thu, Mar 4, 2010 at 12:50 PM, jonas garcia
>> <garcia.jonas80 at googlemail.com> wrote:
>> > Thank you so much for your reply.
>> >
>> >
>> >
>> > I can identify the characters very easily in a couple of files. The
>> > reason I am worried is that I have thousands of files to read in. The
>> > files were produced by very old MS-DOS software that records
>> > information on oceanographic data and geographic position during a
>> > survey.
>> >
>> > My main goal is to read all these files into R for further analysis.
>> > Most of the files are clear of these EOL markers but some are not. I
>> > only noticed the problem by chance when I was looking at and comparing
>> > one of them. I wonder if I can solve this problem using R, without
>> > having to resort to a separate text editor.
>> >
>> >
>> >
>> > Help on this would be much appreciated.
>> >
>> > Thanks again
>> >
>> >
>> >
>> > J
>> >
>> >
>> > On 3/4/10, David Winsemius <dwinsemius at comcast.net> wrote:
>> >>
>> >>
>> >> On Mar 3, 2010, at 2:22 PM, jonas garcia wrote:
>> >>
>> >>> Dear R users,
>> >>>
>> >>> I am trying to read a huge file in R. For some reason, only a part of
>> >>> the file is read. When I investigated further, I found that in one of
>> >>> my non-numeric columns there is one odd character responsible for
>> >>> this, which I reproduce below:
>> >>> In case you cannot see it, it looks like a right arrow, but it is not
>> >>> the one you get from Microsoft Word in the menu "insert symbol".
>> >>>
>> >>> I think my dat file is broken and that funny character is an EOL
>> >>> marker that makes R not read the rest of the file. I am sure the
>> >>> character is there by chance but I fear that it might be present in
>> >>> some other big files I have to work with as well. So, is there any
>> >>> clever way to remove this inconvenient character in R, avoiding having
>> >>> to edit the file in Notepad and remove it manually?
>> >>>
>> >>> Code I am using:
>> >>>
>> >>> read.csv("new3.dat", header=FALSE)
>> >>>
>> >>> Warning message:
>> >>> In read.table(file = file, header = header, sep = sep, quote = quote,  :
>> >>>   incomplete final line found by readTableHeader on 'new3.dat'
>> >>>
>> >>
>> >> I think you should identify the offending line by using the
>> >> count.fields function and fix it with an editor.
>> >>
>> >>
>> >> --
>> >> David
>> >>
>> >>>
>> >>> I am working with R 2.10.1 in windows XP.
>> >>>
>> >>> Thanks in advance
>> >>>
>> >>> Jonas
>> >>>
>> >>>        [[alternative HTML version deleted]]
>> >>>
>> >>> ______________________________________________
>> >>> R-help at r-project.org mailing list
>> >>> https://stat.ethz.ch/mailman/listinfo/r-help
>> >>> PLEASE do read the posting guide
>> >>> http://www.R-project.org/posting-guide.html
>> >>> and provide commented, minimal, self-contained, reproducible code.
>> >>>
>> >>
>> >> David Winsemius, MD
>> >> Heritage Laboratories
>> >> West Hartford, CT
>> >>
>> >>
>> >
>>
>>
>>
>> --
>> Jim Holtman
>> Cincinnati, OH
>> +1 513 646 9390
>>
>> What is the problem that you are trying to solve?
>
>





