[R] Reading in a transcript-like file

David Winsemius dwinsemius at comcast.net
Wed Jun 30 15:54:04 CEST 2010


On Jun 30, 2010, at 12:21 AM, ARRRRRR wrote:

>
> http://r.789695.n4.nabble.com/file/n2272669/FT20100626_%2420_%2B_%242_Sit_%26_Go_-_%28169112900%29_-_Summary.txt
> FT20100626_%2420_%2B_%242_Sit_%26_Go_-_%28169112900%29_-_Summary.txt
>
> I have a lot of experience with Stata, but I'm new to R. I'm trying to read
> the attached file into R on my Mac. My goal is to have it as a list, with
> each element a string; from there I can parse out the data I need and add it
> as an observation in a data frame.
>
> I've tried scan(), readLines(), etc., but I'm stumped. I've been adding
> encoding="UTF-16", but that doesn't seem to help much.
> The closest I've come is:
>
> test <- scan(file = "FT20100626 $20 + $2 Sit & Go - (169112900) - Summary.txt",
>              what = list(""), flush = FALSE, skip = 0, encoding = "UTF-16", quote = "\n")
>
> which gives me a list in which each element is the first letter of a row.
>
>> test
> [[1]]
>  [1] "\xff\xfeF" "T"         "P"         "T"         "S"         "$"         "+"         "$"         "S"

I believe you are being bitten by an encoding issue, one that is
addressed by this section of the ?connections help page:

"The encoding "UCS-2LE" is treated specially, as it is the appropriate  
value for Windows ‘Unicode’ text files. If the first two bytes are the  
Byte Order Mark 0xFFFE then these are removed as most implementations  
of iconv do not accept BOMs. Note that some implementations will  
handle BOMs using encoding "UCS-2" but many will not."

Notice that your first element begins with \xff\xfe, which I believe is
a representation of the 0xFFFE mark described there (the bytes FF FE
are the byte order mark of little-endian UTF-16). When I view that page
in Firefox and request encoding information, it reports UTF-16. I am
not sufficiently versed in encoding issues, even though we share a
platform. I tried several encoding specifications, including "UTF-16",
"UCS-2" and "UCS-2LE", with scan and readLines, but did not manage to
work through to a solution. Another possibility might be to subscribe
to the R SIG-Mac mailing list and post the question there.
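For what it is worth, ?scan and ?readLines state that their encoding
argument only marks the strings that are read as Latin-1 or UTF-8; it
does not re-encode the input. Re-encoding has to be requested on the
connection itself (or, for scan, via its fileEncoding argument). Below
is a minimal sketch along those lines, assuming the file really is
UTF-16 little-endian with a leading BOM, as the \xff\xfe suggests. The
helper read_ucs2_lines() and the small temporary test file are
hypothetical, created only so the example runs on its own; it has not
been tried on the original attachment.

```r
# Sketch: read a Windows "Unicode" (UTF-16LE with BOM) text file in R.
# Assumption: the real file starts with the bytes FF FE, as the
# "\xff\xfe" in the scan() output suggests.

# Build a small, hypothetical test file: BOM followed by UTF-16LE text.
txt  <- "FT Sit & Go\r\nSummary line 2\r\n"
body <- iconv(txt, from = "UTF-8", to = "UTF-16LE", toRaw = TRUE)[[1]]
tmp  <- tempfile(fileext = ".txt")
writeBin(c(as.raw(c(0xff, 0xfe)), body), tmp)

# Hypothetical helper: let the connection do the re-encoding, since the
# encoding argument of readLines() would only mark the strings.
read_ucs2_lines <- function(path) {
  con <- file(path, encoding = "UCS-2LE")
  on.exit(close(con))
  x <- readLines(con)  # accepts LF, CRLF or CR line endings
  # Defensive: drop a leading BOM character if one survived conversion.
  if (length(x) > 0 && startsWith(x[1], "\ufeff")) {
    x[1] <- substring(x[1], 2)
  }
  x
}

lines <- read_ucs2_lines(tmp)
print(lines)
```

If the encoding assumption holds, calling read_ucs2_lines() on the
actual summary file should return a character vector with one element
per line, which can then be parsed and assembled into a data frame.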

-- 
David.

> [10] "&"         "G"         "("         "H"         "N"         "L"         "B"         "u"         "$"
> [19] "+"         "$"         "B"         "u"         "C"         "1"         "6"         "E"         "T"
> [28] "o"         "P"         "P"         "$"         "T"         "o"         "s"         "2"         "0"
> [37] "E"         "T"         "o"         "f"         "2"         "1"         "E"         "\n"        "1"
> [46] "B"         "$"         "2"         ":"         "J"         "$"         "3"         ":"         "b"
> [55] "4"         ":"         "s"         "c"         "2"         "5"         ":"         "R"         "6"
> [64] ":"         "S"         "B"         "o"         "f"         "i"         "1"         "p"
>
> Any help would be greatly appreciated.
>
> -- 
> Sent from the R help mailing list archive at Nabble.com.
>
> ______________________________________________
> R-help at r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.

David Winsemius, MD
West Hartford, CT


