[R] problems reading XML type file from ishares website

Jeff Newmiller jdnewmil at dcn.davis.ca.us
Thu Jul 28 20:20:23 CEST 2016


Please keep the list included in the thread (e.g. reply-all?).

I looked at the file and agree that it looks like XML with a UTF-8 byte order mark and Unix line endings, which means it is neither XLS nor XLSX (which is a zipped directory of XML files with DOS line endings). Excel complains but manages to open the file if it has the XLS extension, but I am not aware of any of the usual R Excel packages that will understand this file.
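
One way to check what a file really is, regardless of its extension, is to look at its first few bytes. The sketch below is only an illustration: `sniff_format` is a hypothetical helper name, and it checks just three well-known signatures (the OLE2 header used by true XLS files, the ZIP header used by XLSX, and the UTF-8 byte order mark). The demo file is made up for the example and is not the iShares download.

```r
# Hypothetical helper: guess a file's real format from its leading bytes.
sniff_format <- function(path) {
  b <- readBin(path, what = "raw", n = 8)
  starts_with <- function(sig) {
    length(b) >= length(sig) && identical(b[seq_along(sig)], as.raw(sig))
  }
  if (starts_with(c(0xD0, 0xCF, 0x11, 0xE0, 0xA1, 0xB1, 0x1A, 0xE1))) {
    "OLE2 container (true XLS)"
  } else if (starts_with(c(0x50, 0x4B, 0x03, 0x04))) {
    "ZIP archive (possibly XLSX)"
  } else if (starts_with(c(0xEF, 0xBB, 0xBF))) {
    "text with UTF-8 byte order mark"
  } else {
    "unknown"
  }
}

# Made-up demo file: a BOM followed by the ASCII bytes of "<?xml".
demo <- tempfile()
writeBin(as.raw(c(0xEF, 0xBB, 0xBF, 0x3C, 0x3F, 0x78, 0x6D, 0x6C)), demo)
print(sniff_format(demo))
```

Running `sniff_format(fname)` on the downloaded file would be the corresponding check for the case discussed here.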

The byte order mark can be addressed by opening the file with encoding="UTF-8-BOM", but as you mentioned originally the XML structure is still broken (see the error message about the Style ending tag). Line 16 seems to use /Style rather than /ss:Style. Maybe

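library(XML)
# read through a connection so the byte order mark is dropped
con <- file( fname, encoding="UTF-8-BOM" )
txt <- readLines( con )
close( con )
# give the closing tag the same ss: prefix as its opening tag
txt <- sub( "</Style>", "</ss:Style>", txt, fixed=TRUE )
fnamenobom <- "nobom.xml"
writeLines( txt, fnamenobom )
xmlfile <- xmlTreeParse( fnamenobom )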
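
To try the idea without downloading anything, here is a minimal self-contained sketch. The sample content is hypothetical (a few lines imitating the BOM and the mismatched closing tag, not the real iShares file), and the parse step only runs if the XML package happens to be installed.

```r
# Hypothetical miniature of the problem file: a UTF-8 BOM plus a closing
# tag that lacks the "ss:" prefix used by its opening tag.
sample_file <- tempfile(fileext = ".xls")
bom <- as.raw(c(0xEF, 0xBB, 0xBF))
body <- paste(
  '<?xml version="1.0"?>',
  '<ss:Workbook xmlns:ss="urn:schemas-microsoft-com:office:spreadsheet">',
  '<ss:Styles>',
  '<ss:Style ss:ID="Default">',
  '</Style>',
  '</ss:Styles>',
  '</ss:Workbook>',
  sep = "\n")
con <- file(sample_file, open = "wb")
writeBin(c(bom, charToRaw(body), charToRaw("\n")), con)
close(con)

# Reading through a connection declared as UTF-8-BOM drops the BOM.
con <- file(sample_file, encoding = "UTF-8-BOM")
txt <- readLines(con)
close(con)

# Repair the closing tag and write the result to a new file.
fixed <- sub("</Style>", "</ss:Style>", txt, fixed = TRUE)
fixed_file <- tempfile(fileext = ".xml")
writeLines(fixed, fixed_file)

# Parse only if the XML package is available.
if (requireNamespace("XML", quietly = TRUE)) {
  doc <- XML::xmlParse(fixed_file)
  print(XML::xmlName(XML::xmlRoot(doc)))
}
```

Whether a single substitution is enough for the real file depends on how many tags are affected, so the parser's next error message is the thing to watch.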

-- 
Sent from my phone. Please excuse my brevity.

On July 28, 2016 8:26:44 AM PDT, "Bos, Roger" <roger.bos at rothschild.com> wrote:
>Jeff,
>
>Thanks for your suggestions.  I mentioned XLS because that is the
>extension the ishares website provides.  I have tried many packages
>such as XML, xml2, XLConnect, and readxl.  I am not even sure what data
>format the file is, but it looks to me like XML and the extension is
>XLS.  If you have the names of specific packages you think I should
>try, that would be very helpful.
>
>Thanks,
>
>Roger
>
>
>
>
>
>***************************************************************
>This message and any attachments are for the intended recipient's use
>only.
>This message may contain confidential, proprietary or legally
>privileged
>information. No right to confidential or privileged treatment
>of this message is waived or lost by an error in transmission.
>If you have received this message in error, please immediately
>notify the sender by e-mail, delete the message, any attachments and
>all
>copies from your system and destroy any hard copies.  You must
>not, directly or indirectly, use, disclose, distribute,
>print or copy any part of this message or any attachments if you are
>not
>the intended recipient.
>
>
>-----Original Message-----
>From: Jeff Newmiller [mailto:jdnewmil at dcn.davis.ca.us]
>Sent: Thursday, July 28, 2016 10:34 AM
>To: Bos, Roger; r-help at r-project.org
>Subject: Re: [R] problems reading XML type file from ishares website
>
>XLS has nothing to do with XML. The shift from XLS to XLSX/XLSM formats
>was where XML was introduced. You might occasionally find mislabelled
>files that seem to work anyway, but there is a significant difference
>inside true XLS files.
>
>Use a package designed to handle your data format. There are a few, and
>most seem to require external software support  (e.g. Perl or Java or
>Windows OS), so you have to decide what overhead support headaches you
>can tolerate.
>
>On July 28, 2016 6:14:28 AM PDT, "Bos, Roger"
><roger.bos at rothschild.com> wrote:
>>The ishares website has the S&P 500 stocks you can download as an XLS
>>file, which opens fine in Excel, but I am not able to open it in R due
>>to what seems to be invalid XML formatting.  I tried using XLConnect
>>and XML as shown below.  Does anyone know a workaround or can point out
>>what I am doing wrong?  Here is my reproducible code:
>>
>>temp <- "https://www.ishares.com/us/239726/fund-download.dl"
>>fname <- "ivv.xls"
>>download.file(url = temp, destfile = fname)
>>library(XLConnect)
>>readWorksheetFromFile(fname)
>>library(XML)
>>xmlfile <- xmlTreeParse(fname)
>>
>>09:06:17 > readWorksheetFromFile(fname)
>>Error: InvalidFormatException (Java): Your InputStream was neither an
>>OLE2 stream, nor an OOXML stream
>>09:06:17 > library(XML)
>>09:06:25 > xmlfile <- xmlTreeParse(fname)
>>Opening and ending tag mismatch: Style line 14 and Style
>>Error: 1: Opening and ending tag mismatch: Style line 14 and Style
>>
>>
>>Thanks in advance, Roger
