[R] Reading in an XLS (really XML) file from website

John McKown john.archie.mckown at gmail.com
Fri Feb 27 23:05:49 CET 2015


On Fri, Feb 27, 2015 at 10:01 AM, Bos, Roger <roger.bos at rothschild.com>
wrote:

> All,
>
> I am trying to read the S&P 500 constituents from the iShares website
> using the following code:
>
>    URL <- "http://www.ishares.com/us/239726/fund-download.dl"
>    setInternet2(TRUE)
>    download.file(url=URL, destfile="temp.xls")
>    out <- readWorksheetFromFile(file="temp.xls", sheet="Holdings",
> header=TRUE, startRow=13)
>
> R returns the following error:
>
> >    out <- readWorksheetFromFile(file="temp.xls", sheet="Holdings",
> header=TRUE, startRow=13)
> Error: IllegalArgumentException (Java): Your InputStream was neither an
> OLE2 stream, nor an OOXML stream
> In addition: Warning message:
> In download.file(url = URL, destfile = "temp.xls") :
>   downloaded length 1938303 != reported length 200
>
> Upon further examination this is because the format is really XML.  Is
> there any way to get XLConnect or any other excel reader to read in an XML
> file?  I thought XML was for new Excel format.
>
> Barring that, can we read in the file using the XML package? I tried the
> following code...
>
>    require(XML)
>    tmp <- xmlParse(URL)
>
> ... but I get this error:
>
> Opening and ending tag mismatch: Style line 14 and Style
> Error: 1: Opening and ending tag mismatch: Style line 14 and Style
>
> Thanks in advance for any help or hints,
>
> Roger
>
>
The problem is indeed on line 14 of the file. The contents of that line
are:

</style>

but should be

</ss:style>

That is, the file is malformed. I edited the file to make that change and
saved it. After that, I was able to open it as a spreadsheet in
LibreOffice. I did all of this on my home Linux system. I don't have
Windows, and thus no Excel either, available here, so I can't test with
Excel. You should be able to download this file as shown by Raghuraman. On
Windows (which I _assume_ you are using, since most people do), you can edit
the file using Notepad or Wordpad. I would use Wordpad myself; Notepad is
"iffy" on some things. Save it back, then try readWorksheetFromFile() as
you originally did.
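
If you would rather not edit the file by hand, the same one-line change can
probably be made from R itself. What follows is only a minimal sketch, not
something I have run against the iShares file: the helper name
fix_style_tag() and the output file name "temp_fixed.xml" are made up for
illustration, and it assumes the bad tag really is on line 14 of the file
you downloaded.

```r
# Hypothetical helper: replace the malformed closing tag on one line.
# 'bad', 'good' and 'line_no' default to the values described above.
fix_style_tag <- function(lines, bad = "</style>", good = "</ss:style>",
                          line_no = 14) {
  # Only touch the line if the file is long enough to have it.
  if (length(lines) >= line_no) {
    # fixed = TRUE treats 'bad' as a literal string, not a regex.
    lines[line_no] <- sub(bad, good, lines[line_no], fixed = TRUE)
  }
  lines
}

infile  <- "temp.xls"        # the file saved by download.file()
outfile <- "temp_fixed.xml"  # assumed name for the repaired copy

if (file.exists(infile)) {
  txt <- readLines(infile, warn = FALSE)
  writeLines(fix_style_tag(txt), outfile)
}
```

After that you could point readWorksheetFromFile() or XML::xmlParse() at
the repaired copy. Whether either one then accepts the file is something
you would have to check on your side.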


-- 
He's about as useful as a wax frying pan.

10 to the 12th power microphones = 1 Megaphone

Maranatha! <><
John McKown
