[Rd] Yahoo bug in tseries::get.hist.quote and its::priceIts

Dirk Eddelbuettel edd at debian.org
Sun Apr 25 00:32:53 CEST 2004


Both get.hist.quote, and its derivative priceIts, rely on download.file() to
fetch financial data series from Yahoo! in .csv format. They allow for nice
interactive demonstrations of what one can do with R.

Unfortunately, both are currently broken as Yahoo! decided to add a somewhat
useless HTML comment at the end of the .csv 'stream', breaking the regular
format of n rows with k columns.  Here is an example for the S&P500 index
since the beginning of the month (to keep it compact):

Date,Open,High,Low,Close,Volume,Adj. Close*
23-Apr-04,1140.81,1141.75,1134.89,1140.60,1820460032,1140.60
22-Apr-04,1122.01,1142.53,1121.98,1139.93,2147280000,1139.93
21-Apr-04,1119.24,1125.66,1116.07,1124.09,1995879936,1124.09
20-Apr-04,1137.60,1139.27,1118.09,1118.15,1806850048,1118.15
19-Apr-04,1132.81,1136.17,1129.87,1135.82,1374380032,1135.82
16-Apr-04,1133.86,1136.75,1126.92,1134.61,1723180032,1134.61
15-Apr-04,1130.45,1133.72,1120.85,1128.84,1895289984,1128.84
14-Apr-04,1122.44,1132.47,1122.33,1128.17,1682800000,1128.17
13-Apr-04,1145.20,1147.73,1127.72,1129.44,1616720000,1129.44
12-Apr-04,1141.98,1147.24,1139.32,1145.20,1194080000,1145.20
9-Apr-04,1149.73,1139.32,1139.32,1139.32,0,1139.32
8-Apr-04,1140.53,1148.91,1134.54,1139.32,1435520000,1139.32
7-Apr-04,1146.25,1148.16,1138.48,1140.53,1658200064,1140.53
6-Apr-04,1144.26,1150.57,1143.35,1148.16,1551449984,1148.16
5-Apr-04,1141.81,1150.57,1141.63,1150.57,1614749952,1150.57
2-Apr-04,1144.15,1144.73,1132.17,1141.81,2134489984,1141.81
1-Apr-04,1128.14,1135.53,1126.21,1132.17,1765560064,1132.17
<!-- chart2.finance.scd.yahoo.com uncompressed Sat Apr 24 15:27:40 PDT 2004 -->

Is there an _elegant and portable_ way of reading this despite that last line?
I needed this, and used the somewhat clunky 

    data <- read.csv(destfile)
    unlink(destfile)
    data <- data[-(nlines-1),]          # skip very last line with comment

which uses nlines, which had already been computed (and has an offset of one
because of the header line).

I'd be happy to send this as a patch to tseries and its, but I have the
feeling we could do better.  How?
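
One possible direction, offered only as a sketch and not as what tseries or
its actually do: read the file as text, drop any line that starts with an
HTML comment marker, and parse the rest from a text connection. The helper
name read_yahoo_csv and the sample file below are hypothetical; only
readLines(), grepl(), textConnection() and read.csv() from base R are used.

```r
# Hypothetical helper (not part of tseries or its): read a csv file while
# ignoring any line that begins with an HTML comment such as "<!-- ... -->".
read_yahoo_csv <- function(file) {
    lines <- readLines(file)
    # Keep every line that does not start with "<!--" (leading blanks allowed).
    keep <- !grepl("^[[:space:]]*<!--", lines)
    con <- textConnection(lines[keep])
    on.exit(close(con))
    read.csv(con)
}

# Small self-contained demonstration using two rows from the example above.
destfile <- tempfile()
writeLines(c("Date,Open,High,Low,Close,Volume,Adj. Close*",
             "23-Apr-04,1140.81,1141.75,1134.89,1140.60,1820460032,1140.60",
             "22-Apr-04,1122.01,1142.53,1121.98,1139.93,2147280000,1139.93",
             "<!-- chart2.finance.scd.yahoo.com uncompressed -->"),
           destfile)
quotes <- read_yahoo_csv(destfile)
unlink(destfile)
```

This does not depend on nlines or on the comment being the very last line,
at the cost of holding the file contents in memory as text first.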

Thanks,  Dirk

-- 
The relationship between the computed price and reality is as yet unknown.  
                                             -- From the pac(8) manual page


