[R] Slow reading multiple tick data files into list of dataframes

Gabor Grothendieck ggrothendieck at gmail.com
Mon Oct 11 23:48:35 CEST 2010


On Mon, Oct 11, 2010 at 5:39 PM, rivercode <aquanyc at gmail.com> wrote:
>
> Hi,
>
> I am trying to find the best way to read 85 tick data files of format:
>
>> head(nbbo)
> 1 bid  CON  09:30:00.722    09:30:00.722  32.71   98
> 2 ask  CON  09:30:00.782    09:30:00.810  33.14  300
> 3 ask  CON  09:30:00.809    09:30:00.810  33.14  414
> 4 bid  CON  09:30:00.783    09:30:00.810  33.06  200
>
> Each file has between 100,000 and 300,300 rows.
>
> Currently I am doing  nbbo.list <- lapply(filePath, read.csv)  to create a
> list of 85 data.frame objects, but it is taking minutes to read in the
> data, and afterwards I get the following message on the console when
> taking further actions (though it does then stop):
>
>    The R Engine is busy. Please wait, and try your command again later.
>
> filePath in the above example is a vector of filenames:
>> head(filePath)
> [1] "C:/work/A/A_2010-10-07_nbbo.csv"
> [2] "C:/work/AAPL/AAPL_2010-10-07_nbbo.csv"
> [3] "C:/work/ADBE/ADBE_2010-10-07_nbbo.csv"
> [4] "C:/work/ADI/ADI_2010-10-07_nbbo.csv"
>
> Is there a better/quicker or more R-like way of doing this?
>

You could try read.csv.sql from the sqldf package, possibly with suitable
additional arguments:

library(sqldf)
nbbo.list <- lapply(filePath, read.csv.sql)
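
For example, read.csv.sql can be given explicit header and sep settings
(untested sketch; header = FALSE and sep = "," are guesses based on the
sample you posted and may need adjusting):

nbbo.list <- lapply(filePath, function(f)
    read.csv.sql(f, sql = "select * from file", header = FALSE, sep = ","))

read.csv.sql loads each file into a temporary SQLite database and only then
transfers it to R, which is often much faster than read.csv for files of
this size.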
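
If you would rather stay with plain read.csv, it usually speeds up
considerably when you supply colClasses instead of letting it work out the
column types itself. A rough, untested sketch, with the classes guessed
from the sample you posted (adjust them, and the header setting, to match
the real files):

# assumed layout: side, symbol, two time stamps kept as character,
# numeric price, integer size; assumed no header row
cls <- c("character", "character", "character", "character",
         "numeric", "integer")
nbbo.list <- lapply(filePath, read.csv, header = FALSE, colClasses = cls)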

-- 
Statistics & Software Consulting
GKX Group, GKX Associates Inc.
tel: 1-877-GKX-GROUP
email: ggrothendieck at gmail.com


