[R] file input with readLines

Uwe Ligges ligges at statistik.tu-dortmund.de
Tue Oct 4 17:41:47 CEST 2011



On 03.10.2011 19:19, Cable, Sam B Civ USAF AFMC AFRL/RVBXI wrote:
> I am using readLines to read a fairly large ASCII file.  readLines reads
> a fixed number of lines, then other R code processes the data, then
> readLines reads the same number of lines again, then other R code
> processes the data, then ....
>
>
>
> Sort of like:
>
>
>
> conn <- file('filename', 'r')
> for (chunk in 1:100000) {
>     Lines <- readLines(conn, n = 25)
>     # process "Lines"
> }
>
>
>
> The code is working, but I notice that it slows down greatly as time
> progresses.  It took 2 seconds to read my first chunk of data, 4 seconds
> to read the next chunk, 10 after that.  The quasi-exponential trend has
> slowed, thank goodness, but after about a hundred reads, the read time
> for the next chunk is over a minute.  Let me stress that the number of
> lines read in each chunk of data is absolutely fixed.
>
>
>
> The only processing I am doing at this point is to parse the new data
> and rbind the results to an existing data frame.

And that may be the interesting point.
Have you tried allocating the whole data.frame up front and assigning
into it later? It is probably not readLines() that is slowing you down.
A minute seems like quite a lot for reasonably sized data. How many
columns are we talking about?
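
A minimal sketch of the preallocation idea, not the original poster's
actual code. Each rbind() call copies the whole accumulated data frame,
so the cost per chunk grows with the number of rows already stored.
Allocating the full data frame once and assigning rows into it avoids
that repeated copying. The file contents, the two-column layout, the
sizes, and the helper name parse_chunk are all assumptions made for
illustration.

```r
# Hypothetical sizes, chosen only to keep the demo small
n_chunks        <- 100
lines_per_chunk <- 25
total           <- n_chunks * lines_per_chunk

# Write a small demo file with two whitespace-separated integer columns
tmp <- tempfile()
i   <- seq_len(total)
writeLines(sprintf("%d %d", i, 2L * i), tmp)

# Hypothetical parser: turns a character vector of lines into a data frame
parse_chunk <- function(lines) {
  read.table(text = lines, header = FALSE, col.names = c("x", "y"))
}

# Allocate the whole data frame once, before the loop
result <- data.frame(x = numeric(total), y = numeric(total))

conn <- file(tmp, "r")
for (chunk in seq_len(n_chunks)) {
  lines <- readLines(conn, n = lines_per_chunk)
  # Row indices that this chunk occupies in the preallocated data frame
  rows <- (chunk - 1) * lines_per_chunk + seq_along(lines)
  # Assign into existing rows instead of growing the object with rbind()
  result[rows, ] <- parse_chunk(lines)
}
close(conn)
unlink(tmp)
```

If the total number of rows is not known in advance, one alternative is
to collect the parsed chunks in a preallocated list and combine them
once after the loop with do.call(rbind, ...). Wrapping the readLines()
call and the assignment separately in system.time() would show which of
the two steps actually grows over time.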

Uwe Ligges




>  Processing of new data
> in no way depends on earlier data.
>
>
>
> So, my question is why is the reading taking longer as time goes on?  Is
> there a way to fix this?  Is there a better method than readLines?
>
>
>
> Thanks.
>
>


