[R] reading a text file, one line at a time

Data Analytics Corp. walt at dataanalyticscorp.com
Sun Aug 15 19:12:10 CEST 2010


Hi,

This seems like a good solution.  I was concerned about the time taken 
up by reading one line at a time.  If a chunk can be read in each time, 
then that should be the way for me to handle the problem.
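
A minimal sketch of the chunked read/process/write loop, for reference. The helper name process_file, the default chunk size of 1000 lines, and toupper() as a stand-in for the real string manipulation are assumptions for illustration only; the actual per-line processing would replace the transform argument.

```r
# Hypothetical helper: copy `infile` to `outfile` in chunks of `chunk_size`
# lines, applying `transform` (any function that maps a character vector to a
# character vector of the same length) to each chunk before writing it.
process_file <- function(infile, outfile, transform = toupper,
                         chunk_size = 1000L) {
  con <- file(infile, "r")
  on.exit(close(con), add = TRUE)
  out <- file(outfile, "w")
  on.exit(close(out), add = TRUE)

  total <- 0L
  # readLines() on an open connection continues where the last call stopped
  # and returns character(0) once the end of the file is reached.
  while (length(chunk <- readLines(con, n = chunk_size)) > 0) {
    writeLines(transform(chunk), con = out)
    total <- total + length(chunk)
  }
  invisible(total)  # number of lines processed
}

# Usage on a small temporary file (chunk size 2 forces two passes)
src <- tempfile()
dst <- tempfile()
writeLines(c("alpha", "beta", "gamma"), src)
n_lines <- process_file(src, dst, chunk_size = 2L)
result <- readLines(dst)
```

Only one chunk is held in memory at a time, so memory use depends on chunk_size rather than on the size of the file.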

Thanks,

Walt

________________________

Walter R. Paczkowski, Ph.D.
Data Analytics Corp.
44 Hamilton Lane
Plainsboro, NJ 08536
________________________
(V) 609-936-8999
(F) 609-936-3733
walt at dataanalyticscorp.com
www.dataanalyticscorp.com

_____________________________________________________

On 8/15/2010 1:06 PM, jim holtman wrote:
> For efficiency of processing, look at reading in several
> hundred/thousand lines at a time.  One line read/write will probably
> spend most of the time in the system calls to do the I/O and will take
> a long time.  So do something like this:
>
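> con <- file('yourInputFile', 'r')
> outfile <- file('yourOutputFile', 'w')
> # read up to 1000 lines per pass; readLines() returns character(0) at EOF
> while (length(input <- readLines(con, n = 1000)) > 0) {
>      output <- character(length(input))
>      for (i in seq_along(input)) {
>          # ......your one line at a time processing, storing the result in output[i]
>      }
>      writeLines(output, con = outfile)
> }
> close(con)
> close(outfile)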
>
> On Sun, Aug 15, 2010 at 7:58 AM, Data Analytics Corp.
> <walt at dataanalyticscorp.com>  wrote:
>    
>> Hi,
>>
>> I have an upcoming project that will involve a large text file.  I want to
>>
>>   1. read the file into R one line at a time
>>   2. do some string manipulations on the line
>>   3. write the line to another text file.
>>
>> I can handle the last two parts.  Scan and read.table seem to read the whole
>> file in at once.  Since this is a very large file (several hundred thousand
>> lines), this is not practical.  Hence the idea of reading one line at a
>> time.  The question is, can R read one line at a time?  If so, how?  Any
>> suggestions are appreciated.
>>
>> Thanks,
>>
>> Walt


