[R] reading a text file, one line at a time

jim holtman jholtman at gmail.com
Sun Aug 15 21:47:46 CEST 2010


Read in as big a chunk as you can; take a look at your memory usage
and make sure your environment does not have any unnecessary large
objects sitting around.
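
A minimal sketch of that memory check using only base R functions (`ls()`, `object.size()`, `rm()`, `gc()`). The helper name `largest_objects` and the example object `big_vector` are hypothetical and are not part of any package:

```r
# Hypothetical helper: sizes (in bytes) of the objects in an environment,
# largest first, so that unneeded large objects are easy to spot.
largest_objects <- function(env = globalenv()) {
  nms <- ls(envir = env)
  sizes <- vapply(nms,
                  function(nm) as.numeric(object.size(get(nm, envir = env))),
                  numeric(1))
  sort(sizes, decreasing = TRUE)
}

big_vector <- numeric(1e6)     # example object, roughly 8 MB
print(head(largest_objects(), 5))
rm(big_vector)                 # drop objects you no longer need
invisible(gc())                # trigger a garbage collection; call gc() alone to see the report
```

Removing objects with `rm()` and then calling `gc()` lets R reuse that memory for the next chunk you read in.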

On Sun, Aug 15, 2010 at 10:12 AM, Data Analytics Corp.
<walt at dataanalyticscorp.com> wrote:
> Hi,
>
> This seems like a good solution.  I was concerned about the time taken up
> reading one line at a time. If a chunk can be read in each time, then that should
> be the way for me to handle the problem.
>
> Thanks,
>
> Walt
>
> ________________________
>
> Walter R. Paczkowski, Ph.D.
> Data Analytics Corp.
> 44 Hamilton Lane
> Plainsboro, NJ 08536
> ________________________
> (V) 609-936-8999
> (F) 609-936-3733
> walt at dataanalyticscorp.com
> www.dataanalyticscorp.com
>
> _____________________________________________________
>
> On 8/15/2010 1:06 PM, jim holtman wrote:
>>
>> For efficiency of processing, look at reading in several hundred or
>> several thousand lines at a time.  Reading and writing one line at a
>> time will probably spend most of its time in the system calls that do
>> the I/O, and will take a long time.  So do something like this:
>>
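>> con <- file('yourInputFile', 'r')
>> outfile <- file('yourOutputFile', 'w')
>> while (length(input <- readLines(con, n = 1000)) > 0) {
>>     output <- character(length(input))
>>     for (i in seq_along(input)) {
>>         # ...your one-line-at-a-time processing of input[i] goes here
>>         output[i] <- input[i]
>>     }
>>     writeLines(output, con = outfile)
>> }
>> close(con)
>> close(outfile)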
>>
>> On Sun, Aug 15, 2010 at 7:58 AM, Data Analytics Corp.
>> <walt at dataanalyticscorp.com>  wrote:
>>
>>>
>>> Hi,
>>>
>>> I have an upcoming project that will involve a large text file.  I want to
>>>
>>>  1. read the file into R one line at a time
>>>  2. do some string manipulations on the line
>>>  3. write the line to another text file.
>>>
>>> I can handle the last two parts.  scan() and read.table() seem to read
>>> the whole file in at once.  Since this is a very large file (several
>>> hundred thousand lines), this is not practical.  Hence the idea of
>>> reading one line at a time.  The question is, can R read one line at a
>>> time?  If so, how?  Any suggestions are appreciated.
>>>
>>> Thanks,
>>>
>>> Walt
>>>
>>> ______________________________________________
>>> R-help at r-project.org mailing list
>>> https://stat.ethz.ch/mailman/listinfo/r-help
>>> PLEASE do read the posting guide
>>> http://www.R-project.org/posting-guide.html
>>> and provide commented, minimal, self-contained, reproducible code.
>>>
>>>
>>
>>
>>
>
>
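
For reference, a self-contained sketch of the approach discussed in the thread, using base R's `readLines()` and `writeLines()` on open connections. The function name `process_lines`, its `chunk_size` argument, and the `toupper()` transformation are illustrative assumptions; substitute your own string manipulations. With `chunk_size = 1` it reads literally one line at a time; a larger value reads a block of lines per call, as suggested above. Because many of R's string functions (`toupper()`, `gsub()`, `substr()`) are vectorized, the whole chunk can usually be transformed at once without an inner `for` loop.

```r
# Hypothetical helper: stream `infile` to `outfile` in blocks of
# `chunk_size` lines, applying the vectorized function `fun` to each block.
process_lines <- function(infile, outfile, fun = toupper, chunk_size = 1000L) {
  con_in <- file(infile, open = "r")
  on.exit(close(con_in))                  # always close the input connection
  con_out <- file(outfile, open = "w")
  on.exit(close(con_out), add = TRUE)     # ...and the output connection
  n_done <- 0L
  repeat {
    input <- readLines(con_in, n = chunk_size)
    if (length(input) == 0L) break        # end of file reached
    writeLines(fun(input), con = con_out)
    n_done <- n_done + length(input)
  }
  invisible(n_done)                       # number of lines processed
}

# Usage on a small temporary file
src <- tempfile(fileext = ".txt")
dst <- tempfile(fileext = ".txt")
writeLines(c("alpha", "beta", "gamma"), src)
process_lines(src, dst, chunk_size = 2L)  # two lines per readLines() call
readLines(dst)                            # returns c("ALPHA", "BETA", "GAMMA")
```

Opening the connection once with `file(..., open = "r")` is what makes successive `readLines()` calls continue where the previous one stopped; passing a file name instead of an open connection would reread the file from the top each time.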



-- 
Jim Holtman
Cincinnati, OH
+1 513 646 9390

What is the problem that you are trying to solve?


