[R] R Help Question

jim holtman jholtman at gmail.com
Thu Jul 9 02:09:36 CEST 2009


How big is the entire text file?  What is the length of an average
line?  Have you tried using 'scan' to read in the data?  How much
memory do you have?  Are you paging?  Here is a quick test I ran on
a file of about 4.5M lines, each of the form
12|this is some text|12345|more test:

> system.time(x <- scan('/tempxx.txt', what=list(0,'',0,''), sep="|"))
Read 4460544 records
   user  system elapsed
  50.70    1.04   54.16
>
> str(x)
List of 4
 $ : num [1:4460544] 12 12 12 12 12 12 12 12 12 12 ...
 $ : chr [1:4460544] "this is some text" "this is some text" "this is some text" "this is some text" ...
 $ : num [1:4460544] 12345 12345 12345 12345 12345 ...
 $ : chr [1:4460544] "more test" "more test" "more test" "more test" ...
> object.size(x)
107053288 bytes

It took less than a minute to read the file in, and the resulting
object is about 107 MB.  So some more details would help to evaluate
what your problem is.
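
For the layout in the original question (string|integer|string|integer),
the same approach might look like the sketch below.  The temporary file,
its three sample rows, and the column names (id, count, label, value) are
made up for illustration; only the column order comes from the post.
Declaring the column types up front, either in 'what' for scan or in
'colClasses' for read.table, generally saves R from guessing them, which
tends to reduce both time and memory on large files.

```r
# Hypothetical sample file in the poster's layout: string|integer|string|integer
tmp <- tempfile(fileext = ".txt")
writeLines(c("alpha|1|red|10", "beta|2|green|20", "gamma|3|blue|30"), tmp)

# Declare one template value per column; "" means character, 0L means integer.
# The names (id, count, label, value) are illustrative, not from the post.
x <- scan(tmp,
          what = list(id = "", count = 0L, label = "", value = 0L),
          sep = "|", quote = "", quiet = TRUE)

# Combine the list of columns into a data frame, keeping strings as character
df <- data.frame(x, stringsAsFactors = FALSE)

# Roughly equivalent read.table call with the column types stated explicitly
df2 <- read.table(tmp, sep = "|", quote = "", comment.char = "",
                  colClasses = c("character", "integer", "character", "integer"),
                  col.names = c("id", "count", "label", "value"))

str(df)
```

On the real file you would replace tmp with the path to your data.  If
memory is still a problem, scan also accepts an nlines argument, so the
file can be read and checked in pieces first.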

On Wed, Jul 8, 2009 at 4:00 PM, Amy Wesolowski<amywesolowski at gmail.com> wrote:
> Hi,
>
> I am currently working on reading large files into R.  My files are text
> documents with four columns and around 10 million lines.
> Each line is set up as:
> string|integer|string|integer
>
> I have been trying to use read.table to read in the file, but I think I am
> reading too much into memory and the application quits.
>
> I want to be able to analyze the entire text document at once.
> I have thought about reading in the file, line by line, but I still want to
> store all the information together.  I have also thought about writing each
> line of the file to a matrix, but I cannot seem to figure it out.
>
> Any help would be great.
> Thanks,
> Amy



-- 
Jim Holtman
Cincinnati, OH
+1 513 646 9390

What is the problem that you are trying to solve?



