[R] read file part way through based on start and end date (first column)

jim holtman jholtman at gmail.com
Mon Mar 21 01:45:31 CET 2011


It depends on which version of R you are using. If you are running a
32-bit version, all the columns are numeric and there are about 20
columns, I would guess that a single copy of the object might require
300MB, and reading it in and then subsetting might require 3-4X that
space. So if you had 3GB of memory, you might be fine.
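As a back-of-envelope check of that guess, the sketch below assumes every one of the 20 columns is stored as an 8-byte double and uses the row count from the quoted message. Under those assumptions the raw numbers alone come to roughly 181MB per copy, under the 300MB figure, which leaves room for overhead. The small sample data frame confirms the per-cell cost is close to 8 bytes:

```r
# Rough memory estimate for one file, ASSUMING all 20 columns are numeric
# (8-byte doubles). Row count taken from the quoted message.
n_rows <- 1183318
n_cols <- 20
bytes_per_double <- 8

one_copy_mb <- n_rows * n_cols * bytes_per_double / 2^20
print(round(one_copy_mb, 1))   # about 180.6 (units of 2^20 bytes)

# Upper end of the 3-4X working-space rule of thumb for read + subset
peak_mb <- 4 * one_copy_mb

# Sanity check on a small all-numeric data frame: cost per cell is ~8 bytes
sample_df <- as.data.frame(matrix(0, nrow = 10000, ncol = n_cols))
bytes_per_cell <- as.numeric(object.size(sample_df)) / (10000 * n_cols)
print(bytes_per_cell)
```

The 20 files together would need about 20 times the single-copy figure, which is why reading them one at a time matters on a 32-bit build.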

How much would you expect to read from each file (1%, 10% or 100%)?
You might be better off putting the data into a database first and
then extracting what you want from there. Is it a fixed range that
you want to extract from all the files, or does it vary for each run?
There are a number of RDBMSs (relational database management systems)
with interfaces to R that would make the job easier.
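A minimal sketch of the database route, assuming the DBI and RSQLite packages are installed. The file layout (an ISO "YYYY-MM-DD" date in the first column followed by numeric columns), the table name "prices" and the helper extract_range() are all hypothetical. Each file is loaded once; after that, any date range can be pulled with a query instead of re-reading the text files:

```r
library(DBI)   # ASSUMES DBI and RSQLite are installed

# Two tiny stand-in files for the real 20 files (hypothetical layout)
files <- c(tempfile(fileext = ".csv"), tempfile(fileext = ".csv"))
write.csv(data.frame(date = c("2011-01-03", "2011-01-04"), value = c(1.5, 2.5)),
          files[1], row.names = FALSE)
write.csv(data.frame(date = c("2011-02-01", "2011-03-01"), value = c(3.5, 4.5)),
          files[2], row.names = FALSE)

# ":memory:" keeps the example self-contained; use a file path so the
# database persists between R sessions
con <- dbConnect(RSQLite::SQLite(), ":memory:")

for (f in files) {
  # Keep the date as ISO text: it sorts and compares correctly as a string
  chunk <- read.csv(f, colClasses = c("character", "numeric"))
  # Create the table on the first file, append on the rest
  dbWriteTable(con, "prices", chunk, append = dbExistsTable(con, "prices"))
}
# An index on the date column makes range queries cheap
invisible(dbExecute(con, "CREATE INDEX idx_prices_date ON prices(date)"))

# Hypothetical helper: rows whose date falls in [start, end]
extract_range <- function(con, start, end) {
  dbGetQuery(con,
             "SELECT * FROM prices WHERE date BETWEEN ? AND ? ORDER BY date",
             params = list(start, end))
}

subset_rows <- extract_range(con, "2011-01-04", "2011-02-28")
print(subset_rows)
dbDisconnect(con)
```

If the range varies from run to run, only the query changes; the one-time load cost is paid up front.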

What you should try is to read in progressively larger sections of one
of the files to see how much memory is used. If you are using
read.table, remember to explicitly state the mode of each column (the
colClasses argument). This will give you the best estimate of whether
your system is capable of handling a single file at a time. It will
also show you how long it takes to read and convert the data. If your
system can handle a single file, I would suggest that you set up a
script to read in each of the files and "save" the resulting object.
This will make subsequent reads much faster, since the data will
already be converted.
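The procedure above might look something like this. It is a sketch only: the generated file (a date column plus five numeric columns) and the chunk sizes are stand-ins for one of the real files. It reads growing slices with nrows and explicit colClasses, reports object.size() and elapsed time for each, then uses save()/load() so later runs skip the text parsing:

```r
# Hypothetical stand-in for one of the real files
csv_file <- tempfile(fileext = ".csv")
set.seed(1)
n <- 50000
write.csv(data.frame(date = format(as.Date("2011-01-01") + (seq_len(n) %% 365)),
                     matrix(rnorm(n * 5), ncol = 5)),
          csv_file, row.names = FALSE)

# Declare every column's type so read.table does not have to guess
col_types <- c("character", rep("numeric", 5))

# Read progressively larger sections and watch memory and time grow
for (n_read in c(1000, 10000, 50000)) {
  timing <- system.time(
    part <- read.table(csv_file, header = TRUE, sep = ",",
                       colClasses = col_types, nrows = n_read)
  )
  cat(sprintf("%6d rows: %8.1f KB, %.2f s elapsed\n", n_read,
              as.numeric(object.size(part)) / 1024, timing[["elapsed"]]))
}

# Once a whole file fits, convert it once and save the R object;
# load() on later runs restores it without re-parsing the text
rdata_file <- tempfile(fileext = ".RData")
save(part, file = rdata_file)
rm(part)
load(rdata_file)   # restores the object named 'part'
```

Scaling the per-row figure from the largest slice up to the full row count gives a direct, measured answer to whether one file fits.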

On Sun, Mar 20, 2011 at 5:12 PM, algotr8der <algotr8der at gmail.com> wrote:
> Thanks Jim for the reply. The file has 1,183,318 rows and there are 20 such
> files.
>
> Too big for R to handle?
>
> ______________________________________________
> R-help at r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>



-- 
Jim Holtman
Data Munger Guru

What is the problem that you are trying to solve?