[R] How to import BIG csv files with separate "map"?

Gabor Grothendieck ggrothendieck at gmail.com
Tue Jul 14 21:48:56 CEST 2009


Either of the following can be done in one line of code:

Use the nrows and skip arguments of read.table to read in only
a subset of rows. Setting an entry of its colClasses argument
to "NULL" suppresses reading in the corresponding column.
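
A minimal sketch of the read.table approach, assuming a small
made-up file (the temp file and its columns id, x, y are only for
illustration, not the poster's data). The header is read separately
because skip would otherwise jump over it:

```r
# Build a small stand-in CSV so the sketch is self-contained.
tmp <- tempfile(fileext = ".csv")
write.csv(data.frame(id = 1:10, x = rnorm(10), y = letters[1:10]),
          tmp, row.names = FALSE)

# Read one row just to recover the column names.
hdr <- names(read.csv(tmp, nrows = 1))

# skip = 4 jumps over the header line plus the first 3 data rows;
# nrows = 3 then reads data rows 4 to 6 only.
# The "NULL" entry in colClasses drops the second column (x).
sub <- read.table(tmp, sep = ",", header = FALSE, skip = 4, nrows = 3,
                  colClasses = c("integer", "NULL", "character"),
                  col.names = hdr)
print(sub)
```

For a file as wide as the one described, colClasses could be built
with rep("NULL", n) and then overwritten at the positions of the
columns to keep.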

read.csv.sql from the sqldf package will create a database on
the fly, read in the data, extract it to R according to whatever
SQL statement you give to its sql argument and then destroy
the database, so you have all the flexibility of SQL in
selecting a portion of the data. See http://sqldf.googlecode.com
and the example here:
http://code.google.com/p/sqldf/#Example_13._read.csv.sql
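
A rough sketch of the read.csv.sql approach, assuming the sqldf
package is installed. The temp file and its columns id and grp are
made up for illustration. Inside the SQL statement the input is
referred to as "file":

```r
library(sqldf)  # assumption: sqldf is installed

# Small stand-in CSV; quote = FALSE keeps quote characters out of
# the file, since they would otherwise end up in the imported values.
tmp <- tempfile(fileext = ".csv")
write.csv(data.frame(id = 1:10, grp = rep(c("a", "b"), 5)),
          tmp, row.names = FALSE, quote = FALSE)

# Only the rows and columns selected by the SQL statement reach R.
sub <- read.csv.sql(tmp, sql = "select id from file where grp = 'a'")
print(sub)
```

The filtering happens in the temporary database, so the full file
never has to fit into an R data frame.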

On Tue, Jul 14, 2009 at 1:53 PM, giusto<giusto at uoregon.edu> wrote:
>
> Hi all,
>
> I am having problems importing a VERY large dataset in R. I have looked into
> the package ff, and that seems to suit me, but also, from all the examples I
> have seen, it either requires a manual creation of the database, or it needs
> a read.table kind of step. Being a survey kind of data the file is big (like
> 20,000 times 50,000 for a total of about 1.2Gb in plain text) the memory I
> have isn't enough to do a read.table and my computer freezes every time :(
>
> Thus far I have managed to import the required subset of the data by using a
> "cheat": I used GRETL to read an equivalent Stata file (released by the same
> source that offered the csv file), manipulate it and export it in a format
> that R can read into memory. Easy! But I am wondering, how is it possible to
> do this in R entirely from scratch?
>
> Thanks
