[R] Manage huge database

Gabor Grothendieck ggrothendieck at gmail.com
Mon Sep 22 18:52:23 CEST 2008


Try this:

read.table(pipe("/Rtools/bin/gawk -f cut.awk bigdata.dat"))

where cut.awk contains this single line (assuming you
want fields 101 through 110 and no others):

{ for(i = 101; i <= 110; i++) printf("%s ", $i); printf "\n" }
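
Note that gawk splits fields on whitespace by default; since your file is
comma-separated, you would want to pass -F, so the input is split on commas
(a sketch, untested on your data; the script's space-separated output is
still fine for read.table's defaults):

   read.table(pipe("/Rtools/bin/gawk -F, -f cut.awk bigdata.dat"))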

Or just use cut (a sketch follows below).  I tried the gawk command above
on Windows Vista with an artificial file of 500,000 columns and 2 rows and
it seemed instantaneous.
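
For the cut route, something along these lines should work (a sketch,
assuming a comma-separated file and cut on your PATH; cut keeps the
commas, so tell read.table about the separator):

   read.table(pipe("cut -d, -f101-110 bigdata.dat"), sep = ",")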

On Windows, the above uses gawk from Rtools, available at:
   http://www.murdoch-sutherland.com/Rtools/
or you can install gawk separately.  Rtools also includes cut if you
prefer that.
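
If you would rather stay entirely in R, read.table can also skip columns
via colClasses, where a "NULL" entry drops a column (a pure-R sketch,
untested at this scale and likely slower, since R still parses every
line):

   cc <- rep("NULL", 500000)   # drop every column by default
                               # (adjust 500000 to the actual column count)
   cc[101:110] <- NA           # NA lets read.table guess these types
   df <- read.table("bigdata.dat", sep = ",", colClasses = cc)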

On Mon, Sep 22, 2008 at 2:50 AM, José E. Lozano <lozalojo at jcyl.es> wrote:
> Hello,
>
> Recently I have been trying to open a huge database with no success.
>
> It's a 4 GB CSV plain-text file with around 2000 rows and over 500,000
> columns/variables.
>
> I have tried The SAS System, but it reads only around 5000 columns, no
> more.  R hangs when opening it.
>
> Is there any way to work with "parts" (a set of columns) of this database,
> since it's impossible to manage it all at once?
>
> Is there any way to establish a link to the csv file and to state the
> columns you want to fetch every time you run an analysis?
>
> I've been searching the net, but have found little about this topic.
>
> Best regards,
>
> Jose Lozano