[R] Dealing With Extremely Large Files

Gabor Grothendieck ggrothendieck at gmail.com
Wed Oct 1 03:34:43 CEST 2008


There are no built-in facilities for fixed column widths, but it's not hard to
parse out the fields yourself using the SQLite substr function.

I've added example 6f to the sqldf home page which illustrates this.

http://sqldf.googlecode.com
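
As a rough illustration of the substr approach (a sketch only, not the text of
example 6f), the code below builds a small stand-in for madeup.dat and pulls
out state and score by character position. The sample records, the column name
V1 that an unheaded single-column file is assumed to receive, and the
file.format setting are assumptions made for this example. It also assumes the
data contain no commas, so each line is read as one text field.

```r
library(sqldf)

# Hypothetical 10-character records:
# state (cols 1-2), zipcode (cols 3-7), score (cols 8-10, zero padded).
records <- c("CA90210087", "NY10001092", "TX73301075", "WA98101064")
writeLines(records, "madeup.dat")

# Unopened connection; sqldf reads the file into a temporary SQLite table.
MyConnection <- file("madeup.dat")

# With no header row, each whole line is assumed to land in one column, V1.
# substr(X, start, length) is 1-based, so substr(V1, 8, 3) is columns 8-10.
MyData <- sqldf("select substr(V1, 1, 2) state,
                        cast(substr(V1, 8, 3) as integer) score
                 from MyConnection
                 order by random()
                 limit 3",
                file.format = list(header = FALSE))

close(MyConnection)
print(MyData)
```

The zipcode columns (3-7) are simply never selected, so they do not reach R.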

On Tue, Sep 30, 2008 at 5:18 PM, zerfetzen <zerfetzen at yahoo.com> wrote:
>
> Thank you Gabor, this is fantastic, easy to use and so powerful.  I was
> instantly able to do many things with .csv files that are much too large for
> my PC's memory.  This is clearly my new favorite way to read in data; I love
> it!
>
> Is it possible to use sqldf with a fixed width format that requires a file
> layout?
>
> For example, let's say you have a .dat file called madeup.dat, without a
> header row.  The hypothetical file madeup.dat for discussion has 3 variables
> (state, zipcode, and score), is 10 characters wide, and has 20 rows (again,
> just a made-up file).
>
> Here is my fumbling attempt at code that will read in only state and score,
> and randomly select 10 obs:
>
> library(sqldf)
>
> # Source pulls in the development version of sqldf.
> source("http://sqldf.googlecode.com/svn/trunk/R/sqldf.R")
>
> #Open a connection to that file.
> MyConnection <- file("madeup.dat")
>
> # Read in only state and score variables, and randomly select only 10 rows.
> MyData <- sqldf("select state,score from MyConnection order by random(*)
> limit 10")
>
> # I think everything about this would work, except it should not currently
> # know which columns are to be brought in for the state variable (which
> # would be 1-2), and that the text columns for zipcode (3-7) should be
> # ignored, and finally that score (text columns 8-10) should be included
> # again.  If I have overlooked this, I apologize.
> # Thank you.


