[R] Scanning only specific columns into R from a VERY large file

Sharpie chuck at sharpsteen.net
Sat Apr 17 00:21:05 CEST 2010



Josh B-3 wrote:
> 
> Hi,
> 
> I turn to you, the R Sages, once again for help. You've never let me down!
> 
> (1) Please make the following toy files:
> 
> x <- read.table(textConnection("var.1 var.2 var.3 var.1000
> indv.1 1 5 9 7
> indv.210000 2 9 3 8"), header = TRUE)
> 
> y <- read.table(textConnection("var.3 var.1000"), header = TRUE)
> 
> write.csv(x, file = "x.csv")
> write.csv(y, file = "y.csv")
> 
> (2) Pretend you are starting with the files "x.csv" and "y.csv." They come
> from another source -- an online database. Pretend that these files are
> much, much, much larger. Specifically: 
>     (a) Pretend that "x.csv" contains 1000 columns by 210,000 rows. 
>     (b) "y.csv" contains just header titles. Pretend that there are 90
> header titles in "y.csv" in total. These header titles are a subset of the
> header titles in "x.csv."
> 
> (3) What I want to do is scan (or import, or whatever the appropriate word
> is) only a subset of the columns from "x.csv" into R. Specifically, I
> only want to scan the columns of data from "x.csv" into R that are
> indicated in the file "y.csv." I still want to scan in all 210000 rows
> from "x.csv," but only for the aforementioned columns listed in "y.csv."
> 
> Can you guys recommend a strategy for me? I think I need to use the scan
> command, based on the hugeness of "x.csv," but I don't know what exactly
> to do. Specific code that gets the job done would be the most useful. 
> 
> Thank you very much in advance!
> Josh
> 

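read.csv.sql() from the sqldf package looks like it may do what you want: it
reads the CSV file into a temporary SQLite database and returns to R only the
result of an SQL query, so only the columns you select end up in R. In the
query, the CSV file is referred to as the table "file", so something like: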

  SELECT list,of,column,names FROM file
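
For comparison, here is a minimal sketch of the same column filtering in base R
with no extra packages. This is a swapped-in technique, not the read.csv.sql()
approach above: read.csv() skips any column whose colClasses entry is "NULL",
and scan() is used only to read the header line of each file. The toy data
(written without row names), the temporary file paths, and the helper
read_header() are hypothetical and only for illustration.

```r
# Minimal base-R sketch (assumption: no sqldf). Keeps only the columns of
# x.csv that are named in the header-only file y.csv.
# Hypothetical toy files, written to a temporary directory without row names.
x_path <- file.path(tempdir(), "x.csv")
y_path <- file.path(tempdir(), "y.csv")

x <- data.frame(var.1 = c(1, 2), var.2 = c(5, 9),
                var.3 = c(9, 3), var.1000 = c(7, 8))
write.csv(x, x_path, row.names = FALSE)
writeLines("var.3,var.1000", y_path)   # header titles only, no data rows

# Hypothetical helper: read only the first line of a CSV as a character
# vector of column names (scan strips the quotes that write.csv adds).
read_header <- function(path) {
  scan(path, what = "", sep = ",", nlines = 1, quiet = TRUE)
}
wanted    <- read_header(y_path)
all_names <- read_header(x_path)

# "NULL" makes read.csv skip a column; NA lets it guess the type as usual.
col_classes <- ifelse(all_names %in% wanted, NA, "NULL")

subset_x <- read.csv(x_path, colClasses = col_classes)
print(subset_x)
```

On the real 210,000-row file read.csv() still has to read every line, but the
skipped columns are not kept in the result, which usually keeps memory use far
below what reading all 1000 columns would need.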


Hope this helps!

-Charlie

-----
Charlie Sharpsteen
Undergraduate-- Environmental Resources Engineering
Humboldt State University
-- 


