[R] Scanning only specific columns into R from a VERY large file

Rubén Roa rroa at azti.es
Mon Apr 19 08:39:44 CEST 2010


-----Mensaje original-----
De: r-help-bounces at r-project.org [mailto:r-help-bounces at r-project.org] En nombre de Josh B
Enviado el: sábado, 17 de abril de 2010 0:12
Para: R Help
Asunto: [R] Scanning only specific columns into R from a VERY large file

Hi,

I turn to you, the R Sages, once again for help. You've never let me down!

(1) Please make the following toy files:

x <- read.table(textConnection("var.1 var.2 var.3 var.1000
indv.1 1 5 9 7
indv.210000 2 9 3 8"), header = TRUE)

y <- read.table(textConnection("var.3 var.1000"), header = TRUE)

write.csv(x, file = "x.csv")
write.csv(y, file = "y.csv")

(2) Pretend you are starting with the files "x.csv" and "y.csv." They come from another source -- an online database. Pretend that these files are much, much, much larger. Specifically: 
    (a) Pretend that "x.csv" contains 1000 columns by 210,000 rows. 
    (b) "y.csv" contains just header titles. Pretend that there are 90 header titles in "y.csv" in total. These header titles are a subset of the header titles in "x.csv."

(3) What I want to do is scan (or import, or whatever the appropriate word is) only a subset of the columns from "x.csv" into an R. Specifically, I only want to scan the columns of data from "x.csv" into R that are indicated in the file "y.csv." I still want to scan in all 210000 rows from "x.csv," but only for the aforementioned columns listed in "y.csv."

Can you guys recommend a strategy for me? I think I need to use the scan command, based on the hugeness of "x.csv," but I don't know what exactly to do. Specific code that gets the job done would be the most useful. 

Thank you very much in advance!
Josh

-----------
Try with something like

do.call("cbind",scan(file="yourfile.csv",what=list(NULL,NULL,....,0,NULL,0,NULL,NULL,...,NULL),flush=TRUE)) 

you have to work out how to set up the list of parameter 'what' to read the headers of 'y'. In the above the only columns read are those indicated by a '0'.

HTH

Ruben 


____________________________________________________________________________________ 

Dr. Rubén Roa-Ureta
AZTI - Tecnalia / Marine Research Unit
Txatxarramendi Ugartea z/g
48395 Sukarrieta (Bizkaia)
SPAIN



More information about the R-help mailing list