[R] How to import specific column(s) using "read.table"?

Thomas Lumley tlumley at u.washington.edu
Mon Aug 9 22:52:01 CEST 2004


On Mon, 9 Aug 2004, F Duan wrote:

> Dear R people,
>
> I have a very big tab-delimited text file with a header, and I only want to import
> several columns into R. I checked the options for "read.table" and only
> found "nrows", which lets you specify the maximum number of rows to read in.
> Although I can use some text editors (e.g., WordPad) to edit the text file first
> before running R, I feel it’s not very convenient. The reason for me to do this
> is that if I import the whole file into R, it will eat up too much of my
> system’s memory. Even after I remove it later, I still can’t release the memory.
>

You can't avoid reading the whole file, but you can avoid having all of it in
memory at once.

I'll assume you know how many data lines are in the file (not counting the
header); call it N. This isn't necessary, but it is tidier. I'll also assume
that you are interested in columns 10 and 110, both numeric.
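If you don't know N, one way to get it without loading the file is to count lines block by block with readLines(). This is a minimal sketch; count_lines() and blocksize are hypothetical names, and the count includes the header line, so subtract one to get the number of data rows.

```r
# Hypothetical helper: count the lines of a (possibly huge) text file while
# holding at most `blocksize` lines in memory at a time.
count_lines <- function(path, blocksize = 10000) {
  con <- file(path, open = "r")
  on.exit(close(con))
  n <- 0
  repeat {
    lines <- readLines(con, n = blocksize)
    if (length(lines) == 0) break   # end of file reached
    n <- n + length(lines)
  }
  n
}

# Example: a temporary file with a header line plus 2500 data lines
tmp <- tempfile(fileext = ".txt")
writeLines(c("a\tb", rep("1\t2", 2500)), tmp)
N <- count_lines(tmp) - 1   # subtract 1 for the header line
```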

If you do something like

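inputfile<-file("inputfile.txt",open="r")
header<-readLines(inputfile,n=1)  # skip the header line
# N is the number of data rows, not counting the header
result<-data.frame(col10=numeric(N), col110=numeric(N))
chunksize<-1000
nchunks<- ceiling(N/chunksize)

for(i in 1:nchunks){
	# the last chunk may hold fewer than chunksize rows
	nread<-min(chunksize, N-(i-1)*chunksize)
	chunk<-read.table(inputfile,sep="\t",nrows=nread)
	result[ (i-1)*chunksize+ (1:nread),]<-chunk[,c(10,110)]
}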

close(inputfile)

you can choose the chunk size to keep memory use acceptable: besides the
two-column result, only chunksize rows of the full table are in memory at
any one time.
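Since knowing N isn't strictly necessary, here is a minimal sketch of a variant that keeps reading chunks until the file runs out and stacks the kept columns at the end. It assumes a current R (it uses the text= argument of read.table) and a tab-delimited file with a header; read_columns_chunked() is a hypothetical name. Each chunk's column types are guessed separately, so for mixed data you may want to pass colClasses.

```r
# Hypothetical helper: read selected columns of a tab-delimited file with a
# header, chunksize lines at a time, without knowing the number of lines.
read_columns_chunked <- function(path, cols, chunksize = 1000, sep = "\t") {
  con <- file(path, open = "r")
  on.exit(close(con))
  # the first line holds the column names
  header <- strsplit(readLines(con, n = 1), sep, fixed = TRUE)[[1]]
  pieces <- list()
  repeat {
    lines <- readLines(con, n = chunksize)
    if (length(lines) == 0) break   # end of file reached
    chunk <- read.table(text = lines, sep = sep, header = FALSE)
    # keep only the wanted columns; the rest of the chunk is dropped
    pieces[[length(pieces) + 1]] <- chunk[, cols, drop = FALSE]
  }
  result <- do.call(rbind, pieces)
  names(result) <- header[cols]
  result
}

# Example: a 2500-row, 12-column tab-delimited file with a header
tmp <- tempfile(fileext = ".txt")
m <- matrix(seq_len(2500 * 12), nrow = 2500, ncol = 12)
write.table(m, tmp, sep = "\t", quote = FALSE, row.names = FALSE,
            col.names = paste0("c", 1:12))
res <- read_columns_chunked(tmp, cols = c(3, 10))
head(res)
```

Collecting the chunks in a list and calling rbind once at the end avoids having to preallocate the result.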

There are also more efficient ways that make you do more of the work (e.g.,
read in lines of text with readLines and use regular expressions to
extract the columns you need).
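A minimal sketch of the readLines plus regular expressions idea, assuming strictly tab-separated lines with no quoted fields, and that every line has at least as many fields as the largest column requested (otherwise sub() leaves that line unchanged and as.numeric() gives NA). extract_field() and read_columns_regex() are hypothetical names.

```r
# Hypothetical helper: pull the k-th tab-separated field out of each line
# with a regular expression (skip k-1 "field<TAB>" groups, capture the next).
extract_field <- function(lines, k) {
  pattern <- sprintf("^(?:[^\t]*\t){%d}([^\t]*).*$", as.integer(k - 1))
  sub(pattern, "\\1", lines, perl = TRUE)
}

# Hypothetical helper: read the wanted numeric columns chunksize lines at a time
read_columns_regex <- function(path, cols, chunksize = 1000) {
  con <- file(path, open = "r")
  on.exit(close(con))
  readLines(con, n = 1)             # skip the header line
  out <- vector("list", length(cols))
  repeat {
    lines <- readLines(con, n = chunksize)
    if (length(lines) == 0) break   # end of file reached
    for (j in seq_along(cols)) {
      # only the extracted fields are kept; the raw lines are discarded
      out[[j]] <- c(out[[j]], as.numeric(extract_field(lines, cols[j])))
    }
  }
  names(out) <- paste0("col", cols)
  as.data.frame(out)
}

# Example: a 2500-row, 12-column tab-delimited file with a header
tmp <- tempfile(fileext = ".txt")
m <- matrix(seq_len(2500 * 12), nrow = 2500, ncol = 12)
write.table(m, tmp, sep = "\t", quote = FALSE, row.names = FALSE,
            col.names = paste0("c", 1:12))
res <- read_columns_regex(tmp, cols = c(3, 10))
head(res)
```

Appending with c() copies the growing vectors on every chunk; if you know N you can preallocate them instead, as in the read.table version.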

	-thomas




More information about the R-help mailing list