[R] Largest allowable matrix

Spencer Graves spencer.graves at pdf.com
Mon Nov 21 18:08:55 CET 2005


	  What do you want to do with these large matrices?  Both "scan" and 
"read.table" allow you to skip a certain number of lines at the 
beginning of a file (argument "skip") and then read only as many lines 
as you want from that point ("nlines" for "scan", "nrows" for "read.table").
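A minimal sketch of that idea in R follows. It assumes a headerless, tab-separated file of numeric columns with no blank or comment lines, so that "skip" (which counts lines) and "nrows" (which counts records) stay in step. The helper name process_in_chunks and its chunk and FUN arguments are hypothetical, not part of R. Because each pass uses skip, the file is re-scanned from the top every time; reading successive blocks from an already-open connection (see ?read.table) would avoid that cost on very large files.

```r
# Hypothetical helper (not part of R): apply FUN to successive blocks of
# 'chunk' records of a headerless, delimited, all-numeric text file.
# Assumes no blank lines, so line counts and record counts agree.
process_in_chunks <- function(file, chunk = 100000L, sep = "\t", FUN = colSums) {
  n <- length(count.fields(file, sep = sep))  # total number of records
  results <- list()
  start <- 0L                                 # records already consumed
  while (start < n) {
    block <- read.table(file, sep = sep, skip = start, nrows = chunk)
    if (nrow(block) == 0L) break              # guard against looping forever
    results[[length(results) + 1L]] <- FUN(block)
    start <- start + nrow(block)
  }
  results
}

# Example (file name is a placeholder): column sums of the whole file,
# accumulated block by block
# sums <- Reduce(`+`, process_in_chunks("big.txt", chunk = 500000L))
```

Summing per-block results with Reduce keeps only one block in memory at a time, which is the point of working with submatrices.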

	  I recently had large files that were too big for S-Plus 6.  I moved 
to R and processed them as submatrices without a problem.  I typically 
use "readLines" to check the format of the first few records and 
"count.fields" to determine whether all records have the same number of 
fields.  In one recent case, I had a file that was almost, but not 
quite, regular.  I processed the file in pieces, carefully examining the 
records immediately before and after each change in the number of fields, 
and recovered essentially everything without going back to my client 
(through several layers of bureaucracy) to ask for their help in parsing 
that file.
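The inspection step described above can be sketched in R roughly as follows. The helper names field_count_changes and show_context are hypothetical, and the sketch assumes the file has no blank lines and no quoted fields with embedded newlines, so that each element returned by count.fields corresponds to one physical line of the file.

```r
# Hypothetical helper: report the lines where the number of fields changes.
# Assumes one record per physical line (no blank lines, no embedded newlines).
field_count_changes <- function(file, sep = "\t") {
  nFlds <- count.fields(file, sep = sep)
  change <- which(diff(nFlds) != 0) + 1L   # first line of each new run
  data.frame(line = change,
             from = nFlds[change - 1L],    # field count just before the change
             to   = nFlds[change])         # field count from the change on
}

# Hypothetical helper: return the raw text of the lines around 'line',
# reading only as far into the file as needed.
show_context <- function(file, line, width = 2L) {
  txt <- readLines(file, n = line + width)
  txt[max(1L, line - width):min(length(txt), line + width)]
}
```

For each row returned by field_count_changes, calling show_context on its line number shows the records on either side, which is usually enough to tell a wrong "sep" from a genuinely malformed record.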

	  I frequently use a construct like the following:

File. <- ".....<filename>"
readLines(File., 9)
# to check the format including the "sep" character
quantile(nFlds <- count.fields(File., sep="\t")) # or sep="," for a CSV file

# If the file honestly has a fixed number of fields,
# this will show that.
# If not, either the "sep" character is wrong or the file has problems.
# In either case, this helps me plan what to do next.

	  hope this helps.
	  spencer graves

Prof Brian Ripley wrote:

> On Mon, 21 Nov 2005, Uwe Ligges wrote:
> 
> 
>>Barry Baker wrote:
>>
>>
>>>Hello,
>>>
>>>I am a new R user and have two datasets that I would like to analyze.  The
>>>first is (2409222 x 17) and the other is (21682998 x 17). Is this possible
>>>in R?  If not then what is the maximum number of rows and columns or number
>>>of elements that R can handle?
>>
>>
>>The number of columns and rows is not a problem here, but you will need
>>21682998 * 17 * 4 bytes to store the latter matrix (assuming floats) in
>>memory, that is 1406.139 MB.
> 
> 
> R does not use floats internally.  So unless these are integers/logicals 
> you are going to need twice that.
> 
> 
>>In order to do something sensible with the data, you need *at least*
>>twice the amount of RAM, hence at least 3 GB.
> 
> 
> Here I think the issue is rather virtual memory and address space.  You 
> will need a 64-bit OS to do anything with this object.
> 
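To put numbers on the figures quoted above, here is a short R sketch of the arithmetic, assuming dense matrices and no extra copies in memory. R stores numeric (double) data in 8 bytes per cell and integer or logical data in 4 bytes; the .Machine$integer.max check reflects the limit of 2^31 - 1 elements per vector that applied to R at the time (long vectors came later, in R 3.0.0). Stored as doubles, the larger matrix needs about 2.75 GiB, more than a single 32-bit process can typically address (usually 2 to 3 GiB of the 4 GiB pointer range), which is consistent with the advice to use a 64-bit OS.

```r
# Back-of-the-envelope memory needs for the two matrices in the question.
nrows <- c(2409222, 21682998)
ncols <- c(17, 17)
cells <- nrows * ncols

mem <- data.frame(
  cells      = cells,
  int_MiB    = cells * 4 / 1024^2,   # integer/logical storage: 4 bytes per cell
  double_MiB = cells * 8 / 1024^2,   # numeric (double) storage: 8 bytes per cell
  fits_in_one_vector = cells <= .Machine$integer.max,  # pre-long-vector limit
  row.names  = c("small", "large")
)
print(mem)
```

The element counts are well under the vector-length limit, so, as Uwe says, the numbers of rows and columns are not the problem; the memory and address space are.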

-- 
Spencer Graves, PhD
Senior Development Engineer
PDF Solutions, Inc.
333 West San Carlos Street Suite 700
San Jose, CA 95110, USA

spencer.graves at pdf.com
www.pdf.com
Tel:  408-938-4420
Fax: 408-280-7915



