[R] Memory limits using read.table on Windows XP Pro

Latchezar Dimitrov ldimitro at wfubmc.edu
Fri Jun 24 20:09:10 CEST 2005


Thank you very much for your attention. I did check the rw-FAQ, although
I did not mention it. Since checking it is a common requirement, I assumed
it was common practice too and decided not to waste bandwidth. Apparently
I was wrong. However, from what I presented you can easily (I guess) infer
it as well. Your guess about what I used is absolutely correct, as I
expected, BTW. Oh yeah, the water is wet, although I did not mention that
either :-)

R FAQ

Frequently Asked Questions on R
Version 2.1.2005-06-22
ISBN 3-900051-08-9:

"7.28 Why is read.table() so inefficient?

By default, read.table() needs to read in everything as character data,
and then try to figure out which variables to convert to numerics or
factors. For a large data set, this takes condiderable amounts of time
and memory. Performance can substantially be improved by using the
colClasses argument to specify the classes to be assumed for the columns
of the table."

(The vital word "condiderable" above is not explained anywhere, so I
guess it means considerable. I think you (all) need to check the
spelling of the words you (all) use. Although spelling-checkers are much
misused they are sometimes useful.)

Is my use of read.table() in accordance with the above? Can it be
improved with respect to my problem?
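
For reference, the colClasses advice in FAQ 7.28 can be tried on a toy
file. The sketch below is only an illustration: the temporary file, its
column names (m1 to m4) and its genotype codes are made up, and the call
simply mirrors the arguments of the read.table() call quoted further down
in this message. It says nothing about how the call behaves at the full
2500 x 125000 size.

```r
# Hypothetical miniature of a genotype table: 3 rows x 4 tab-separated
# columns, with "." marking a missing genotype (all values are made up).
path <- tempfile(fileext = ".tab")
writeLines(c("m1\tm2\tm3\tm4",
             "AA\tAB\tBB\t.",
             "AB\tAB\tAA\tBB",
             "BB\t.\tAB\tAA"), path)

# Declaring every column as a factor up front spares read.table() the
# type-guessing step described in FAQ 7.28. A single "factor" is
# recycled across all columns.
geno <- read.table(path, header = TRUE, sep = "\t", na.strings = ".",
                   quote = "", comment.char = "",
                   colClasses = "factor", nrows = 3)

str(geno)          # structure of the result: one factor per column
sum(is.na(geno))   # count of "." cells that were read as NA
```

Giving nrows as well, as the original call does, lets read.table() size
its buffers in advance, which the read.table help also recommends for
large files.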

R for Windows FAQ
Version for rw2011
B. D. Ripley and D. J. Murdoch:
(it does not say Prof. but I guess it is "Prof. B. D. Ripley", isn't
it?)

"2.11 There seems to be a limit on the memory it uses!

Indeed there is. It is set by the command-line flag --max-mem-size (see
How do I install R for Windows?) and defaults to the smaller of the
amount of physical RAM in the machine and 1Gb. It can be set to any
amount over 16M. (R will not run in less.) Be aware though that Windows
has (in most versions) a maximum amount of user virtual memory of 2Gb,
and parts of this can be reserved by processes but not used."

So what, if anything, is wrong in my configuration, settings, parameters,
flags, etc. (you name them) with respect to the above?

Although I did not mention it, I know very well the difference between
GiB, GB, and Gb (as used in the rw-FAQ, wrongly I suppose), and your guess
is incorrect here. Anyway, my estimates, as you can see, are conservative,
and so your note does not contribute essential info.
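
The unit question can be checked with a few lines of R. This is a
back-of-the-envelope sketch using only the figures from the quoted
messages; it assumes 4 bytes per factor cell (as in the original
estimate) and that Task Manager's "K" means 1024 bytes. The variable
names are illustrative.

```r
# Figures from the quoted message: 2500 x 125000 cells, 4 bytes each.
cells <- 2500 * 125000
bytes <- cells * 4

gb  <- bytes / 1e9    # decimal gigabytes (10^9 bytes)
gib <- bytes / 2^30   # binary gigabytes (2^30 bytes)

# Peak Rgui usage reported by Task Manager, with K assumed to be 1024 bytes.
peak_gib <- 2056992 * 1024 / 2^30

round(c(GB = gb, GiB = gib, doubled_GiB = 2 * gib, peak_GiB = peak_gib), 2)
```

In binary units the table comes to about 1.16, and the doubled estimate
to about 2.33, which is above the 2Gb user virtual memory limit that
rw-FAQ 2.11 mentions. The reported peak works out to about 1.96 in the
same units, just under that limit. Whether that limit is what actually
stopped the read is an inference, not something these numbers prove.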

Despite your blunder about my knowledge, I suspect that you secretly knew
about the conservativeness above, so I wonder why, after your correct
interpretation of my e-mail, I did not get a plain answer in plain
English.

Best regards,
Latchezar Dimitrov

PS. Please do not reply unless you have help or suggestions for solving
the problem (as opposed to remarks about my education, my experience, my
not mentioning all the trivia, etc.). Thanks

PPS. I also wonder whether you have ever heard of "the magic word", or
whether there is no such thing as magic for Profs.

> -----Original Message-----
> From: Prof Brian Ripley [mailto:ripley at stats.ox.ac.uk] 
> Sent: Friday, June 24, 2005 12:47 PM
> To: Latchezar Dimitrov
> Cc: r-help at stat.math.ethz.ch
> Subject: Re: [R] Memory limits using read.table on Windows XP Pro
> 
> On Fri, 24 Jun 2005, Latchezar Dimitrov wrote:
> 
> > Hello,
> >
> > When I try:
> >
> > geno <- read.table("2500.geno.tab",header=TRUE,sep="\t",na.strings=".",
> >                    quote="",comment.char="",colClasses=c("factor"),
> >                    nrows=2501)
> >
> > I get, after hour(s) of work:
> >
> > Error: cannot allocate vector of size 9 Kb
> >
> > I have:
> >
> > Rgui.exe --max-mem-size=3Gb
> >
> > and
> >
> > multi(0)disk(0)rdisk(0)partition(1)\WINDOWS="Microsoft Windows XP 
> > Professional" /fastdetect /NoExecute=OptIn /PAE /3GB
> >
> > in boot.ini
> >
> > 2500.geno.tab is a tab-delimited text table with 2500 x 125000 =
> > 312,500,000 3-level (two alphabet characters) factors (x 4 bytes =
> > 1,250,000,000 bytes, i.e. 1.25GB). Even if we double it (as per the
> > read.table help) it's still 2.5GB < 3Gb. And actually Windows Task
> > Manager shows peak mem use for Rgui of 2,056,992K (~2.057GB) and total
> > memory used of 2.62GB. And the total physical memory is 4GB (of which
> > Windows recognizes above 3GB).
> >
> > Any help or suggestions?
> 
> Do check the rw-FAQ.  If you modified R to address more than 2GB, you
> omitted to tell us a vital fact, so I guess you did not.
>
> I think you need to check the actual meaning of G and K, although they
> are much misused. 1,250,000,000 is 1.16GB in the units you are using
> for 3GB.
> 
> -- 
> Brian D. Ripley,                  ripley at stats.ox.ac.uk
> Professor of Applied Statistics,  http://www.stats.ox.ac.uk/~ripley/
> University of Oxford,             Tel:  +44 1865 272861 (self)
> 1 South Parks Road,                     +44 1865 272866 (PA)
> Oxford OX1 3TG, UK                Fax:  +44 1865 272595
>



