[R] Stataread + R-Devel fails for me

Thomas Lumley thomas at biostat.washington.edu
Mon Nov 20 22:32:35 CET 2000

On Mon, 20 Nov 2000, Zsombor Cseres-Gergely wrote:

> On Sun, Nov 19, 2000 at 08:44:13AM -0800, Thomas Lumley wrote:
> > You need version 2.6 of stataread.
> That works (basically). But I gave up reading a 30M Stata file with
> 128M RAM + 128M swap: it still had not finished after an hour or so.

That's surprising.  I just tried a 32 MB file on two machines.  Using
R 1.1.1 with memory limited to vsize=100M, nsize=1000k (which should fit
into 128M, I think), it took about 20 seconds or less. This was using a local
drive on a Sun server.

Using a pre-release of R 1.2 (and a network drive) on a two-year-old Linux box, it took
  > unix.time(a<-read.dta("~/tmp/bigstatafile.dta"))
  [1] 18.12 29.21 47.74  0.00  0.00
  > gc()
            used (Mb) gc trigger (Mb)
  Ncells  483066 12.9     597831 16.0
  Vcells 8321421 63.5    8832871 67.4

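That is about 48 seconds elapsed (18 s user, 29 s system), and gc() shows
roughly 76 MB in use (12.9 MB of cons cells plus 63.5 MB of vector heap),
which doesn't seem an unreasonably large amount of RAM.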
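A minimal sketch of the same kind of measurement in current R, assuming the foreign package (which provides read.dta() and write.dta()) is installed. In current R, system.time() plays the role of unix.time() above. The data frame is synthetic and its size is an arbitrary assumption chosen so the example runs quickly; it is not the 32 MB file timed above.

```r
library(foreign)

# Synthetic data (an assumption for illustration): one double and one
# integer column, no strings, 200,000 rows.
n <- 200000
df <- data.frame(
  x = rnorm(n),
  y = sample(1:10, n, replace = TRUE)
)

# Write to a temporary .dta file so the example needs no external data.
f <- tempfile(fileext = ".dta")
write.dta(df, f)
cat("File size (MB):", round(file.info(f)$size / 2^20, 1), "\n")

# system.time() reports user, system and elapsed seconds for the read.
timing <- system.time(a <- read.dta(f))
print(timing)

# Memory in use after the read, in the same format as the gc() output above.
print(gc())
```

If elapsed time is much larger than user plus system time, the process was mostly waiting (for example on a slow network drive or on swap) rather than computing.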

If the dataset has a lot of strings, I would expect it to be slower (though
not by that much), but Stata advises against using strings anyway.  Poor file
buffering or caching might cause problems too (since I use byte-by-byte
reads), but I can't see it being that bad.

Could you send the results of -desc- on your Stata file?


Thomas Lumley
Assistant Professor, Biostatistics
University of Washington, Seattle

r-help mailing list -- Read http://www.ci.tuwien.ac.at/~hornik/R/R-FAQ.html