[R] Very slow read.table on Linux, compared to Win2000

Peter Dalgaard p.dalgaard at biostat.ku.dk
Wed Jun 28 14:28:42 CEST 2006


<davidek at zla-ryba.cz> writes:

> Dear all,
> 
> I read.table a 17 MB tab-separated table with 483 variables (mostly
> numeric) and 15000 observations into R. This takes a few seconds with
> R 2.3.1 on Windows 2000, but several minutes on my Linux machine. The
> Linux machine is Ubuntu 6.06 with 256 MB RAM and an Athlon 1600
> processor. The Windows hardware is better (Pentium 4, 512 MB RAM),
> but that shouldn't make such a difference.
> 
> The strange thing is that even doing something with the data (say, a
> histogram of a variable, or transforming integers into a factor)
> takes a really long time on the Linux box, and the computer seems to
> work the hard disk heavily.
> Could this be caused by swapping? Can I somehow increase the memory
> allocated to R? I have checked the manual, but the memory options
> available on Linux don't seem to help me (I may be doing it wrong,
> though ...)
> 
> The code I run:
> 
> TBO <- read.table(file="TBO.dat", sep="\t", header=TRUE, dec=",")  # this takes forever
> TBO$sexe <- factor(TBO$sexe, labels=c("man","vrouw"))  # even this takes ~30 seconds, vs. essentially no time on Win2000
> 
> I'd be grateful for any suggestions,

Almost surely, the fix is to install more RAM. 256 MB leaves you very
little room for actual work these days, and a 17 MB file gets expanded
to several times its original size during reading and subsequent data
manipulation, so your machine is almost certainly swapping. Using a
lightweight window manager can help a little, but you usually regret
the switch for other reasons.
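
Apart from adding RAM, you can make read.table() itself considerably
less memory-hungry by declaring the column classes up front, so it
does not have to guess types and grow intermediate structures while
scanning. A minimal sketch, reusing the file name and dimensions from
your message (untested against your data, so adjust as needed):

  ## Read a handful of rows so R can guess the column classes cheaply
  top <- read.table("TBO.dat", sep = "\t", header = TRUE, dec = ",",
                    nrows = 50)
  classes <- sapply(top, class)

  ## Re-read the full file with the classes fixed and the row count
  ## hinted; this avoids type guessing and repeated reallocation
  TBO <- read.table("TBO.dat", sep = "\t", header = TRUE, dec = ",",
                    colClasses = classes, nrows = 15000)

  ## Check how big the result really is and what R's memory looks like
  object.size(TBO)   # size of the data frame in bytes
  gc()               # runs a collection and reports memory in use

As for raising R's memory limit: the --max-mem-size option is
Windows-only. On Unix-alikes, R simply takes what the operating system
will give it, so the effective limit really is physical RAM plus swap.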


-- 
   O__  ---- Peter Dalgaard             Øster Farimagsgade 5, Entr.B
  c/ /'_ --- Dept. of Biostatistics     PO Box 2099, 1014 Cph. K
 (*) \(*) -- University of Copenhagen   Denmark          Ph:  (+45) 35327918
~~~~~~~~~~ - (p.dalgaard at biostat.ku.dk)                  FAX: (+45) 35327907


