[R] Data input performance

Agustin Lobo alobo at ija.csic.es
Thu Jan 24 18:26:21 CET 2002


Perhaps

mimatriz <- matrix(scan("directory_and_file_names"), byrow=TRUE, ncol=2300)

is more efficient than read.table(). Be sure the file contains only
numbers; otherwise you need:

mimatriz <- matrix(scan("directory_and_file_names", what=""), byrow=TRUE, ncol=2300)

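and then mimatriz will be a character matrix. Then subset the
numeric columns and/or rows and apply as.numeric() to them (wrapping
the result in matrix() again, since as.numeric() drops the dimensions).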
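A minimal, self-contained sketch of both cases, assuming a plain whitespace-separated file. The small temporary files, their sizes and the header line in the second case are hypothetical stand-ins for the real 2300x2300 file; write() and writeLines() are used only to create test data.

```r
# Sketch only: small temporary files stand in (hypothetically) for the
# real 2300 x 2300 matrix file.

## Case 1: the file holds only numbers
n <- 4
m <- matrix(seq_len(n * n) / 4, nrow = n)
f <- tempfile()
# write() emits values in column order, so transpose to get one matrix row per line
write(t(m), file = f, ncolumns = n)

# scan() returns a single numeric vector in file order;
# matrix(..., byrow = TRUE) folds it back into rows
mimatriz <- matrix(scan(f, quiet = TRUE), byrow = TRUE, ncol = n)

## Case 2: the file also holds text (here a hypothetical header line)
g <- tempfile()
writeLines(c("a b c", "1 2 3", "4 5 6"), g)

# what = "" reads every field as character, so the matrix is character
chr <- matrix(scan(g, what = "", quiet = TRUE), byrow = TRUE, ncol = 3)

# Subset the numeric rows, convert, and restore the shape,
# since as.numeric() drops the dim attribute
sub <- chr[-1, ]
num <- matrix(as.numeric(sub), nrow = nrow(sub))
```

Because as.numeric() works column by column and matrix() refills column by column, the converted values land back in their original positions.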

Agus

Dr. Agustin Lobo
Instituto de Ciencias de la Tierra (CSIC)
Lluis Sole Sabaris s/n
08028 Barcelona SPAIN
tel 34 93409 5410
fax 34 93411 0012
alobo at ija.csic.es


On Thu, 24 Jan 2002, Filip Ginter wrote:

> Dear list,
> 
> I'm brand new to R (I started using it a few days ago), so sorry for a possibly
> stupid question.
> 
> Anyway, I'm using R to cluster my data. I have the dissimilarity matrix
> as a text file, with numbers separated by spaces. At its largest it is a
> 2300x2300 matrix.
> 
> It seems to me that importing the matrix into R is
> rather slow: at the peak size of 2300x2300 it takes almost two hours, while
> the clustering itself takes very little time by comparison. I have a PC with
> 256MB of memory and a 900MHz processor, running Linux (RH7.1). The R version
> is "Version 1.4.0  (2001-12-19)".
> 
> I have tried to follow all the recommendations I found in the documentation,
> so I do something like this (the file consists of 2300 rows, each containing
> 2300 real numbers separated by spaces, and nothing else):
> 
> __________________________
> 
> library(cluster)
> CC<-c("numeric")
> T1<-read.table("matrix",nrows=2300,colClasses=CC)
> T2<-as.dist(T1)
> rm(T1)
> T3<-agnes(T2,diss=TRUE)
> write.table(T3$merge,file=outfile,quote=FALSE)
> 
> ___________________________
> 
> The CC vector contains "numeric" only once, as I read that the values are
> "recycled".
> 
> So, is there any room for improvement? Any way to make the data import 
> quicker?
> 
> Thanks a lot.
> 
> Best regards,
> 
> Filip
> 
> -- 
> 
> -----------------------------------------------------------------
> Filip Ginter
> Ph.D. student
> 
> Email: ginter at cs.utu.fi
> Phone: +358-2-2154078
> Office: 4122, 4th floor
> ICQ: 146959496
> 
> Turku Centre for Computer Science
> Lemminkäisenkatu 14A
> 20520 Turku
> Finland
> 
> -.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-
> r-help mailing list -- Read http://www.ci.tuwien.ac.at/~hornik/R/R-FAQ.html
> Send "info", "help", or "[un]subscribe"
> (in the "body", not the subject !)  To: r-help-request at stat.math.ethz.ch
> _._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._
> 
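Applied to the script in the question, the suggestion amounts to replacing the read.table() call with matrix(scan(...)). A minimal sketch, assuming a symmetric dissimilarity matrix stored as plain whitespace-separated numbers; the 4-point matrix and the temporary input and output paths are hypothetical stand-ins for the real "matrix" file and outfile. No timings are claimed; on the real file, wrapping each step in system.time() would show where the time goes.

```r
library(cluster)   # provides agnes(); ships with R as a recommended package

# Hypothetical stand-in for the real "matrix" file: a 4 x 4 symmetric
# dissimilarity matrix for four points on a line
pts <- c(0, 1, 5, 6)
d <- as.matrix(dist(pts))
n <- nrow(d)
infile <- tempfile()
write(t(d), file = infile, ncolumns = n)   # one matrix row per line

# Read with scan() instead of read.table(): no data frame is built
T1 <- matrix(scan(infile, quiet = TRUE), byrow = TRUE, ncol = n)
T2 <- as.dist(T1)          # keeps the lower triangle as a "dist" object
rm(T1)
T3 <- agnes(T2, diss = TRUE)

outfile <- tempfile()      # hypothetical output path
write.table(T3$merge, file = outfile, quote = FALSE)
```

The rest of the pipeline (as.dist, agnes, write.table) is unchanged from the question; only the import step differs.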
