[R] efficiency

Thomas Lumley tlumley at u.washington.edu
Tue Apr 30 00:59:32 CEST 2002


On Mon, 29 Apr 2002, jimi adams wrote:

> I have a set of files that I am reading into R one at a time and passing
> to a function I have written, where each file is a 'table' of n (columns)
> x 10000 (rows). n varies across the files, and most of the rows only have
> data in the first few columns.
> Currently I am reading them in with the command:
>
>   tab <- read.table(file="2.75.0.997.1", header=FALSE, sep="", skip=13,
>                     fill=TRUE, row.names=1, nrows=10000)
>
> and it works fine. However, we are now working with a huge table, and I
> was wondering if there is a more efficient way to read it in.
>
> IDEALLY I would like to have it as a list where each element is a row from
> the input file, eliminating all of the NAs that the above approach
> produces, so that I would have a list with 10000 elements, each of
> variable length from 1 to n.
>

You could declare a list with 10000 elements as
  data <- vector("list", 10000)
and then open a connection to the file and read one line at a time:
  a <- file("2.75.0.997.1")
  open(a)
  readLines(a, n = 13)  # skip the 13 header lines, as in the read.table() call
  for (i in 1:10000) data[[i]] <- scan(a, nlines = 1, quiet = TRUE)
  close(a)
Since scan() reads only the fields actually present on each line, each
element of data ends up with its own length and no NA padding. (Note that
the first field of each line, which read.table() treated as a row name, is
included in each vector here.)


I don't know if that would be more efficient, but it would use less
memory.
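
If you stay with read.table(), a similar list can also be built after the
fact by dropping each row's NA padding. A minimal sketch, assuming the
padded data frame tab from the question:

  tab <- read.table(file="2.75.0.997.1", header=FALSE, sep="", skip=13,
                    fill=TRUE, row.names=1, nrows=10000)
  data <- lapply(1:nrow(tab), function(i) {
    row <- unlist(tab[i, ], use.names = FALSE)  # one padded row as a vector
    row[!is.na(row)]                            # drop the NA padding
  })

This keeps the single read.table() call but gives up the lower memory use
of the connection approach, so it only helps when the padded table still
fits in memory.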

	-thomas

Thomas Lumley			Asst. Professor, Biostatistics
tlumley at u.washington.edu	University of Washington, Seattle
