[R] textConnections so slow!

Henrik Bengtsson hb at maths.lth.se
Mon Nov 10 20:44:56 CET 2003


Hi. I haven't looked at the source code for textConnection(), but I am
confident that the authors have done a good job, so my guess is that
you're running out of RAM and starting to swap.
From ?textConnection:

 "An input text connection is opened and the character vector is
  copied at time the connection object is created, and `close'
  destroys the copy."

Thus, in your code

 lines <- readLines("myBigFile.txt")
 data <- scan(textConnection(lines), sep = "\t")

you use approx. 2*object.size(lines) bytes (ignoring object.size(data)).
Try

 lines <- readLines("myBigFile.txt")
 lines <- textConnection(lines)  # rebinding 'lines' releases the original vector
 gc() # maybe it helps to call the garbage collector here?
 data <- scan(lines, sep = "\t")
 close(lines)  # destroys the connection's copy

which should use approx. the size of the original character vector, since
rebinding 'lines' lets R free that vector. If you're still swapping, then
scan()-ing from a (temporary) file may do better.
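
A minimal sketch of the temporary-file route, assuming the lines are
already in memory as a character vector. The toy 'lines' vector stands in
for readLines("myBigFile.txt"), and what = "" (read every field as
character) is chosen only for illustration; it is not from the original
post.

```r
# Toy stand-in for: lines <- readLines("myBigFile.txt")
lines <- c("a\tb\tc", "d\te\tf")

# Write the vector to a temporary file and scan that file instead
# of going through textConnection().
tmp <- tempfile(fileext = ".txt")
writeLines(lines, tmp)
data <- scan(tmp, what = "", sep = "\t", quiet = TRUE)
unlink(tmp)  # remove the temporary file when done
```

Whether this is actually faster than the textConnection() route will
depend on the file size and the available memory, so it is worth timing
both with system.time().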

As a more general suggestion, when using scan() or read.table() you can
help R save memory by specifying the 'what' and 'colClasses' arguments,
respectively.
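
As a rough illustration of those two arguments, here is a sketch on a
small made-up tab-separated file. The column names 'id' and 'value' and
the data itself are hypothetical; only 'what' and 'colClasses' come from
the suggestion above.

```r
# Toy tab-separated file with one integer and one numeric column
# (hypothetical data, for illustration only).
tmp <- tempfile(fileext = ".txt")
writeLines(c("1\t2.5", "2\t3.5", "3\t4.5"), tmp)

# scan(): 'what' gives the type of each field; a list reads records.
rec <- scan(tmp, what = list(id = integer(0), value = numeric(0)),
            sep = "\t", quiet = TRUE)

# read.table(): 'colClasses' gives the class of each column.
df <- read.table(tmp, sep = "\t",
                 colClasses = c("integer", "numeric"),
                 col.names = c("id", "value"))
unlink(tmp)
```

With the types declared up front, R does not have to read the fields as
character first and guess their types afterwards.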

Could this be it?

Henrik Bengtsson


> -----Original Message-----
> From: r-help-bounces at stat.math.ethz.ch 
> [mailto:r-help-bounces at stat.math.ethz.ch] On Behalf Of Mathieu Drapeau
> Sent: 11 November 2003 01:05
> To: r-help at stat.math.ethz.ch
> Subject: [R] textConnections so slow!
> 
> 
> Is it normal that it takes a very long time to generate a connection 
> object on a big character vector?
> 
> This takes a very long time to process:
> lines <- readLines ("myBigFile.txt")
> data <- scan(textConnection(lines), sep = "\t")
> 
> against this that is pretty short to process:
> data <- scan("myBigFile.txt", sep = "\t")
> 
> Anyone has any clues how to efficiently do that because I need to use a
> textConnection on a big vector?
> 
> Thank you,
> Mathieu
> 



