[R] Seeking a more efficient way to read in a file

Charilaos Skiadas cskiadas at gmail.com
Thu Jan 3 02:42:21 CET 2008


On Jan 2, 2008, at 6:05 PM, Talbot Katz wrote:

> Hi.
>
> I have a matrix stored in a large, tab-delimited flat file.  The  
> first row contains column names.  Because the matrix is symmetric,  
> the file has lower triangular format, so the second row contains  
> one number, the third row two numbers, etc.  In general, row k+1  
> contains k numbers; the matrix has 3000 rows, so the file has 3001  
> rows.  The file has variable length records, so each row ends with  
> its last piece of data.  I read in the file and produced the full  
> symmetric matrix as follows:
>
>> mana01 <- scan( file = "C:/mat.dat", sep = "\t", nlines = 1,
>>                 what = "character" )
> Read 3000 items
>> nco <- length( mana01 )
>> malt <- matrix(0, nrow = nco, ncol = nco )
>> colnames( malt ) <- mana01
>> rownames( malt ) <- mana01
>> for ( i in 1:3000 ) {
>>   malt[ i, (1:i) ] <- scan( file="C:/mat.dat", skip = i, n = i,
>>                             quiet = TRUE )
>> }
>> mat <- malt + t( malt ) - diag( diag( malt ) )
>
> The for loop took a couple of hours to complete.  I suspect there's  
> a much faster way to do this.  Any suggestions?  Thanks!

I saw Jim's reply just after I had written a solution of my own, so
here is my take on it. The key thing, as Jim mentioned, is not to call
scan once per row, but to read the whole file in and then process it.
I read the lines, used strsplit to split each line into its fields,
and then used sapply to pad each row with the right number of zeros.

Not sure which of the two is faster.

nms <- scan("~/Desktop/testing.txt", sep="\t", nlines=1,
            what=character(0))
# read as a vector of lines
x <- scan("~/Desktop/testing.txt", sep="\n", skip=1, what=character(0))
# split at the tabs
splt <- strsplit(x, "\t")
nr <- length(nms)
# extend each row by the right number of zeros
splt <- sapply(splt, function(x) c(as.numeric(x), rep(0, nr - length(x))))
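
The steps above stop at the padded triangle, so here is a rough sketch
of how the full symmetric matrix might be finished from there. Note
that sapply binds each padded row as a column, so the result holds the
upper triangle rather than the lower one; that does not matter once it
is added to its transpose, as in the last line of the original post.
The temporary file, its 4 x 4 contents, and the names path, upper and
mat are made up for illustration and are not from the original data.

```r
# Hypothetical stand-in for the real file: header row plus a 4 x 4
# lower triangle, tab-delimited, with row k+1 holding k numbers.
path <- tempfile(fileext = ".txt")
writeLines(c("a\tb\tc\td", "1", "2\t3", "4\t5\t6", "7\t8\t9\t10"), path)

# Column names from the first line.
nms <- scan(path, sep = "\t", nlines = 1, what = character(0), quiet = TRUE)
# Remaining lines, one string per row of the triangle.
x <- scan(path, sep = "\n", skip = 1, what = character(0), quiet = TRUE)
nr <- length(nms)

# Split at the tabs and pad each row with zeros up to nr entries.
splt <- strsplit(x, "\t")
upper <- sapply(splt, function(r) c(as.numeric(r), rep(0, nr - length(r))))

# Each padded row became a column, so `upper` is upper triangular.
# Adding the transpose fills the other half; the diagonal is counted
# twice, so subtract it once.
mat <- upper + t(upper) - diag(diag(upper))
dimnames(mat) <- list(nms, nms)
```

With only two file reads instead of one scan call per row, the time
should be dominated by strsplit and as.numeric rather than by
re-reading the file, though I have not timed it on a 3000-row file.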


Haris Skiadas
Department of Mathematics and Computer Science
Hanover College



