[R] List of lists? Data frames? (Or other data structures?)

R A F raf1729 at hotmail.com
Thu May 1 13:49:52 CEST 2003


Thanks for your comments.  I'm not too familiar with these differences,
but here's a simple experiment.  In a data file with 139,000 rows and
5 columns (types: double, string, double, double, double),

>system.time( aaa <- read.table( "file" ) )
20.67 0.41 21.10 0.00 0.00

>system.time( aaa <- scan( "file", list( 0, "", 0, 0, 0 ) ) )
6.07 0.01 6.09 0.00 0.00

It seems that scan is much faster (about 3.5 times, going by elapsed
time: 6.09 s versus 21.10 s), and as the data file grows, read.table
seems to choke.  (I actually tried this with a data file of over
2 million rows.)
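A minimal sketch of what scan() hands back with a list `what`, assuming a tiny hypothetical file written to a temporary path and illustrative column names (x1, label, x2, x3, x4) that are not from the original data. It shows that the result is already a list of columns, which is close to what a data frame is:

```r
# Hypothetical stand-in for the data file: whitespace-separated fields
# in the order double, string, double, double, double
path <- tempfile(fileext = ".txt")
writeLines(c("1.5 a 2.0 3.0 4.0",
             "2.5 b 5.0 6.0 7.0",
             "3.5 c 8.0 9.0 10.0"), path)

# With a list 'what', scan() returns a list holding one vector per field;
# the template values (0 or "") fix each column's type, and the names
# (illustrative only) are carried over to the result
cols <- scan(path,
             what = list(x1 = 0, label = "", x2 = 0, x3 = 0, x4 = 0),
             quiet = TRUE)
str(cols)

# A data frame is itself a list of equal-length columns, so the scanned
# list converts directly; stringsAsFactors = FALSE keeps 'label' as character
df <- as.data.frame(cols, stringsAsFactors = FALSE)
```

Since scan() already did the type work, the conversion step only wraps the existing column vectors.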

I'm using a Sun SPARC machine running Solaris 2.8 and R 1.5.1.  Sorry
I can't be more specific about the hardware/software configuration;
I'm not very knowledgeable about this sort of thing.

By the way, it's not possible to create a matrix of mixed types, is
it?  (I don't know how anyway.)
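On the mixed-type question: a matrix is an atomic vector with a dim attribute, so every cell shares one storage type. A short sketch, with made-up values, of what happens when numbers and strings are combined, and of the data-frame alternative that keeps one type per column:

```r
# Combining a numeric and a character column coerces everything to character
m <- cbind(c(1.5, 2.5), c("a", "b"))
typeof(m)          # "character"

# Using the former numeric column now needs an explicit conversion
x <- as.numeric(m[, 1])

# A data frame stores each column with its own type
df <- data.frame(x = c(1.5, 2.5), label = c("a", "b"),
                 stringsAsFactors = FALSE)
sapply(df, class)  # x: "numeric", label: "character"
```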

Any ideas as to the speed differences?  Thanks again.

>From: Prof Brian Ripley <ripley at stats.ox.ac.uk>
>To: Roger Peng <rpeng at stat.ucla.edu>
>CC: r-help at stat.math.ethz.ch, R A F <raf1729 at hotmail.com>
>Subject: Re: [R] List of lists?  Data frames? (Or other data structures?)
>Date: Thu, 1 May 2003 08:42:55 +0100 (BST)
>
>On Wed, 30 Apr 2003, Roger Peng wrote:
>
> > If you're talking about rows and columns, it seems like the appropriate
> > data structure for you is the data frame.  I think your list of lists
> > representation might get unwieldy after a while.  I can't really think of
> > why a data frame would be any slower than a list of lists -- I've never
> > experienced such behavior.
> >
> > read.table() may be a little slower than scan() because read.table() reads
> > in an entire file and then converts each of the columns into an
> > appropriate data class.  So there is some post-processing going on.  It
> > doesn't have anything to do with data frames vs. lists.
>
>Only if you don't specify colClasses: if you do (and you would need the
>information to use scan()) there should be no performance penalty. (Note
>that matrices can be scan()-ed into a vector and the dimensions added, and
>that will be faster.)
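A sketch of both suggestions in the reply, assuming small hypothetical files in temporary paths (with read.table()'s default header = FALSE the columns come out as V1 to V5). No timings are claimed here; the point is only how each call is written:

```r
# Mixed-type file: declaring colClasses up front means read.table() does
# not have to guess each column's type after reading
path <- tempfile(fileext = ".txt")
writeLines(c("1.5 a 2.0 3.0 4.0",
             "2.5 b 5.0 6.0 7.0"), path)
df <- read.table(path,
                 colClasses = c("numeric", "character",
                                "numeric", "numeric", "numeric"))

# All-numeric file: scan() into one flat vector, then add dimensions.
# The values arrive in file (row-major) order, whereas R fills matrices
# column by column, so set dim to c(ncol, nrow) and transpose
num_path <- tempfile(fileext = ".txt")
writeLines(c("1 2 3", "4 5 6"), num_path)
v <- scan(num_path, quiet = TRUE)
dim(v) <- c(3, length(v) / 3)
m <- t(v)
```

`matrix(v, ncol = 3, byrow = TRUE)` on the flat vector would give the same result as the dim-and-transpose route.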
