[R] How to more efficiently read in a big matrix

affy snp affysnp at gmail.com
Sun Nov 11 20:28:49 CET 2007


Hi Gabor,

I replaced multiple spaces with a single one and tried
the code you suggested. I got:

> library(sqldf)
Loading required package: RSQLite
Loading required package: DBI
Loading required package: gsubfn
Loading required package: proto
> source("http://sqldf.googlecode.com/svn/trunk/R/sqldf.R")
> myfile <- file("243_47mel_withnormal_expression_log2.txt")
> stmt <- read.table(myfile, nr = 1, as.is = TRUE)
> stmt <- stmt[regexpr("call", stmt) < 0]
> stmt <- paste("select", paste(stmt, collapse = ","), "from myfile")
> myfile <- file("243_47mel_withnormal_expression_log2.txt")
> DF <- sqldf(stmt, file.format = list(sep = " "))
Error in try({ :
  RS-DBI driver: (RS_sqlite_import:
./243_47mel_withnormal_expression_log2.txt line 6651 expected 488
columns of data but found 641)
In addition: Warning message:
closing unused connection 3 (243_47mel_withnormal_expression_log2.txt)
Error in sqliteExecStatement(con, statement, bind.data) :
  RS-DBI driver: (error in statement: unrecognized token: "2SignalA")
>

What can you suggest? Can you think of anything wrong with the input file?
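
(One way to check, sketched here with base R's count.fields() -- the
first error says line 6651 had 641 fields where the header implies 488,
so counting fields per line should locate any malformed rows. The
"unrecognized token: 2SignalA" error also hints that a column name
beginning with a digit reached SQLite, which will not parse it unquoted:)

    ## count whitespace-separated fields on every line of the file
    n <- count.fields("243_47mel_withnormal_expression_log2.txt", sep = "")
    table(n)            # distribution of field counts across lines
    which(n != n[1])    # lines whose width differs from the first row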

Thanks!

Allen

On Nov 10, 2007 10:37 AM, Gabor Grothendieck <ggrothendieck at gmail.com> wrote:
> Thanks.
>
>
> On Nov 10, 2007 10:29 AM, affy snp <affysnp at gmail.com> wrote:
> > Gabor,
> >
> > I will do it either later today or tomorrow. Promised.
> >
> > Allen
> >
> >
> > On Nov 10, 2007 10:23 AM, Gabor Grothendieck <ggrothendieck at gmail.com> wrote:
> > > Please try out the sqldf solution as well and let me know how it
> > > compares, since I have never tried anything this large and would be
> > > interested to know.
> > >
> > >
> > > On Nov 10, 2007 9:27 AM, affy snp <affysnp at gmail.com> wrote:
> > > > Thanks all for the help and suggestions. By specifying colClasses
> > > > in read.table() and running it on a server with 8 GB of memory, I
> > > > could have the data read in 2 minutes. I will skip the sqldf method
> > > > for now and get back to it in a moment.
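> > > >
> > > > (A minimal sketch of such a read.table() call -- the column types
> > > > are an assumption: one character row-name column followed by 487
> > > > numeric columns, per the dimensions given below. nrows and
> > > > comment.char are among the hints from the read.table Note:)
> > > >
> > > >   cc <- c("character", rep("numeric", 487))
> > > >   DF <- read.table("243_47mel_withnormal_expression_log2.txt",
> > > >                    header = TRUE, row.names = 1, colClasses = cc,
> > > >                    nrows = 238305, comment.char = "")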
> > > >
> > > > Best,
> > > >       Allen
> > > >
> > > >
> > > > On Nov 10, 2007 2:42 AM, Prof Brian Ripley <ripley at stats.ox.ac.uk> wrote:
> > > > > Did you read the Note on the help page for read.table, or the 'R Data
> > > > > Import/Export Manual'?  There are several hints there, some of which will
> > > > > be crucial to doing this reasonably fast.
> > > > >
> > > > > How big is your computer?  That is 116 million items (you haven't told us
> > > > > what type they are), so you will need GBs of RAM, and preferably a 64-bit
> > > > > OS.  Otherwise you would be better off using a DBMS to store the data (see
> > > > > the Manual mentioned in my first para).
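> > > > >
> > > > > (A rough sketch of that DBMS route using RSQLite, loading the
> > > > > file in chunks so it never has to fit in memory at once -- the
> > > > > chunk size, table name, and the single unnamed row-name column
> > > > > are assumptions, not part of the original advice:)
> > > > >
> > > > >   library(RSQLite)
> > > > >   db <- dbConnect(SQLite(), dbname = "expr.db")
> > > > >   zz <- file("243_47mel_withnormal_expression_log2.txt", "r")
> > > > >   hdr <- scan(zz, what = "", nlines = 1, quiet = TRUE)  # header
> > > > >   repeat {
> > > > >     chunk <- tryCatch(read.table(zz, nrows = 20000,
> > > > >                                  col.names = c("id", hdr)),
> > > > >                       error = function(e) NULL)  # NULL at EOF
> > > > >     if (is.null(chunk)) break
> > > > >     dbWriteTable(db, "expr", chunk, append = TRUE)
> > > > >   }
> > > > >   close(zz); dbDisconnect(db)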
> > > > >
> > > > >
> > > > > On Fri, 9 Nov 2007, affy snp wrote:
> > > > >
> > > > > > Dear list,
> > > > > >
> > > > > > I need to read in a big table with 487 columns and 238,305 rows
> > > > > > (row names and column names are supplied). Is there a way to
> > > > > > read the table in quickly? I tried read.table() but it seems to
> > > > > > take forever :(
> > > > > >
> > > > > > Thanks a lot!
> > > > > >
> > > > > > Best,
> > > > > >    Allen
> > > > >
> > > > > --
> > > > > Brian D. Ripley,                  ripley at stats.ox.ac.uk
> > > > > Professor of Applied Statistics,  http://www.stats.ox.ac.uk/~ripley/
> > > > > University of Oxford,             Tel:  +44 1865 272861 (self)
> > > > > 1 South Parks Road,                     +44 1865 272866 (PA)
> > > > > Oxford OX1 3TG, UK                Fax:  +44 1865 272595
> > > > >
> > > >
> > >
> > >
> >
>


