[R] How to more efficiently read in a big matrix

affy snp affysnp at gmail.com
Sat Nov 10 06:19:12 CET 2007


Thanks Jim.

I tried:

A<-read.table(file="243_47mel_withnormal_expression_log2.txt",
    header=TRUE,row.names=1,colClasses=c('factor', rep('numeric',486)))

specifying colClasses, but it did not work.

The error message I got is:

> A<-read.table(file="243_47mel_withnormal_expression_log2.txt",header=TRUE,row.names=1,colClasses=c('factor', rep('numeric',486)))
Error in scan(file, what, nmax, sep, dec, quote, skip, nlines, na.strings,  :
  scan() expected 'a real', got 'B'
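
The message means scan() met the token 'B' in a column that colClasses
declared numeric. One way to find such columns is to read only the first
rows as character and test which columns fail numeric conversion. This is a
rough sketch, not a definitive diagnosis: the tiny file, the column names and
the helper is_bad() are made up for illustration, and it assumes the header
line carries a label for the row-name column.

```r
# Hypothetical small file standing in for the real 480 MB one
path <- tempfile(fileext = ".txt")
writeLines(c("id s1 s2 s3",
             "g1 1.5 2.5 B",
             "g2 3.0 4.0 5.0"), path)

# Read only the first rows, everything as character, so nothing is coerced
head_rows <- read.table(path, header = TRUE, row.names = 1,
                        colClasses = "character", nrows = 100)

# A column is suspect if it holds a non-missing value that is not a number
is_bad <- function(v) {
  any(!is.na(v) & is.na(suppressWarnings(as.numeric(v))))
}
bad_cols <- names(head_rows)[vapply(head_rows, is_bad, logical(1))]
print(bad_cols)
```

If the offending value sits further down the file, nrows would have to be
raised, or the file inspected in chunks.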

Let me try what you suggested.
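
Since the file has column names and row names, scan(what=0) on the whole
file would stop at the first non-numeric field. A possible adaptation is
sketched here: read the header line separately, then give scan() a list with
one character slot for the row names and numeric slots for the rest. The toy
file and all variable names are assumptions for illustration, and the sketch
assumes blank-separated fields and a header that labels the row-name column.

```r
# Hypothetical small file with the same layout: header line, then
# a row name followed by numeric values on each line
path <- tempfile(fileext = ".txt")
writeLines(c("id s1 s2 s3",
             "g1 1 2 3",
             "g2 4 5 6"), path)

# Column names come from the first line only
col_names <- scan(path, what = "", nlines = 1, quiet = TRUE)
n_col <- length(col_names)

# One slot per column: character for the row names, numeric for the data
what <- c(list(""), rep(list(0), n_col - 1))
cols <- scan(path, what = what, skip = 1, quiet = TRUE)

# Bind the numeric columns into a matrix and attach the names
m <- do.call(cbind, cols[-1])
dimnames(m) <- list(cols[[1]], col_names[-1])
print(m)
```

Note that cbind() makes a second copy of the data, so on the full file this
step needs roughly twice the memory of the matrix itself.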

Thanks!

Allen


On Nov 10, 2007 12:07 AM, jim holtman <jholtman at gmail.com> wrote:
> If they are all numeric, then read it in with:
>
> x <- scan('yourfile', what=0)  # assuming blank separators
>
> This will create a single vector of the values.  Now this comes in in
> row order if that is what your data file has, so you could just add
> dimensions of
>
> dim(x) <- c(487, 238305)
>
> rows and columns are transposed, but if you have enough memory, you
> can transpose them, or just leave the data as is, and change your
> processing to reorder the rows/cols.  This should let you read it in
> in the fastest manner and then play with it.
>
>
> On Nov 9, 2007 11:52 PM, affy snp <affysnp at gmail.com> wrote:
> > Hi Jim,
> >
> > Thanks a lot! I am currently running it on my laptop but without any
> > success. I could upload it to a server with 8 GB of memory,
> > and it might be better to go from there.
> >
> > Actually, I could have the whole file split into two parts,
> > one with 2nd column to 95th column, the other one with
> > the rest of columns. However, I need all rows for the
> > two parts.
> >
> > The file is in txt format and around 480 MB, which is very large.
> > Yes, it is of numeric values.
> >
> > I appreciate it!
> >
> > Allen
> >
> > On Nov 9, 2007 11:46 PM, jim holtman <jholtman at gmail.com> wrote:
> > > If they are all numeric, you can use 'scan' to read them in.  With
> > > that amount of data, you will need almost 1GB to contain the single
> > > object.  If you want to do any processing, you will probably need a
> > > machine with at least 3-4GB of physical memory, preferably a 64-bit
> > > version of R.  What type of computer are you using?  Do you really
> > > need all the data in at once, or can you process it in smaller batches
> > > (e.g., 20,000 rows at a time)?  So a little more detail on what you
> > > actually want to do with the data would be useful, since it does
> > > create a very large object.  BTW how large is the file you are reading
> > > and what is its format?  Have you considered a database with this
> > > amount of data?
> > >
> > >
> > > On Nov 9, 2007 11:39 PM, affy snp <affysnp at gmail.com> wrote:
> > > > Dear list,
> > > >
> > > > I need to read in a big table with 487 columns and 238,305 rows (row names
> > > > and column names are supplied). Is there a code to read in the table in
> > > > a fast way? I tried the read.table() but it seems that it takes forever :(
> > > >
> > > > Thanks a lot!
> > > >
> > > > Best,
> > > >    Allen
> > > >
> > > > ______________________________________________
> > > > R-help at r-project.org mailing list
> > > > https://stat.ethz.ch/mailman/listinfo/r-help
> > > > PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> > > > and provide commented, minimal, self-contained, reproducible code.
> > > >
> > >
> > >
> > >
> > > --
> > > Jim Holtman
> > > Cincinnati, OH
> > > +1 513 646 9390
> > >
> > > What is the problem you are trying to solve?
> > >
> >
>
>
>
> --
>
> Jim Holtman
> Cincinnati, OH
> +1 513 646 9390
>
> What is the problem you are trying to solve?
>


