[R] How to import the large data into R

tlumley at u.washington.edu
Mon Oct 19 17:04:59 CEST 2009


On Mon, 19 Oct 2009, Jun Chen wrote:

> Dear list,
> I would like to analyze microarray (SNP) data. My code runs on a small
> data set, but the full data set has 45181 SNPs for each of 3081
> animals, and R cannot allocate 1000 Mb of memory when I import it.
>
> The command I run is:
>
> m<-matrix(scan("D:/SNPdata.txt"),ncol=nmarkers,byrow=TRUE)
>
> The error is:
> Error: cannot allocate vector of size 1000.0 Mb

It says you don't have enough memory.  When stored as floating point numbers the SNPs will take up 1Gb, which is quite a lot -- more than you can conveniently analyze in a 32-bit version of R[*] -- you probably have more than 1Gb of memory, but R does need to make copies of things.
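As a quick sanity check on that figure, using the dimensions from your message and 8 bytes per double:

  45181 * 3081 * 8 / 2^20   ## about 1062 Mb for the full matrix of doubles

so even before scan() and matrix() make their intermediate copies you are already at the limit of a 32-bit address space.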

In  my experience with SNP data, there are two strategies: storing the data more efficiently (1 byte/SNP), as the Bioconductor package snpMatrix does, or reading in just part of the data at a time (what I have usually done).  My approach is to read the data in chunks and store it in a netCDF file with the ncdf package, and then at analysis time to read data as needed from netCDF.  This also works well for parallel processing -- many R sessions can read efficiently from the netCDF file.
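Roughly, the chunked version looks like the sketch below -- the file names, chunk size, and the assumption of one animal per row of the text file are placeholders, not the code I actually use:

  library(ncdf)

  nmarkers <- 45181      # SNPs per animal (from your message)
  nanimals <- 3081       # number of animals
  chunk    <- 100        # animals read per pass -- pick this to fit in memory

  ## define a SNP x animal variable in a new netCDF file
  snpdim <- dim.def.ncdf("snp", "index", 1:nmarkers)
  anidim <- dim.def.ncdf("animal", "index", 1:nanimals)
  geno   <- var.def.ncdf("genotype", "count", list(snpdim, anidim), missval=-1)
  nc     <- create.ncdf("snpdata.nc", geno)

  ## read a block of rows at a time and write it straight to the file
  for (start in seq(1, nanimals, by=chunk)) {
      n <- min(chunk, nanimals - start + 1)
      block <- matrix(scan("D:/SNPdata.txt", skip=start-1, nlines=n),
                      nrow=nmarkers)        # one column per animal
      put.var.ncdf(nc, geno, block, start=c(1, start), count=c(nmarkers, n))
  }
  close.ncdf(nc)

  ## later, at analysis time, read only what you need
  nc <- open.ncdf("snpdata.nc")
  g  <- get.var.ncdf(nc, "genotype", start=c(1, 1), count=c(nmarkers, 50))

Each pass touches only chunk*nmarkers values, so memory use stays modest throughout.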


[*] You didn't provide the requested information about your system, but the "D:" path suggests Windows.

        -thomas

Thomas Lumley			Assoc. Professor, Biostatistics
tlumley at u.washington.edu	University of Washington, Seattle



