[R] Reading large files in R

Berton Gunter gunter.berton at gene.com
Mon Aug 8 21:35:52 CEST 2005


... and even if you did have enough memory (several times the size of the
data is generally needed), it would likely take a very long time.
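
As a rough sketch of the arithmetic, assuming all three columns in the quoted
script are stored as 8-byte doubles (which is what what=list(a=0,b=0,c=0)
requests), the numbers below come straight from the quoted output. The variable
names are only illustrative.

```r
# Back-of-the-envelope memory estimate for the data in the quoted script.
n_records <- 13669627      # "Read 13669627 records" in the quoted output
n_columns <- 3             # a, b, c in the 'what' list
bytes_per_double <- 8      # R stores numeric values as 8-byte doubles

# Size of a single numeric column, in Kb
one_column_kb <- n_records * bytes_per_double / 1024

# Size of one full copy of all three columns, in Mb
all_columns_mb <- n_records * n_columns * bytes_per_double / 1024^2

floor(one_column_kb)       # 106793
round(all_columns_mb)      # 313
```

The first figure matches the size in the quoted error message, which is
consistent with R failing while trying to allocate one more copy of a single
column. Each additional full copy of the data costs roughly another 313 Mb.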

If you do have enough memory and the data are all of one type -- numeric
here -- you're better off treating them as a matrix rather than converting
them to a data frame.
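
A minimal sketch of what that could look like, assuming a file laid out like
the one in the quoted script (one header line, three tab-separated numeric
columns, -99 for missing values). The temporary file and its contents are
made up for illustration; only scan() and matrix() are the point.

```r
# Small stand-in file (hypothetical data) with the same layout as the quoted
# script: one header line, three tab-separated numeric columns, -99 = missing.
path <- tempfile(fileext = ".txt")
writeLines(c("a\tb\tc",
             "1.5\t2.5\t-10",
             "3.5\t-99\t-20",
             "5.5\t6.5\t-30"), path)

# Read everything as a single numeric vector instead of a list of columns.
vals <- scan(path, what = numeric(), sep = "\t", skip = 1,
             na.strings = "-99", quiet = TRUE)

# scan() returns the values in file order (row by row), so fill by row.
m <- matrix(vals, ncol = 3, byrow = TRUE,
            dimnames = list(NULL, c("a", "b", "c")))

dim(m)              # 3 3
is.na(m[2, "b"])    # TRUE, the -99 was read as NA

unlink(path)        # clean up the temporary file
```

On a file of the real size it may be worth trying this on the first few
lines first, for example by passing nlines to scan().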

-- Bert Gunter
Genentech Non-Clinical Statistics
South San Francisco, CA
 
"The business of the statistician is to catalyze the scientific learning
process."  - George E. P. Box
 
 

> -----Original Message-----
> From: r-help-bounces at stat.math.ethz.ch 
> [mailto:r-help-bounces at stat.math.ethz.ch] On Behalf Of 
> Adaikalavan Ramasamy
> Sent: Monday, August 08, 2005 12:02 PM
> To: Jean-Pierre Gattuso
> Cc: r-help at stat.math.ethz.ch
> Subject: Re: [R] Reading large files in R
> 
> From the Note section of help("read.delim"):
> 
>      'read.table' is not the right tool for reading large matrices,
>      especially those with many columns: it is designed to read _data
>      frames_ which may have columns of very different classes. Use
>      'scan' instead.
> 
> So I am not sure why you used 'scan', then converted it to a 
> data frame.
> 
> 1) Can you provide a sample of the data that you are trying to read in?
> 2) How much memory does your machine have?
> 3) Try reading in the first few lines using the nmax argument in scan.
> 
> Regards, Adai
> 
> 
> 
> On Mon, 2005-08-08 at 12:50 -0600, Jean-Pierre Gattuso wrote:
> > Dear R-listers:
> > 
> > I am trying to work with a big (262 Mb) file but apparently reach a
> > memory limit using R on a MacOSX as well as on a unix machine.
> > 
> > This is the script:
> > 
> >  > type=list(a=0,b=0,c=0)
> >  > tmp <- scan(file="coastal_gebco_sandS_blend.txt", what=type,
> >      sep="\t", quote="\"", dec=".", skip=1, na.strings="-99",
> >      nmax=13669628)
> > Read 13669627 records
> >  > gebco <- data.frame(tmp)
> > Error: cannot allocate vector of size 106793 Kb
> > 
> > 
> > Even tmp does not seem right:
> > 
> >  > summary(tmp)
> > Error: recursive default argument reference
> > 
> > 
> > Do you have any suggestion?
> > 
> > Thanks,
> > Jean-Pierre Gattuso
> > 
> > ______________________________________________
> > R-help at stat.math.ethz.ch mailing list
> > https://stat.ethz.ch/mailman/listinfo/r-help
> > PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html
> >



