[R] large data set, error: cannot allocate vector

JASON BARNHART jasoncbarnhart at msn.com
Wed May 10 05:14:49 CEST 2006


OK, I'm a conehead.

There's no memory.limit() on my LINUX setup; neither is there a
--max-mem-size option.

Sorry for any false trails.

-jason


>From: "Jason Barnhart" <jasoncbarnhart at msn.com>
>To: "Robert Citek" <rwcitek at alum.calberkeley.org>, 
><r-help at stat.math.ethz.ch>
>Subject: Re: [R] large data set, error: cannot allocate vector
>Date: Tue, 9 May 2006 14:32:45 -0700
>
>Robert,
>
>Thanks, I stand corrected on the RAM issue re: 32 vs. 64 bit builds.
>
>As for the --max-memory-size option, I'll try to check my LINUX version at
>home tonight.
>
>-jason
>
>----- Original Message -----
>From: "Robert Citek" <rwcitek at alum.calberkeley.org>
>To: <r-help at stat.math.ethz.ch>
>Cc: "Jason Barnhart" <jasoncbarnhart at msn.com>
>Sent: Tuesday, May 09, 2006 1:27 PM
>Subject: Re: [R] large data set, error: cannot allocate vector
>
>
> >
> > On May 9, 2006, at 1:32 PM, Jason Barnhart wrote:
> >
> >> 1) So the original problem remains unsolved?
> >
> > The question was answered but the problem remains unsolved.  The question
> > was, why am I getting an error "cannot allocate vector" when reading in
> > a 100 MM integer list.  The answer appears to be:
> >
> > 1) R loads the entire data set into RAM
> > 2) on a 32-bit system R maxes out at 3 GB
> > 3) loading 100 MM integer entries into a data.frame requires more than
> > 3 GB of RAM (5-10 GB based on projections from 10 MM entries)
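> >
> > Back-of-the-envelope: 100 MM integers need only about 400 MB as a
> > plain integer vector, but read.table goes through character
> > intermediates, so peak usage is several times that.  A quick check on
> > a smaller scale (untested, names made up):
> >
> >   x <- integer(1e7)           # 10 MM integers
> >   object.size(x)              # ~40 MB: 4 bytes per element
> >   y <- as.character(1:1e7)    # comparable data held as text
> >   object.size(y)              # several times larger per element
> >
> > which is consistent with the 5-10 GB projection above.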
> >
> > So, the new question is, how does one work around such limits?
> >
> >> You can load data but lack memory to do more (or so it appears). It
> >> seems to me that your options are:
> >>    a) ensure that the --max-mem-size option is allowing R to utilize
> >> all available RAM
> >
> > --max-mem-size doesn't exist in my version:
> >
> > $ R --max-mem-size
> > WARNING: unknown option '--max-mem-size'
> >
> > Do different versions of R on different OSes and different platforms
> > have different options?
> >
> > FWIW, here's the usage statement from ?mem.limits:
> >
> > R --min-vsize=vl --max-vsize=vu --min-nsize=nl --max-nsize=nu --max-ppsize=N
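> >
> > So on this build the vsize/nsize limits seem to be the available
> > knobs.  Something like the following might raise the ceiling, though
> > accepted values and suffixes vary by version, so check ?Memory first:
> >
> >   $ R --max-vsize=3G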
> >
> >>    b) sample if possible, i.e. are 20MM necessary
> >
> > Yes, or within a factor of 4 of that.
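> >
> > If sampling is acceptable, one way to subsample at read time rather
> > than load everything first (untested sketch, file name made up):
> >
> >   con <- file("numbers.txt", open = "r")
> >   keep <- integer(0)
> >   repeat {
> >     chunk <- scan(con, what = integer(0), n = 1e6, quiet = TRUE)
> >     if (length(chunk) == 0) break
> >     keep <- c(keep, chunk[runif(length(chunk)) < 0.25])  # keep ~25%
> >   }
> >   close(con)
> >
> > Memory then only ever holds one chunk plus the retained sample.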
> >
> >>    c) load in matrices or vectors, then "process" or analyze
> >
> > Yes, I just need to learn more of the R language to do what I want.
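> >
> > For a single column of integers, scan() straight into an atomic
> > vector is already much leaner than read.table into a data.frame
> > (sketch, file name made up):
> >
> >   x <- scan("numbers.txt", what = integer(0))  # plain integer vector
> >   summary(x)                                   # no data.frame overhead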
> >
> >>    d) load data in database that R connects to, use that engine for
> >> processing
> >
> > I have a gut feeling something like this is the way to go.
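> >
> > For example, with DBI + RSQLite (assuming those packages are
> > installed; database, table, and column names made up), the database
> > does the aggregation and only the summary comes back into R:
> >
> >   library(DBI)
> >   library(RSQLite)
> >   con <- dbConnect(SQLite(), dbname = "bigdata.db")
> >   res <- dbGetQuery(con,
> >     "SELECT COUNT(*) AS n, AVG(value) AS mean FROM numbers")
> >   dbDisconnect(con)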
> >
> >>    e) drop unnecessary columns from data.frame
> >
> > Yes.  Currently, one of the fields is an identifier field which is a
> > long text field (30+ chars).  That should probably be converted to an
> > integer to conserve on both time and space.
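> >
> > Both steps can happen at read time (sketch, column layout made up):
> > "NULL" entries in colClasses make read.table skip a column entirely,
> > and as.integer(factor(...)) recodes a long text key as a small
> > integer:
> >
> >   df <- read.table("data.txt", header = TRUE,
> >                    colClasses = c("character", "integer", "NULL"))
> >   df$id <- as.integer(factor(df$id))  # 30+ char key -> integer code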
> >
> >>    f) analyze subsets of the data (variable-wise--review fewer vars
> >> at a time)
> >
> > Possibly.
> >
> >>    g) buy more RAM (32 vs 64 bit architecture should not be the issue,
> >> since you use LINUX)
> >
> > 32-bit seems to be the limit.  We've got 6 GB of RAM and 8 GB of swap.
> > Despite that, R chokes well before those limits are reached.
> >
> >>    h) ???
> >
> > Yes, possibly some other solution we haven't considered.
> >
> >> 2) Not finding memory.limit() is very odd.  You should consider
> >> reviewing the bug reporting process to determine if this should be
> >> reported.  Here's an example of my output.
> >>    > memory.limit()
> >>    [1] 1782579200
> >
> > Do different versions of R on different OSes and different platforms
> > have different functions?
> >
> >> 3) This may not be the correct way to look at the timing differences
> >> you experienced.  However, it seems R is holding up well.
> >>
> >>                     10MM    100MM  ratio-100MM/10MM
> >>           cat       0.04     7.60  190.00
> >>          scan       9.93    92.27    9.29
> >> ratio scan/cat    248.25    12.14
> >
> > I re-ran the timing test for the 100 MM file taking caching into
> > account.  Linux with 6 GB has no problem caching the 100 MM file (600 MB):
> >
> >                     10MM    100MM  ratio-100MM/10MM
> >           cat       0.04     0.38    9.50
> >          scan       9.93    92.27    9.29
> > ratio scan/cat    248.25   242.82
> >
> >> Please let me know how you resolve.  I'm curious about your solution.
> >> HTH,
> >
> > Indeed, very helpful.  I'm learning more about R every day.  Thanks for
> > your feedback.
> >
> > Regards,
> > - Robert
> > http://www.cwelug.org/downloads
> > Help others get OpenSource software.  Distribute FLOSS
> > for Windows, Linux, *BSD, and MacOS X with BitTorrent
> >
> >
>
>______________________________________________
>R-help at stat.math.ethz.ch mailing list
>https://stat.ethz.ch/mailman/listinfo/r-help
>PLEASE do read the posting guide! 
>http://www.R-project.org/posting-guide.html



