[R] Memory management

jim holtman jholtman at gmail.com
Sun Sep 16 06:36:21 CEST 2007


If your data file has 49M rows and 249 columns, and each column holds about
5 characters, then you are looking at a text file of roughly 60GB.  If these
were all numerics (8 bytes per number), the corresponding R object would be
almost 100GB.  If this is your data, then it is definitely a candidate for a
database, since holding it all in memory would need a fairly large machine
(at least 300GB of real memory).
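
A quick back-of-the-envelope check of those figures in R itself (the 5
characters per field and 8 bytes per number are the assumptions above):

    rows <- 49e6; cols <- 249
    rows * cols * 5 / 1e9   # ~61 GB of raw text at ~5 characters per field
    rows * cols * 8 / 1e9   # ~98 GB as an all-numeric R object (8 bytes each)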

You probably need to give some serious thought to how you want to
store your data, and then to what type of processing you need to do on it.
BTW, do you need all 249 columns, or could you work with just 3-4
columns at a time?  That makes an R object of only about 1.5GB, which
might be much easier to handle.
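
If you do stay with read.table, the colClasses argument lets you drop
unwanted columns while the file is parsed, so only the columns you keep are
ever held in memory.  A minimal sketch, assuming the file is called
flows.txt and that columns 1, 2 and 5 are the ones you want (both of those
are just placeholders):

    keep <- c(1, 2, 5)               # hypothetical choice of columns
    classes <- rep("NULL", 249)      # "NULL" tells read.table to skip a column
    classes[keep] <- "numeric"
    dat <- read.table("flows.txt", header = FALSE, colClasses = classes)
    object.size(dat)                 # about 49e6 * 3 * 8 bytes, i.e. ~1.2GB

Parsing the whole 60GB text file will still take a long while, but the
resulting object fits comfortably in 8GB of RAM.  For repeated subsetting
the database route is better; a rough RSQLite sketch appears after the
quoted thread below.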

On 9/16/07, Takatsugu Kobayashi <tkobayas at indiana.edu> wrote:
> Hi,
>
> I apologize again for posting something not suitable for this list.
>
> Basically, it sounds like I should go put this large dataset into a
> database... The dataset I have had trouble with is the transportation
> network of the Chicago Consolidated Metropolitan Statistical Area. The
> number of samples is about 7,200 points, and every point has outbound
> and inbound traffic flows: volumes, times, distances, etc. So a quick
> approximation of the number of rows would be
> 49,000,000 rows (and 249 columns).
>
> This is a text file. I could work with a portion of the data at a time,
> like nearest neighbors or pairs of points.
>
> I used read.table('filename', header=F). I should probably read some of
> the data at a time instead of loading it all at once...
>
> I am learning RSQLite and RMySQL. As Mr. Wan suggests, I will also learn
> a bit more C...
>
> Thank you very much.
>
> TK
>
> jim holtman wrote:
> > When you say you cannot import 4.8GB, is this the size of the text
> > file that you are reading in?  If so, what is the structure of the
> > file?  How are you reading in the file ('read.table', 'scan', etc.)?
> >
> > Do you really need all the data, or can you work with a portion at a
> > time?  If so, then consider putting the data in a database and
> > retrieving the data as needed.  If all the data is in an object, how
> > big do you think this object will be? (# rows, # columns, mode of the
> > data)
> >
> > So you need to provide some more information as to the problem that
> > you are trying to solve.
> >
> > On 9/15/07, tkobayas at indiana.edu <tkobayas at indiana.edu> wrote:
> >
> >> Hi,
> >>
> >> Let me apologize for this simple question.
> >>
> >> I use 64-bit R on my Fedora Core 6 Linux workstation. 64-bit R has
> >> saved me a lot of time. I am sure this has a lot to do with my memory
> >> limit, but I cannot import a 4.8GB file. My workstation has 8GB of RAM,
> >> an Athlon X2 5600, and a 1200W PSU. This PC configuration is the best
> >> I could get.
> >>
> >> I know a bit of C and Perl. Should I use C or Perl to manage this
> >> large dataset, or should I even go to 16GB of RAM?
> >>
> >> Sorry for this silly question, but I would appreciate it if anyone
> >> could give me advice.
> >>
> >> Thank you very much.
> >>
> >> TK
> >>
> >> ______________________________________________
> >> R-help at r-project.org mailing list
> >> https://stat.ethz.ch/mailman/listinfo/r-help
> >> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> >> and provide commented, minimal, self-contained, reproducible code.
> >>
> >>
> >
> >
> >
>
>
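
Since RSQLite came up above, here is a minimal sketch of the database route:
load the text file into an on-disk SQLite table in manageable chunks, then
pull back only the rows and columns each analysis needs.  The file name
(flows.txt), the chunk size, and the columns/point id in the query are all
placeholders to adapt:

    library(RSQLite)

    con <- dbConnect(SQLite(), dbname = "network.db")   # file-backed database

    infile <- file("flows.txt", open = "r")
    chunk_rows <- 100000                                 # tune to available RAM
    repeat {
      chunk <- tryCatch(
        read.table(infile, header = FALSE, nrows = chunk_rows,
                   colClasses = "numeric"),              # keep types consistent
        error = function(e) NULL)                        # NULL when the file is used up
      if (is.null(chunk) || nrow(chunk) == 0) break
      dbWriteTable(con, "flows", chunk, append = TRUE)   # accumulate rows in one table
    }
    close(infile)

    ## later: fetch just what is needed; V1, V2, ... are the default names
    ## read.table gives unnamed columns
    sub <- dbGetQuery(con, "SELECT V1, V2, V5 FROM flows WHERE V1 = 1234")
    dbDisconnect(con)

That keeps the full dataset on disk and brings only a small slice of it into
R at a time.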


-- 
Jim Holtman
Cincinnati, OH
+1 513 646 9390

What is the problem you are trying to solve?


