[R] limits of a data frame size for reading into R

Matthew Keller mckellercran at gmail.com
Fri Aug 6 01:56:33 CEST 2010


I sometimes have to work with vectors/matrices with > 2^31 - 1
elements. I have found the bigmemory package to be of great help. My
lab is also going to learn the sqldf package for getting bits of big
data into and out of R. Learning both of those packages should help
you work with large datasets in R.
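
For example, here is a minimal sketch of how the two fit together (the
file names, column type, and the SQL filter on var1 below are all
hypothetical placeholders):

library(bigmemory)
# Memory-map a large numeric matrix from disk; the backing file lets it
# grow beyond RAM, and the descriptor file lets you re-attach it later.
x <- read.big.matrix("big_data.csv", header = TRUE, type = "double",
                     backingfile = "big_data.bin",
                     descriptorfile = "big_data.desc")

library(sqldf)
# Pull only the rows you need into R; sqldf stages the file in a
# temporary SQLite database, so the whole file never has to fit in memory.
wanted <- read.csv.sql("big_data.csv",
                       sql = "select * from file where var1 > 0")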

That said, I still hold out hope that someday the powers that be - or
some hotshot operation like R+ or Revolutions - will see that
increasing numbers of users will routinely need to access > 2^31 - 1
elements, and that the packages above are a band-aid over a deeper
issue: using such large datasets with ease in R. As of now, it remains
quite awkward.

Matt



On Tue, Aug 3, 2010 at 12:32 PM, Duncan Murdoch
<murdoch.duncan at gmail.com> wrote:
> On 03/08/2010 2:28 PM, Dimitri Liakhovitski wrote:
>>
>> And once one is above the limit that Jim indicated - is there anything
>> one can do?
>>
>
> Yes, there are several packages for handling datasets that are too big to
> fit in memory: biglm, ff, etc. You need to change your code to work with
> them, so doing anything unusual takes extra effort, but there are
> possibilities.
>
> Duncan Murdoch
>
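
For instance, a minimal sketch of the pattern those two packages
support (the file name and the variables y, x1, and x2 below are
hypothetical):

library(ff)
# read.table.ffdf keeps the columns on disk, so the data frame can be
# far larger than RAM; only the chunks being processed sit in memory.
big <- read.table.ffdf(file = "big_data.txt", header = TRUE, sep = "\t")

library(biglm)
# biglm fits a linear model incrementally: start with one chunk of rows,
# then fold in later chunks via update(), never holding all rows at once.
chunk1 <- read.table("big_data.txt", header = TRUE, sep = "\t",
                     nrows = 100000)
fit <- biglm(y ~ x1 + x2, data = chunk1)
# fit <- update(fit, moredata = chunk2)  # repeat for each further chunk
summary(fit)
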
>> Thank you!
>> Dimitri
>>
>>
>> On Tue, Aug 3, 2010 at 2:12 PM, Dimitri Liakhovitski
>> <dimitri.liakhovitski at gmail.com> wrote:
>> > Thanks a lot, it's very helpful!
>> > Dimitri
>> >
>> > On Tue, Aug 3, 2010 at 1:53 PM, Duncan Murdoch
>> > <murdoch.duncan at gmail.com> wrote:
>> >> On 03/08/2010 1:10 PM, Dimitri Liakhovitski wrote:
>> >>>
>> >>> I understand the question I am about to ask is rather vague and
>> >>> depends on the task and my PC memory. However, I'll give it a try:
>> >>>
>> >>> Let's assume the goal is just to read in the data frame into R and
>> >>> then do some simple analyses with it (e.g., multiple regression of
>> >>> some variables onto some - just a few - variables).
>> >>>
>> >>> Is there a limit to the number of columns of a data frame that R can
>> >>> handle? I am asking because where I work many use SAS, and they are
>> >>> running into its limit of roughly 13,700 columns there.
>> >>>
>> >>> Since I am asking - is there a limit to the number of rows?
>> >>>
>> >>> Or is the correct way to ask the question: my PC's memory is X; the
>> >>> tab-delimited .txt file I am trying to read in is YYY MB in size; can
>> >>> I read it in?
>> >>>
>> >>
>> >> Besides what Jim said, there is a 2^31 - 1 limit on the number of
>> >> elements in a vector.  Data frames are vectors of vectors, so you can
>> >> have at most 2^31 - 1 rows and 2^31 - 1 columns.  Matrices are vectors,
>> >> so they're limited to 2^31 - 1 elements in total.
>> >> This is only likely to be a limitation on a 64-bit machine; on 32 bits
>> >> you'll run out of memory first.
>> >>
>> >> Duncan Murdoch
>> >>
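
To put numbers on that ceiling - which is easy to check from R itself:

# R's vectors are indexed by 32-bit signed integers, hence the limit:
.Machine$integer.max           # 2147483647, i.e. 2^31 - 1
# A double takes 8 bytes, so one full-length numeric vector needs ~16 GB:
(2^31 - 1) * 8 / 1024^3        # ~16 (GB)
# A square numeric matrix hits the same total-element ceiling at about
# 46340 x 46340, since a matrix is stored as one long vector:
floor(sqrt(2^31 - 1))          # 46340
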
>> >
>> >
>> >
>> > --
>> > Dimitri Liakhovitski
>> > Ninah Consulting
>> > www.ninah.com
>> >
>>
>
> ______________________________________________
> R-help at r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>



-- 
Matthew C Keller
Asst. Professor of Psychology
University of Colorado at Boulder
www.matthewckeller.com


