[R] Memory consumption, integer versus factor

Ajay Narottam Shah ajayshah at mayin.org
Sat Apr 30 06:44:13 CEST 2005


R is so smart! I found that when you switch a column from numeric to
factor, the memory consumption goes down rather impressively.

Now I'd like to learn more. How does R do this? What does R do? How do
I learn more?

I got to thinking: if I were really smart, I'd see that a factor with 2
levels requires only 1 bit of storage. So I'd be able to cram 8 such
values into a byte. But this would come at the price of more complex
code, since reading and writing that object would require sub-byte
operations. Does R go this far? I think not, given the more modest
gains that I see. Does it go down to a byte? A four-byte word
instead of 8 bytes of storage?
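
A quick way to probe this from within R is to look at the storage type
of a factor and compare object sizes. The following is only a minimal
sketch: the variable names are made up for illustration, and the exact
byte counts reported by object.size() vary by R version and platform, so
only the rough ratio is meaningful.

```r
# Illustrative check: how is a 2-level factor stored?
n <- 1e5
v_double <- as.numeric(runif(n) > 0.5)   # doubles, 8 bytes per element
v_int    <- as.integer(v_double)         # integers, 4 bytes per element
v_factor <- factor(v_double)             # integer codes plus a levels attribute

typeof(v_double)   # storage type of the numeric vector
typeof(v_factor)   # storage type underlying the factor
levels(v_factor)   # the level labels, stored once as character strings

# Sizes in bytes for the three representations
sizes <- c(double  = as.numeric(object.size(v_double)),
           integer = as.numeric(object.size(v_int)),
           factor  = as.numeric(object.size(v_factor)))
sizes

# Ratio of factor size to double size
ratio <- unname(sizes["factor"] / sizes["double"])
ratio
```

If the factor comes out at roughly half the size of the double vector
and close to the size of the integer vector, that points to one
four-byte integer code per element rather than any bit packing.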

What are Ncells and Vcells, and what determines its consumption of
memory for each kind?

If you're curious about this, here's a program that serves as a demo:

   x <- matrix(as.numeric(runif(1e6)>.5), nrow=100000)
   D <- data.frame(x)
   rm(x)

   # Take stock:
   gc()
   sum(gc()[,2])
   object.size(D)

   # Switch to factors --
   D[] <- lapply(D, factor)   # convert every column (X1..X10) to a factor

   # Take stock:
   gc()
   sum(gc()[,2])
   object.size(D)


Using this, I find that the cost of these 10 vectors goes down from 12
Meg to 8 Meg. This suggests savings, but not the dramatic impact of
recognising that a factor with 2 levels only requires 1 bit.
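
The arithmetic behind that observation can be sketched as follows. This
is a back-of-the-envelope calculation, not a measurement: it assumes 8
bytes per numeric (double) element, 4 bytes per integer code in a
factor, and ignores per-object overhead and whatever else the R session
is holding. The names below are illustrative only.

```r
# Rough storage estimate for the 1e6 values in the demo (100000 rows x 10 columns)
n_values     <- 1e6
bytes_double <- 8    # assumed storage per numeric (double) element
bytes_int    <- 4    # assumed storage per integer code in a factor

mb <- function(bytes) bytes / 1024^2   # convert bytes to megabytes

before  <- mb(n_values * bytes_double)  # data stored as doubles
after   <- mb(n_values * bytes_int)     # data stored as 4-byte factor codes
one_bit <- mb(n_values / 8)             # hypothetical 1 bit per value

round(c(before = before, after = after,
        saving = before - after, one_bit = one_bit), 2)
```

Under these assumptions the saving is a little under 4 MB, which is in
line with the drop from 12 Meg to 8 Meg, whereas a 1-bit encoding would
need only about 0.12 MB for the same data.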

-- 
Ajay Shah                                                   Consultant
ajayshah at mayin.org                      Department of Economic Affairs
http://www.mayin.org/ajayshah           Ministry of Finance, New Delhi



