[R] One critical question in R

Barry Rowlingson b.rowlingson at lancaster.ac.uk
Tue Aug 4 18:07:48 CEST 2009


On Tue, Aug 4, 2009 at 4:20 PM, Hyo Karen Lee <totemo22 at gmail.com> wrote:

> I am currently working on some research which involves huge amounts
> of data (it is about 15GB).

 One point nobody seems to have made yet is that the above statement
is meaningless on its own...

 Do you have a CSV file that is 15GB big? The important number is the
product of the numbers of rows and columns, not the file size. It
takes 21 bytes to store "1.2345678901234567890" as text in a CSV file,
but only 8 to store it in R as a double. That's a reduction in size of
nearly a factor of three.
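
 For what it's worth, here is a rough sketch of that arithmetic in R.
The object names (csv_text, per_value and so on) are made up for
illustration, and the per-value figure comes out a hair over 8 because
every R vector carries a small fixed header.

```r
# The value as it would appear as text in a CSV field
csv_text <- "1.2345678901234567890"

# Bytes needed to hold that text on disk (one byte per ASCII character)
text_bytes <- nchar(csv_text, type = "bytes")

# Bytes R uses per element of a numeric (double) vector,
# measured on a vector long enough to make the header negligible
x <- numeric(1e6)
per_value <- as.numeric(object.size(x)) / length(x)

# Size ratio of the text form to the 8-byte double
ratio <- text_bytes / 8

print(text_bytes)
print(per_value)
print(ratio)
```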

 Or do you have an XLS file that is 15GB big? In which case, who knows
how much bloat Microsoft have stuffed in there. Again, the important
number is the product of the numbers of rows and columns.

 The fundamental thing is the number of numbers (and factors), not the
file size.
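
 As a back-of-envelope sketch, assuming you know (or can count) the
rows and columns: doubles cost 8 bytes each, and integers and factor
codes 4 bytes each, so the in-memory size is roughly rows times
columns times bytes per cell. The dimensions below are invented
placeholders, not anything from the original question.

```r
# Hypothetical dimensions: substitute your own row and column counts
n_rows <- 1e6
n_cols <- 20

# Approximate bytes per cell by storage type
# (factors are stored as 4-byte integer codes plus a small table of levels)
bytes_per_cell <- c(double = 8, integer_or_factor = 4)

# Rough in-memory size if every column were of that type
est_bytes <- n_rows * n_cols * bytes_per_cell
est_gb <- est_bytes / 1024^3

print(round(est_gb, 3))
```

 This ignores per-object overhead and any copies R makes while
reading the data, so treat it as a lower bound.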

Barry




More information about the R-help mailing list