[R] Error occurred during mean calculation of a column of a data frame, which apparently contains numeric data

Prof Brian Ripley ripley at stats.ox.ac.uk
Wed Feb 29 15:08:25 CET 2012


On 29/02/2012 13:41, Duncan Murdoch wrote:
> On 12-02-29 8:16 AM, R. Michael Weylandt wrote:
>> Factors are internally stored as integers (enums if you have used
>> other programming languages) with a special label set -- it's more
>> memory efficient than storing the whole string over and over.
>
> That was one of the original justifications, but character vectors are
> just as memory efficient these days.

No, not really.  Character vectors (STRSXPs) store a pointer for each
string entry, and factors store an integer.  On most current systems
pointers are twice the size of integers (8 bytes against 4), so on a
64-bit system the character vector takes roughly twice the memory:

 > a <- rep(letters[1:10], each = 1000)
 > object.size(a)
80520 bytes
 > object.size(as.factor(a))
41008 bytes
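
A rough way to check where the factor of two comes from, assuming a 64-bit build of R. The variable names are illustrative only, and exact byte counts vary between R versions:

```r
a <- rep(letters[1:10], each = 1000)
f <- as.factor(a)
n <- length(a)

# Size of a pointer in bytes on this build (8 on 64-bit systems)
ptr_bytes <- .Machine$sizeof.pointer

# Approximate per-element storage cost; the fixed header and the
# shared strings / level labels add a small overhead to each total
per_elt_chr <- as.numeric(object.size(a)) / n
per_elt_fac <- as.numeric(object.size(f)) / n

print(c(pointer = ptr_bytes, character = per_elt_chr, factor = per_elt_fac))
```

The per-element figure for the character vector should sit just above the pointer size, and the figure for the factor just above the 4 bytes of an integer.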


> The other justifications are still valid: sometimes you have a vector
> which only takes on a subset of the possible values it could take, and
> when you tabulate it, you'd like to see those zero counts. You may also
> want to control the display order, and a factor allows that.
>
> For example:
>
> x <- c("a", "a", "b")
> table(x)
> x <- factor(x, levels=c("c", "b", "a"))
> table(x)
>
> Duncan Murdoch
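
For reference, a sketch of what the quoted example does when run in a fresh session. The names t1 and t2 are introduced here only for illustration:

```r
x <- c("a", "a", "b")
t1 <- table(x)      # counts only the values present: a = 2, b = 1

# Declare the full set of levels, in the desired display order
x <- factor(x, levels = c("c", "b", "a"))
t2 <- table(x)      # counts: c = 0, b = 1, a = 2

print(t1)
print(t2)
```

The second table lists the unused level "c" with a zero count and follows the order given in levels rather than alphabetical order.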

-- 
Brian D. Ripley,                  ripley at stats.ox.ac.uk
Professor of Applied Statistics,  http://www.stats.ox.ac.uk/~ripley/
University of Oxford,             Tel:  +44 1865 272861 (self)
1 South Parks Road,                     +44 1865 272866 (PA)
Oxford OX1 3TG, UK                Fax:  +44 1865 272595


