[R] factor always have type integer

Prof Brian Ripley ripley at stats.ox.ac.uk
Wed Sep 8 22:44:56 CEST 2004


On Wed, 8 Sep 2004, Erich Neuwirth wrote:

> typeof applied to a factor always seems to return "integer",
> independently of the type of the levels.

typeof is telling you the internal structure. From ?factor

     'factor' returns an object of class '"factor"' which has a set of
     integer codes the length of 'x' with a '"levels"' attribute of
     mode 'character'. 

(Despite that, we don't enforce this, and people have managed to create 
factors with non-integer numeric codes.)

Now ?typeof says

     'typeof' determines the (R internal) type or storage mode of any
     object

and for a factor that is "integer", as the codes are stored in an INTSXP.
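
A minimal sketch of the distinction, using a small made-up factor (the
name f and its values are illustrative only): typeof() reports how the
codes are stored, while class() and levels() describe what the object
represents.

```r
# Illustrative factor; the values are arbitrary.
f <- factor(c("b", "a", "c", "a"))

typeof(f)      # "integer": storage type of the codes
class(f)       # "factor":  the class attribute
levels(f)      # "a" "b" "c": character labels, sorted by default
as.integer(f)  # 2 1 3 1: the codes, i.e. positions in levels(f)
is.numeric(f)  # FALSE: factors are not treated as numeric

# unclass() drops the class and shows the codes with the levels attribute.
unclass(f)
```

So class(), not typeof(), is the function to ask when the question is
"is this a factor?".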

BTW, factors were an internal type long ago, and were one of the two
unnamed types which appear in output from memory.profile().

> This has a strange side effect.

It's a very well documented feature of data.frame, as others have 
pointed out.
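
As a hedged sketch for current versions of R: data.frame() takes a
stringsAsFactors argument (its default has been FALSE since R 4.0.0),
so the conversion can be requested or suppressed explicitly. The names
df_fac and df_chr are made up for this illustration.

```r
v1 <- 1:3
v2 <- c("a", "b", "c")

# Ask for the conversion explicitly: v2 becomes a factor.
df_fac <- data.frame(v1, v2, stringsAsFactors = TRUE)
class(df_fac$v2)   # "factor"
typeof(df_fac$v2)  # "integer"

# Suppress the conversion: v2 stays a character vector.
df_chr <- data.frame(v1, v2, stringsAsFactors = FALSE)
class(df_chr$v2)   # "character"
typeof(df_chr$v2)  # "character"
```

Passing the argument explicitly keeps the result the same regardless of
which default the installed version of R uses.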

> When a variable is "imported" into a data frame,
> its type changes.
> character variables automatically are converted
> to factors when imported into data frames.
> 
> Here is an example:
> 
>  > v1<-1:3
>  > v2<-c("a","b","c")
>  > df<-data.frame(v1,v2)
>  > typeof(v2)
> [1] "character"
>  > typeof(df$v2)
> [1] "integer"
> 
> It is somewhat surprising that
> the types of v2 and df$v2 are different.
> 
> the answer is to do
> levels(df$v2)[df$v2]
> but that is somewhat involved.
> 
> Should the types not be identical, and typeof applied to factors
> return the type of the levels?
> 
> 
> 
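
For recovering the original values from a factor, a short sketch (the
objects f and g are illustrative only): as.character() gives the same
result as the levels(f)[f] idiom quoted above, and for levels that look
like numbers, ?factor recommends converting the levels rather than the
codes.

```r
f <- factor(c("a", "b", "c"))

as.character(f)                           # "a" "b" "c"
levels(f)[f]                              # same result, indexing by the codes
identical(as.character(f), levels(f)[f])  # TRUE

# Levels that look like numbers: convert the levels, then index by the codes.
g <- factor(c("10", "2.5", "10"))
as.numeric(levels(g))[g]  # 10.0 2.5 10.0: the original values
as.numeric(g)             # 1 2 1: only the internal codes
```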

-- 
Brian D. Ripley,                  ripley at stats.ox.ac.uk
Professor of Applied Statistics,  http://www.stats.ox.ac.uk/~ripley/
University of Oxford,             Tel:  +44 1865 272861 (self)
1 South Parks Road,                     +44 1865 272866 (PA)
Oxford OX1 3TG, UK                Fax:  +44 1865 272595



