[R] how to concatenate factor vectors?

William Dunlap wdunlap at tibco.com
Thu Oct 18 17:33:38 CEST 2012


c() has an unfortunate history.  Originally, c(x) stripped all attributes
from x except names, so dim, dimnames, and class were all dropped.
Also, c(x,y) stripped the attributes from both x and y and concatenated
them.  Also, c(nameA=1,nameB=2) constructed a vector with a names attribute.
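
A small illustration of the attribute stripping described above, using a matrix rather than a factor (later R releases added a c() method for factors, so what c() returns for a factor depends on the R version).  The object names m, v, and x are only for this example:

```r
# c() on a matrix keeps the values but drops dim and dimnames.
m <- matrix(1:4, nrow = 2, dimnames = list(c("r1", "r2"), c("c1", "c2")))
v <- c(m)
attributes(v)    # NULL: dim and dimnames are gone

# names on a plain vector survive c().
x <- c(nameA = 1, nameB = 2)
names(c(x))      # "nameA" "nameB"
```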

Then c() became a generic function and people wrote methods for certain
classes, typically newer classes without the weight of history on them, that kept
at least the class and would combine 2 or more items of that class.  Adding
a c.factor became tricky because old code used c(factor(...)) to strip the class
and levels attributes to get the integer codes.

You can make a c() that does what you want for your factors by subclassing
factor and writing a c.<yourFactor> that does what you want.  This will not
break old code.  E.g.,
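   myFactor <- function(...) {
      tmp <- factor(...)
      class(tmp) <- c("myFactor", class(tmp))
      tmp }
   c.myFactor <- function(...) {
      args <- list(...)
      lev <- levels(args[[1]])
      # compare levels of inputs with identical() and do what you want;
      # here, as one possible choice, differing levels are an error
      for (x in args) stopifnot(is.factor(x), identical(levels(x), lev))
      # return something with the right class
      structure(unlist(lapply(args, as.integer)), levels = lev,
                class = class(args[[1]]))
   }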

Or, you can decide to write a new concatenation function
and stop using c().
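
A minimal sketch of such a function, assuming the policy is "refuse to combine factors whose levels differ".  The name concatFactors is made up for this example and is not part of base R.  It works on the integer codes directly, so no intermediate character vector is built:

```r
# concatFactors is a hypothetical name; nothing in base R is called this.
concatFactors <- function(...) {
  args <- list(...)
  stopifnot(length(args) > 0, all(vapply(args, is.factor, logical(1))))
  lev <- levels(args[[1]])
  # Assumed policy: every input must have exactly the same levels.
  same <- vapply(args, function(x) identical(levels(x), lev), logical(1))
  if (!all(same)) stop("all inputs must have identical levels")
  # Concatenate the integer codes only, then put the attributes back.
  codes <- unlist(lapply(args, as.integer), use.names = FALSE)
  structure(codes, levels = lev, class = "factor")
}

a <- factor(5:1, levels = 1:9)
b <- factor(9:1, levels = 1:9)
f <- concatFactors(a, b)
str(f)
```

Note that this sketch always returns a plain "factor", so an ordered factor would lose its ordering; that would need extra handling.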

As for EQ vs. EQUALP, don't even think of EQ in R: it doesn't make sense there.
identical() is a pretty quick way to check that two objects have identical contents.
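
For instance, applied to the levels of two factors (the variable names here are only illustrative):

```r
a <- factor(c("x", "y"), levels = c("x", "y", "z"))
b <- factor(c("z", "x"), levels = c("x", "y", "z"))
d <- factor(c("z", "x"))    # levels are "x" and "z" only

same_ab  <- identical(levels(a), levels(b))        # same strings, same order
same_ad  <- identical(levels(a), levels(d))        # different level sets
same_rev <- identical(levels(a), rev(levels(a)))   # same strings, other order
print(c(same_ab, same_ad, same_rev))
```

Only the first comparison is TRUE; identical() treats a different order of the same levels as a difference.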

Bill Dunlap
Spotfire, TIBCO Software
wdunlap tibco.com


> -----Original Message-----
> From: r-help-bounces at r-project.org [mailto:r-help-bounces at r-project.org] On Behalf
> Of Sam Steingold
> Sent: Thursday, October 18, 2012 8:02 AM
> To: r-help at r-project.org; Jorge I Velez
> Subject: Re: [R] how to concatenate factor vectors?
> 
> hi Jorge,
> 
> > * Jorge I Velez <wbetrvinairyrm at tznvy.pbz> [2012-10-18 16:43:58 +1100]:
> >
> >> a <- factor(5:1,levels=1:9)
> >> b <- factor(9:1,levels=1:9)
> >> lev <- sort(unique(f <- c(a, b)))
> >> f <- factor(f, levels = lev)
> >> str(f)
> >  Factor w/ 9 levels "1","2","3","4",..: 5 4 3 2 1 9 8 7 6 5 ...
> 
> is sort(unique()) really necessary?
> I think
> lev <- levels(a)
> should be enough.
> 
> However, this does not quite do what I want.
> I want a function which will _NOT_ have a non-factor vector as an
> intermediate value because that would waste a LOT of memory in my case.
> I want a function which will check that a and b have identical levels
> (in Lisp lingo, the levels are EQ, not just EQUALP).
> 
> --8<---------------cut here---------------start------------->8---
> > a <- factor(letters[sample(1:10,20,replace=TRUE)],levels=letters)
>  [1] e e a b c e j d a b h i a e e g j a c e
> Levels: a b c d e f g h i j k l m n o p q r s t u v w x y z
> > b <- factor(letters[sample(1:10,30,replace=TRUE)],levels=letters)
>  [1] d d f c j b d e j j g i g j j g g a j a b e d c b i i a b f
> Levels: a b c d e f g h i j k l m n o p q r s t u v w x y z
> > c(a,b)
>  [1]  5  5  1  2  3  5 10  4  1  2  8  9  1  5  5  7 10  1  3  5  4  4  6  3 10
> [26]  2  4  5 10 10  7  9  7 10 10  7  7  1 10  1  2  5  4  3  2  9  9  1  2  6
> > factor(letters[c(a,b)],levels=letters)
>  [1] e e a b c e j d a b h i a e e g j a c e d d f c j b d e j j g i g j j g g a
> [39] j a b e d c b i i a b f
> Levels: a b c d e f g h i j k l m n o p q r s t u v w x y z
> --8<---------------cut here---------------end--------------->8---
> 
> however, this is not a "direct" way (unlike my unlist(list(...))):
> there is an intermediate integer vector c(a,b) which is mapped to a
> character vector via letters, which is converted back to integers
> (==factors).
> 
> IIUC, a factor is an integer vector which knows that the integers refer
> to levels.
> 
> c(a,b) creates such an integer vector.
> How do I tell it that it is a factor?
> 
> --
> Sam Steingold (http://sds.podval.org/) on Ubuntu 12.04 (precise) X 11.0.11103000
> http://www.childpsy.net/ http://palestinefacts.org http://www.memritv.org
> http://www.PetitionOnline.com/tap12009/ http://dhimmi.com
> usually: can't pay ==> don't buy. software: can't buy ==> don't pay
> 



