[R] What is behind class coercion of a factor into a character

Rolf Turner rolf.turner at xtra.co.nz
Mon Oct 22 21:28:34 CEST 2012


WARNING:  Use with caution!

There is a way to effect the catenation of factors:  The data.frame
method for rbind() does this.  E.g.

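set.seed(42)
# two factors with different level sets
f1 <- factor(sample(letters[1:3], 42, TRUE))
f2 <- factor(sample(letters[1:4], 66, TRUE))
# one-column data frames, so that rbind() uses its data.frame method
d1 <- data.frame(f = f1)
d2 <- data.frame(f = f2)
dd <- rbind(d1, d2)
# pull the combined factor back out
ff <- dd[, 1]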

Et voilà, ff is the "desired" catenation of f1 and f2.
But heed Bert's words of caution below!
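
If you would rather spell the combination out than leave it to rbind(),
one option is to go through the character labels and rebuild the factor
with an explicit set of levels.  A minimal sketch only: the names a, b
and ab are just for illustration, and it assumes that equal labels in the
two factors really do denote the same category, which is exactly the
assumption Bert questions below.

```r
# Two factors with overlapping but different level sets
a <- factor(c("x", "y"))
b <- factor(c("y", "z"))

# Combine the labels, not the integer codes, and state the levels explicitly.
# Assumption: "y" in a and "y" in b denote the same category.
ab <- factor(c(as.character(a), as.character(b)),
             levels = union(levels(a), levels(b)))

levels(ab)       # "x" "y" "z"
as.integer(ab)   # 1 2 2 3
```

Passing levels = union(...) keeps the level order of the first factor and
appends whatever is new in the second, rather than re-sorting the labels.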

     cheers,

         Rolf Turner

On 23/10/12 02:58, Bert Gunter wrote:
> Tal:
>
> There was a recent discussion on this list about this (Sam Steingold
> was the OP IIRC).
>
> The issue is ?c . In particular:
>
> "c is sometimes used for its side effect of removing attributes except
> names, for example to turn an array into a vector."
>
> Hence, the factor attribute is removed and you get what you saw. As
> regards its "rationale," you may find Bill Dunlap's comments on
> "c()'s unfortunate history" relevant. The problem with factors is
> "what should concatenation do, anyway?" If a <- factor(c("x", "y"))
> and b <- factor(c("y", "z")), what should c(a,b) be? -- There is no
> reason to assume that the "y" in a is the same as the "y" in b!
>
> Cheers,
> Bert
>
> On Mon, Oct 22, 2012 at 6:46 AM, Tal Galili <tal.galili at gmail.com> wrote:
>> Hello all,
>>
>> Please review the following simple code:
>>
>> # make a factor:
>> x <- factor(c("one", "two"))
>>         # what should be the output to the following expression?
>> c(x, "3")    # <===  ????
>>         # I expected it to be as the output of:
>> c(as.character(x), "3")
>>         # But in fact, the output is what we would get if we had run the
>>         # next line:
>> c(as.character(as.numeric(x)), "3")
>>         # p.s: c(x, 3) would of course behave differently...
>>
>> I imagine the above behavior is a "feature" (not a bug), but I am curious
>> as to what the rationale behind it is.  Is it for computational
>> efficiency, or does it address some particular use case?
>>



