[R] Assigning factor to character vector

Milan Bouchet-Valat nalimilan at club.fr
Sat Apr 20 17:01:40 CEST 2013


On Saturday, 20 April 2013 at 07:22 -0700, Bert Gunter wrote:
> Sorry, I failed to cc: the list.
> 
> I also added a slight edit below to clarify my final statement. -- Bert
> 
> On Sat, Apr 20, 2013 at 7:17 AM, Bert Gunter <bgunter at gene.com> wrote:
> > Milan:
> >
> > 1. The R Inferno was written by Pat Burns and is not in any way an
> > "official" R document. So it is a "Pat Burns" not an "R" issue. You
> > can contact him directly, if you wish -- though he monitors this list
> > and almost surely has seen this.
Sure, I did not imply that the R Inferno was an official document.

> > 2. Everything works exactly as documented and expected. See the R
> > Language Definition ... and perhaps the "Intro to R" tutorial.
But is it documented anywhere what precisely happens when a factor is
assigned into another vector? I could not find it. For example, the R
Language Definition does not mention coercion in the Subset Assignment
section [1].

> > Inline comments below.
> >
> > Cheers,
> > Bert
> >
> > On Sat, Apr 20, 2013 at 5:49 AM, Milan Bouchet-Valat <nalimilan at club.fr> wrote:
> >> Hi!
> >>
> >> Yesterday I accidentally discovered this:
> >>> a <- LETTERS[1:5]
> >>> a
> >> [1] "A" "B" "C" "D" "E"
> >
> > a is a character vector.
> >>>
> >>> a[1] <- factor(a[1])
> > The RHS is a vector of integers with additional attributes that define a factor.
> > The replacement of the first element of a, a character vector, by an
> > integer causes the integer to be silently coerced to a character. The
> > default S3 replacement method is used -- see ?UseMethod or the R
> > Intro for information on S3 methods.
Thanks, but I already understand this part. My surprise comes from the
fact that the default replacement method coerces a factor to a character
in a way which is different from calling as.character() on it. It acts
as if attributes were dropped _before_ coercion (and thus everything
happens as if the factor was a mere integer).
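
To make the difference concrete, here is a minimal sketch of what I mean.
The names f, as_label, as_code and a_first are only illustrative, and the
comparison with as.integer() is my assumption about what the default
replacement effectively does, not something I found documented:

```r
a <- LETTERS[1:5]
f <- factor(a[1])   # one-level factor: label "A", underlying integer code 1

# Two possible conversions of the same factor to character
as_label <- as.character(f)              # uses the level label
as_code  <- as.character(as.integer(f))  # uses the underlying integer code

# Default subassignment into a plain character vector
a[1] <- f
a_first <- a[1]

# Compare the assigned value with both conversions
print(c(label = as_label, code = as_code, assigned = a_first))
print(identical(a_first, as_code))
```

If the assigned value matches the code rather than the label, then the
factor is being treated as a bare integer in the assignment.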

> >>> a
> >> [1] "1" "B" "C" "D" "E"
> >>
> >> BUT:
> >>> b <- factor(LETTERS[1:5])
> > b is a factor
> >
> >>> b
> >> [1] A B C D E
> >> Levels: A B C D E
> >>> b[1] <- factor(b[1])
> >>> b
> >> [1] A B C D E
> >> Levels: A B C D E
> >>> b[1] <- as.character(b[1])
> > The replacement method for a factor is used in
> b[1] <- factor(b[1])
> See ?"[<-.factor" .
Yes, this part was there to show the asymmetry between factor -> character
and character -> factor assignments.
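
A short sketch of that asymmetry, together with the explicit conversion
that avoids the surprise (variable names are illustrative only; the
behavior of the factor replacement method is as described in
?"[<-.factor", where the value is matched against the levels):

```r
# Assigning INTO a factor: the factor method matches values by label
b <- factor(LETTERS[1:5])
b[2] <- factor("A")   # a factor value is matched through its level label
b[3] <- "A"           # a character value is matched against levels(b)
print(b)

# Assigning INTO a character vector: convert explicitly to keep the label
a <- LETTERS[1:5]
a[1] <- as.character(b[5])
print(a)
```

So the label is preserved when the target is a factor, and when the
target is a character vector only if as.character() is called by hand.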

Regards


1: http://cran.r-project.org/doc/manuals/R-lang.html#Subset-assignment


> > Cheers,
> > Bert
> >
> > The replacement
> >>> b
> >> [1] A B C D E
> >> Levels: A B C D E
> >>
> >> I think this would definitely deserve a mention in the R Inferno...
> >>
> >> I guess this is documented somewhere (though I could not find anything
> >> in help("[<-")). Would someone be kind enough to give me the explanation
> >> of this behavior? I suspect this has something to do with the coercion
> >> order, but I do not really get why a[1] does not get assigned the result
> >> of as.character(factor(a[1]))... Probably, there is no special-casing of
> >> factors, which are handled as integer vectors?
> >>
> >> Wouldn't it be useful to print a warning when this happens, since nobody
> >> reasonable would rely on such a special behavior? I wish R had a "safe
> >> mode" where all these tricky implicit coercion cases would warn... :-/
> >>
> >>
> >> Regards
> >>
> >
> >
> >
> > --
> >
> > Bert Gunter
> > Genentech Nonclinical Biostatistics
> >
> > Internal Contact Info:
> > Phone: 467-7374
> > Website:
> > http://pharmadevelopment.roche.com/index/pdb/pdb-functional-groups/pdb-biostatistics/pdb-ncb-home.htm
> 
> 
>