merge.data.frame can coerce character vectors to factor in some circumstances (PR#1608)

Prof Brian D Ripley ripley@stats.ox.ac.uk
Wed, 29 May 2002 12:34:32 +0100 (BST)


On Wed, 29 May 2002 a296180@agate.fmr.com wrote:

> If the following two conditions are met:
>
> 1) all.x is TRUE
>
> 2) at least 1 row in y does not have a match in x
>
> then any character vectors in y will be coerced to be factors. Here is a simple
> example (previously provided on r-devel):
>
> > x <- data.frame(a = 1:4)
> > y <- data.frame(b = LETTERS[1:3])
> > y$b <- as.character(y$b)
> > z <- merge(x, y, by = 0, all.x = TRUE)
> > z
>   Row.names a    b
> 1         1 1    A
> 2         2 2    B
> 3         3 3    C
> 4         4 4 <NA>
> > sapply(z, data.class)
> Row.names         a         b
>  "factor" "numeric"  "factor"
> >
>
> This problem could be fixed by changing the line in merge.data.frame:
>
> for (i in seq(along = y)) is.na(y[[i]]) <- (lxy + 1):(lxy + nxx)
>
> to:
>
> for (i in seq(along = y)) y[((lxy + 1):(lxy + nxx)), i] <- NA

But that would introduce other problems: the two operations are not
equivalent, and merge.data.frame already uses the correct one.
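For readers unfamiliar with the replacement form in the line under
discussion, here is a minimal sketch of what `is.na(v) <- idx' does on bare
vectors in current R.  It works on the extracted column and dispatches on
that column's class, whereas the proposed form assigns through the data
frame's own subscripting method.  The names v and f are illustrative only,
and this does not attempt to reproduce the behaviour of the R version
discussed in this thread.

```r
# A plain character vector: the marked element becomes NA and the
# vector stays character.
v <- c("A", "B", "C", "A")
is.na(v) <- 4

# A factor: the factor method is used, so the level set is kept
# and only the marked element becomes NA.
f <- factor(c("A", "B", "C", "A"))
is.na(f) <- 4

print(v)
print(f)
```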

> To the extent that this is a feature rather than a bug (if so, I would like to
> know why),

I have already patiently explained it to you.  It is a side effect of
data frame subscripting, which converts character columns to factor.
I have also given you a workaround.
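The workaround itself is not repeated in this message.  As an illustration
only, and not necessarily the workaround referred to above, one way to
restore character columns is to convert any factor columns back after the
merge.  The data are those of the quoted example; stringsAsFactors is set
explicitly so that the sketch does not depend on the default of the R
version in use.

```r
# Data from the quoted example.
x <- data.frame(a = 1:4)
y <- data.frame(b = LETTERS[1:3], stringsAsFactors = FALSE)
z <- merge(x, y, by = 0, all.x = TRUE)

# Convert every factor column of the result back to character.
# Note that this also converts Row.names if it came back as a factor.
for (nm in names(z)) {
  if (is.factor(z[[nm]])) z[[nm]] <- as.character(z[[nm]])
}

print(sapply(z, class))
```

Where no column is a factor to begin with, the loop leaves z unchanged.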

>  then I would suggest that the following sentence be added to the
> documentation for merge at the end of the section on all.x
>
> "Be aware that, if all.x equals `TRUE', character vectors in `y' will be
> converted to factors if any rows in y have no matching row in `x'."

As I said before, this is a consequence of the general rules.  Data frames
are not designed to have character columns, and those who insist on using
them must make themselves aware of the consequences.

-- 
Brian D. Ripley,                  ripley@stats.ox.ac.uk
Professor of Applied Statistics,  http://www.stats.ox.ac.uk/~ripley/
University of Oxford,             Tel:  +44 1865 272861 (self)
1 South Parks Road,                     +44 1865 272860 (secr)
Oxford OX1 3TG, UK                Fax:  +44 1865 272595

-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-
r-devel mailing list -- Read http://www.ci.tuwien.ac.at/~hornik/R/R-FAQ.html
Send "info", "help", or "[un]subscribe"
(in the "body", not the subject !)  To: r-devel-request@stat.math.ethz.ch
_._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._