[Rd] For integer vectors, `as(x, "numeric")` has no effect.

Josh O'Brien joshmobrien at gmail.com
Tue Jan 5 22:55:31 CET 2016


On Tue, Jan 5, 2016 at 1:31 AM, Martin Maechler
<maechler at stat.math.ethz.ch> wrote:
>>>>>> Josh O'Brien <joshmobrien at gmail.com>
>>>>>>     on Mon, 4 Jan 2016 16:16:51 -0800 writes:
>
>     > On Dec 19, 2015, at 3:32 AM, Martin Maechler <maechler at
>     stat.math.ethz.ch> wrote:
>
>     >>>>>>> Martin Maechler <maechler at stat.math.ethz.ch> on
>     >>>>>>> Sat, 12 Dec 2015 10:32:51 +0100 writes:
>     >>
>     >>>>>>> John Chambers <jmc at r-project.org> on Fri, 11 Dec
>     >>>>>>> 2015 10:11:05 -0800 writes:
>     >>
>     >>>> Somehow, the most obvious fixes are always
>     >>>> back-incompatible these days.  The example intrigued
>     >>>> me, so I looked into it a bit (should have been doing
>     >>>> something else, but ....)
>     >>
>     >>>> You're right that this is the proverbial
>     >>>> thin-edge-of-the-wedge.
>     >>
>     >>>> The problem is in setDataPart(), which will be called
>     >>>> whenever a class extends one of the vector types.
>     >>
>     >>>> It does as(value, dataClass).  The key point is the
>     >>>> third argument to as(), strict=TRUE by default.  So,
>     >>>> yes, the change will cause all integer vectors to
>     >>>> become double when the class extends "numeric".
>     >>>> Generally, strict=TRUE makes sense here and of course
>     >>>> changing THAT would open up yet more incompatibilities.
>     >>
>     >>>> For back compatibility, one would have to have some
>     >>>> special code in setDataPart() for the case of
>     >>>> integer/numeric.
>     >>
>     >>>> John
>     >>
>     >>>> (Historically, the original sin was probably not making
>     >>>> a distinction between "numeric" as a virtual class and
>     >>>> "double" as a type/class.)
>     >>
>     >>> Yes, indeed.  In the meantime, I've seen more cases
>     >>> where "the change will cause all integer vectors to
>     >>> become double when the class extends 'numeric'" seems
>     >>> detrimental.
>     >>
>     >>> OTOH, I still think we could go in the right direction
>     >>> --- hopefully along the wishes of bioconductor S4
>     >>> development, see Martin Morgan's e-mail:
>     >>
>     >>> [This is all S4 - only; should not much affect base R /
>     >>> S3] Currently, "integer" is a subclass of "numeric" and
>     >>> so the "integer become double" part seems unwanted to
>     >>> me.  OTOH, it would really make sense to more formally
>     >>> have the basic subclasses of "numeric" to be "integer"
>     >>> and "double", and to let as(*, "double") to become
>     >>> different to as(*, "numeric") [Again, this is just for
>     >>> the S4 classes and as() coercions, *not* e.g.  for
>     >>> as.numeric() / as.double() !]
>     >>
>     >>> In the DEPRECATED part of the NEWS for R 2.7.0 (April
>     >>> 2008) we have had
>     >>
>     >>> o The S4 pseudo-classes "single" and double have been
>     >>> removed.  (The S4 class for a REALSXP is "numeric": for
>     >>> back-compatibility as(x, "double") coerces to
>     >>> "numeric".)
>     >>
>     >>> I think the removal of "single" was fine, but in
>     >>> hindsight, maybe the removal of "double" -- which was
>     >>> partly broken then -- possibly could rather have been a
>     >>> fixup of "double" along the following
>     >>
>     >>> Current "thought experiment proposal" :
>     >>
>     >>> 1) "numeric" := {"integer", "double"} { class - subclasses }
>     >>> 2) as(1L, "numeric") continues to return 1L
>     >>>    .. since integer is one case of "numeric"
>     >>> 3) as(1L, "double") newly returns 1.0
>     >>>    {and in fact would be "equivalent" to as.double(1L)}
>     >>
>     >>> After the above change, S4 as(*, "double") would
>     >>> correspond to S3 as.double but as(*, "numeric") would
>     >>> continue to differ from as.numeric(*), the former *not*
>     >>> changing integers to double.
>     >>
>     >>> Martin
>     >>
>     >> Also note that e.g.
>     >>
>     >> class(pi) would return "double" instead of "numeric"
>     >>
>     >> and this will break all the bad programming style usages
>     >> of
>     >>
>     >> if(class(x) == "numeric")
>     >>
>     >> which I tend to see in gazillions of user and even
>     >> package code.  This is bad (aka error prone!) because
>     >> "correct" usage would be
>     >>
>     >> if(inherits(x, "numeric"))
>     >>
>     >> and that of course would *not* break after the change
>     >> above.
>     >>
>     >> - - - -
>     >>
>     >> A week later, I'm still pretty convinced it would be
>     >> worth going in the direction proposed above.
>     >>
>     >> But I was actually hoping for some encouragement or
>     >> "mental support"...  or then to hear why you think the
>     >> proposition is not good or not viable ...
>     >>
>     >>
>
>     > I really like Martin Maechler's "thought experiment
>     > proposal", but (based partly on the reception it's gotten)
>     > figure I mustn't be appreciating the complications it
>     > would introduce.
>
> Actually, I've spent half a day implementing it and was very
> pleased about it... as a matter of fact it passed *all* our checks,
> also in all recommended packages (*)
>
> To do it cleanly... with very few code changes,
> the *only* consequence would be that
>
>    class(1.)
>
> (and similar) would then return "double" instead of "numeric",
> which *would* be the logical consequence, because indeed,
>
>    numeric = {integer, double}
>
> in that new scheme, and     class(1L) also returns "integer".
>
> To my big chagrin, there was very strong opposition to such a change,
> IIRC, mainly on the grounds that for 20 years or so S and then R
> books and publications had said that double and numeric should
> be basically the same.
>
> (*) Below you have a C level proposal which as you note is
>     similar to John Chambers R level change:
>
> The consequence is that basically you can no longer have "integer"
> entries in "numeric" slots; they are automagically made into "double".
> I personally find that not really "acceptable" {waste of storage},
> and I would guess that more code "out there in package-land and
> user-code" would break than with my change.
>
>     > That said, if it's decided to just make a smaller fix of
>     > as(x, "numeric"), might it be better to make the change at
>     > the C level, to R_set_class in $RHOME/src/main/coerce.c?
>
> I'm not seeing the advantage of making the change there, apart
> from possibly some efficiency gain.
>

One advantage (relative to a solution based on setting a new S4 coerce()
method for signature c("integer", "numeric") ) is that it would also make
the following conversion work as naively expected:

    x <- 10L
    class(x) <- "numeric"
    class(x)
    # [1] "integer"  ## would be "numeric"

I know that's not a recommended strategy for converting an object's
class, but for users like me, trying to make sense of as() and the
class system, it would be even more perplexing if `as(x, "numeric")`
and `class(x) <- "numeric"` yielded different results.
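
A minimal sketch for putting the two routes side by side. The variable
names are only illustrative, and the printed storage types are
deliberately not shown here, since they may depend on the R version in
use:

```r
library(methods)  # as() lives in the methods package

x <- 10L

# Route 1: S4 coercion via as()
via_as <- as(x, "numeric")

# Route 2: assigning the class attribute directly (not a recommended idiom)
via_class <- x
class(via_class) <- "numeric"

# Compare the underlying storage type produced by each route
results <- c(as = typeof(via_as), class_assign = typeof(via_class))
print(results)
```

If the two entries differ, the two routes disagree in that R version,
which is the perplexing situation described above.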

> For the time being, I will not work on this ... mainly as I still
> believe that my proposal would lead to a much much cleaner setup
> (and yes, even be worth some small changes in new editions of
>  those R books which deal with such subtle issues)
>

Thanks, anyway, for having looked into this. If no changes are to be made,
then it might (?) be worth modifying the "Basic Coercion Methods" section
of ?as. It currently reads:

     Methods are pre-defined for coercing any object to one of the
     basic datatypes.  For example, 'as(x, "numeric")' uses the
     existing 'as.numeric' function.  These built-in methods can be
     listed by 'showMethods("coerce")'.

which is not accurate for integer vectors 'x'.
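
One way to check that sentence against a given R installation is
sketched below. The helper `matches_doc()` is hypothetical (it is not
part of R); it simply asks whether `as(x, "numeric")` and
`as.numeric(x)` return identical objects. The result for the integer
case is left unstated because it may vary between R versions:

```r
library(methods)  # as() lives in the methods package

# Hypothetical helper: does as(x, "numeric") agree with as.numeric(x)?
matches_doc <- function(x) identical(as(x, "numeric"), as.numeric(x))

dbl_ok <- matches_doc(c(1.5, 2))  # double input: already "numeric"
int_ok <- matches_doc(1:3)        # integer input: the case in question

print(c(double = dbl_ok, integer = int_ok))
```

A FALSE in the integer position would mean the documentation's wording
does not describe what as() does for integer vectors.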

> Martin


