[Rd] strange behavior in 'inherits' check for loaded S4 object

Martin Maechler maechler at stat.math.ethz.ch
Sat Jul 30 17:07:31 CEST 2016


>>>>> Kevin Ushey <kevinushey at gmail.com>
>>>>>     on Fri, 29 Jul 2016 11:46:19 -0700 writes:

    > I should add one more item that may be related here --
    > calling 'methods:::.requirePackage' returns a different
    > result based on whether the package namespace is already
    > loaded or not.

    > If the package namespace is not loaded, the package is
    > loaded and attached, and the package environment is
    > returned:

    >> methods:::.requirePackage("digest")
    >     Loading required package: digest <environment:
    > package:digest> attr(,"name") [1] "package:digest"
    > attr(,"path") [1]
    > "/Users/kevin/Library/R/3.3/library/digest"
    >> "digest" %in% loadedNamespaces()
    >     [1] TRUE
    >> "package:digest" %in% search()
    >     [1] TRUE

    > On the other hand, if the package namespace has already
    > been loaded, the package namespace is returned without
    > attaching the package:

    >> requireNamespace("digest")
    >     Loading required namespace: digest
    >> methods:::.requirePackage("digest")
    >     <environment: namespace:digest>
    >> "digest" %in% loadedNamespaces()
    >     [1] TRUE
    >> "package:digest" %in% search()
    >     [1] FALSE

    > This may be intentional, but the behavior seems surprising
    > and could be responsible for the behavior outlined
    > earlier.

Yes, the behavior you outlined earlier is buggy, and I also have
seen similar bugous behavior for the case of non-exported
classes.

Part of it is historical:  The S4 code was mostly written before
namespaces were introduced into R;   I vaguely remember John
Chambers (the principal creator of S4) saying that he did not
intend the formal classes to be not visible... which in some
sense only contains the fact that he (or anybody) would not
think much about hidden objects before they were introduced.

Still, in the mean time, most of us have seen many cases where
we wanted to have "private" classes,  and many packages do have
them, too.... and they "mostly work" ;-)

In other words, I agree that it would be very desirable to get
to the bottom of this and fix such problems.

.requirePackage() is among the parts of the methods package code
which are quite delicate (and not much documented AFAIK, the hidden
.requirePackage() function is a good example!).

Delicate for at least two reasons:

1) They are not only used in crucial steps when "bootstrapping"
   the methods package ('methods' has to define its own S4
   generics, methods, and classes before the package "exists"),

1b) they are also used both when building and installing another
    'methods'-dependent package.  This could be called
    "bootstrapping another package".

2) they are also much used whenever S4 code (aka
   'methods'-dependent) packages are loaded and attached,
   so should be as fast as possible.
   I think you know that in recent years there have been
   considerable (and successful!) efforts to speedup package
   loading time, e.g. speeding up 'library(Matrix)' by about
   2 factors of 2, and changing such functions should not
   deteriorate package-load speed noticably.

Still, it is very desirable to improve / fix the issue:
After all this musings, I'd currently guess that it would be a
good idea if  .requirePackage()  would always return the
*namespace* of the corresponding package unless that does not
yet exist, as in the the 'bootstrap' situations above.
... or we'd add a new argument to .requirePackage() dealing with
that, or we use two functions:  .requirePackage() needed for
boostrapping, returning a package envionment, and
.requireNamespace() used to access class (and generic function!)
environments.

Well tested patches are very welcome; as is filing a formal
bug report with bugzilla (https://bugs.r-project.org/ ;
(if you have no "account" there, we will have to 
manually create one, for now, see the 'Note' on
https://www.r-project.org/bugs.html because of the spammer
attacks earlier this month ... but I see you, Kevin are
registered there),
or
just continuing your findings here.

Martin


    > Best, Kevin

    > On Fri, Jul 29, 2016 at 11:37 AM, Kevin Ushey
    > <kevinushey at gmail.com> wrote:
    >> I have a small idea as to what's going on now; at least,
    >> why exporting the class resolves this particular issue.
    >> 
    >> Firstly, when an S4 class is not exported, the associated
    >> '.__C__<class>' object is not made part of the package
    >> environment.  For example, I see:
    >> 
    >> > getAnywhere(".__C__SubMatrix") A single object matching
    >> '.__C__SubMatrix' was found It was found in the following
    >> places namespace:s4inherits with value < ... >
    >> 
    >> Note that the symbol is only discovered in the package
    >> namespace. When the class is exported (e.g. with
    >> 'exportClasses(SubMatrix)' in the NAMESPACE file), it's
    >> found both in 'package:s4inherits' and
    >> 'namespace:s4inherits'.
    >> 
    >> Secondly, when R attempts to resolve the superclasses for
    >> an S3 class, the function 'methods:::.extendsForS3' is
    >> called. Tracing that code eventually gets us here:
    >> 
    >> https://github.com/wch/r-source/blob/trunk/src/library/methods/R/SClasses.R#L255
    >> 
    >> Note that we reach this code path as the S3 class cache
    >> has not been populated yet; ie, this code returns NULL:
    >> 
    >> https://github.com/wch/r-source/blob/trunk/src/library/methods/R/SClasses.R#L238-L240
    >> 
    >> So, the class hierarchy is looked up using this code:
    >> 
    >> if(isTRUE(nzchar(package))) { whereP <-
    >> .requirePackage(package) value <- get0(cname, whereP,
    >> inherits = inherits) }
    >> 
    >> However, because the '.__C__SubMatrix' object is only
    >> made available in the package's namespace, not within the
    >> package environment, it is not resolved, and so lookup
    >> fails. (Presumedly, that lookup is done when initially
    >> building a cache for S3 dispatch?) So, I wonder if that
    >> class lookup should occur within the package's namespace
    >> instead?
    >> 
    >> Thanks for your time, Kevin
    >> 
    >> On Sat, Jun 25, 2016 at 12:46 PM, Kevin Ushey
    >> <kevinushey at gmail.com> wrote:
    >>> Hi,
    >>> 
    >>> (sorry for the wall of text; the issue here appears to
    >>> be rather complicated)
    >>> 
    >>> I'm seeing a somewhat strange case where checking
    >>> whether an S4 object inherits from a parent class
    >>> defined from another package with 'inherits' fails if
    >>> that object is materialized through a call to
    >>> 'load'. That's a mouthful, so I've put together a
    >>> relatively small reproducible example online here:
    >>> 
    >>> https://github.com/kevinushey/s4inherits
    >>> 
    >>> This package, 's4inherits', defines an S4 class,
    >>> 'SubMatrix', that inherits from the 'dsyMatrix' class
    >>> defined in the Matrix package.  After installing the
    >>> package, I run some simple tests:
    >>> 
    >>> $ R -f test-save.R
    >>> 
    >>>> library(s4inherits) data <- SubMatrix(1)
    >>>> 
    >>>> is(data, "SubMatrix")
    >>> [1] TRUE
    >>>> inherits(data, "SubMatrix")
    >>> [1] TRUE
    >>>> 
    >>>> is(data, "dsyMatrix")
    >>> [1] TRUE
    >>>> inherits(data, "dsyMatrix")
    >>> [1] TRUE
    >>>> 
    >>>> save(data, file = "test.RData")
    >>>> 
    >>> 
    >>> All the inheritance checks report as we would expect. I
    >>> check that the inheritance reports are as expected, then
    >>> save that object to 'test.RData'. I then load that data
    >>> file in a new R session and run the same checks:
    >>> 
    >>> $ R -f test-load.R
    >>> 
    >>>> library(methods) load("test.RData")
    >>>> 
    >>>> inherits(data, "SubMatrix")
    >>> Loading required package: s4inherits [1] TRUE
    >>>> is(data, "SubMatrix")
    >>> [1] TRUE
    >>>> 
    >>>> inherits(data, "dsyMatrix")
    >>> [1] FALSE # (??)
    >>>> is(data, "dsyMatrix")
    >>> [1] TRUE
    >>>> 
    >>> 
    >>> Note that R now reports that my loaded object does _not_
    >>> inherit from "dsyMatrix", yet this occurs only when
    >>> checked with 'inherits()' -- 'is' produces the expected
    >>> result.
    >>> 
    >>> I do not see the behavior if I explicitly load / attach
    >>> the 's4inherits' package before loading the associated
    >>> data file; it only occurs if the package namespace is
    >>> loaded in response to loading the data object hosting a
    >>> 'SubMatrix' object.
    >>> 
    >>> More precisely, the package namespace is loaded when the
    >>> promise hosting the data object is evaluated; that
    >>> promise being generated by 'load', if I understand
    >>> correctly. Somehow, evaluation of that promise within
    >>> the call to 'inherits' in this very special case causes
    >>> the unexpected behavior -- ie, if the first thing that
    >>> you do with the loaded object is check its class with
    >>> 'inherits', then you get this unexpected result.
    >>> 
    >>> Even more, this behavior seems to go away if the
    >>> 's4inherits' package explicitly exports the class --
    >>> however, you could imagine this class normally being
    >>> internal to the package, and so it may be undesirable to
    >>> export it.
    >>> 
    >>> I checked a bit into the C code, and IIUC the check here
    >>> looks up the class hierarchy in the R_S4_extends_table
    >>> object defined in 'attrib.c', so it seems like that
    >>> mapping is potentially not getting constructed with the
    >>> full hierarchy (although hopefully someone with more
    >>> knowledge of the S4 internals + interaction with S3 can
    >>> elaborate).
    >>> 
    >>> (FWIW, this was distilled from a case where S3 dispatch
    >>> on a similar loaded S4 object failed, due to failure to
    >>> resolve the class hierarchy for the S3 dispatch
    >>> context.)
    >>> 
    >>> Thanks, Kevin
    >>> 
    >>> ---
    >>> 
    >>> $ R --slave -e "utils::sessionInfo()" R Under
    >>> development (unstable) (2016-06-13 r70769) Platform:
    >>> x86_64-apple-darwin15.5.0 (64-bit) Running under: OS X
    >>> 10.11.5 (El Capitan)

    > ______________________________________________
    > R-devel at r-project.org mailing list
    > https://stat.ethz.ch/mailman/listinfo/r-devel



More information about the R-devel mailing list