[Rd] R 2.6.0 S4 data breakage, R_data_class(), class<-, etc.

Martin Morgan mtmorgan at fhcrc.org
Sun Oct 7 21:18:41 CEST 2007


John Chambers <jmc at r-project.org> writes:

> Most of your problems seem related to assigning an S4 class to an 
> arbitrary object--a really bad idea, since it can produce invalid objects.
>
> Objects from S4 classes are created by calling the function new(), and 
> in principle _only_ by calling that function.  Objects from one class 
> are coerced to another by calling the function as().

But both 'new' and 'as' appear to produce invalid (in a different
sense, I guess) objects:

> setClass("snp", contains="raw",
+          validity=function(object) {
+              if (length(object) < 1) "too short"
+              else TRUE
+          })
[1] "snp"
> new("snp")
An object of class "snp"
raw(0)
> as(raw(), "snp")
An object of class "snp"
raw(0)
> new("snp", raw())
Error in validObject(.Object) : invalid class "snp" object: too short
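
A minimal sketch of checking this by hand, assuming the same "snp"
definition as in the transcript above: running validObject() with
test=TRUE on each object reports whether it would pass, instead of
stopping with an error. The helper name 'passes' is hypothetical and
only for illustration.

```r
library(methods)

setClass("snp", contains = "raw",
         validity = function(object) {
             if (length(object) < 1) "too short" else TRUE
         })

# Hypothetical helper: TRUE if the object passes validObject(),
# FALSE if validObject() reports any problem.
passes <- function(obj) isTRUE(validObject(obj, test = TRUE))

passes(new("snp"))            # created without running the validity method
passes(as(raw(), "snp"))      # coerced from an empty raw vector
passes(as(as.raw(1), "snp"))  # coerced from a length-1 raw vector
```

Calling validObject() explicitly after new() with no data or after as()
is one way to close the gap, at the cost of an extra pass over the
object.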

Conversely, I think the S4 implementation implicitly requires that
'new' with a single argument (i.e., class name) return a valid object
-- see

https://stat.ethz.ch/pipermail/bioc-devel/2007-September/001323.html
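
One possible way to satisfy that requirement, sketched under the
assumption that a length-1 default is acceptable for the class, is to
supply a prototype whose data part already passes the validity method.
The class name "snp2" is hypothetical (chosen so it does not clash with
"snp" above), and this is not necessarily what any package actually
does.

```r
library(methods)

# "snp2" is a hypothetical class name used only for this sketch.
# The unnamed argument to prototype() becomes the data part, so the
# default object has length 1 rather than length 0.
setClass("snp2", contains = "raw",
         prototype = prototype(as.raw(0)),
         validity = function(object) {
             if (length(object) < 1) "too short" else TRUE
         })

obj <- new("snp2")   # no data supplied, so the prototype is used
length(obj)
validObject(obj)     # the default object now passes the validity method
```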

Also, coercing a genome's worth of 'raw' SNPs to 'snp' appears to be
more memory efficient than creating a new 'snp' (even with an explicit
validity check):

> x <- raw(1)
> tracemem(x)
[1] "<0x1034e28>"
> y <- as(x, "snp")
tracemem[0x1034e28 -> 0x7b67e8]: .mergeAttrs setDataPart .Call slot<- @<- asMethod as<- asMethod as 
> y <- new("snp", x)
tracemem[0x1034e28 -> 0x7bed28]: initialize initialize new 
tracemem[0x7bed28 -> 0x8fd968]: .mergeAttrs setDataPart .Call slot<- @<- asMethod as<- initialize initialize new 
tracemem[0x8fd968 -> 0x906518]: switch getDataPart .Call slot validObject initialize initialize new 
> validObject(y <- as(x, "snp"))
tracemem[0x1034e28 -> 0xa1bfc8]: .mergeAttrs setDataPart .Call slot<- @<- asMethod as<- asMethod as validObject 
tracemem[0xa1bfc8 -> 0x9e6dd8]: switch getDataPart .Call slot validObject 
TRUE

Martin

> Assigning a class to any old object is a very S3 idea (and not a good 
> idea except in low-level code there, either).
>
> At the C level there are  macros for new() (R recommends NEW_OBJECT()), 
> although the safest approach when feasible is to allocate the object in 
> R.  The general as() computation really needs to be done in R because of 
> its special use of method dispatch; there are macros for the equivalent 
> of the as.<type>() functions.
>
> Perhaps some improvements to the documentation would make this clearer, 
> although Chapter 7 and Appendix A of Programming with Data seem 
> reasonably definite.
>
> Thanks for sharing your notes.
>
> John
>
>
> Hin-Tak Leung wrote:
>> Hi,
>>
>> (somebody would probably yell at me for not checking 2.6.0rc,
>> for which I can only apologize...)
>>
>> Our R package (snpMatrix in 
>> http://www-gene.cimr.cam.ac.uk/clayton/software/) is broken rather badly
>> in 2.6.0 ; I have fixed most of it now so a new release is imminent;
>> but I'd like to mention a few things, mostly to summarize my experience
>> and hopefully the 'writing R extensions' document can be updated to
>> reflect some of this...
>>
>> 1) We created and bundled some data in the past in the 2.2 to 2.5
>> time frame (well, 18 months in reality);
>> most of them trigger a warning 'pre-2.4.0 S4 objects detected...
>> consider recreating...'
>>    a) I could fix all of them with just 'a <- asS4(a)' and save()
>>         (they are relatively simple objects just missing the S4 object
>>          bit flag)
>>    b) I am surprised one of them was actually saved from 2.5 - our buggy
>>       code no doubt, see below.
>>
>> We never noticed we didn't do SET_S4_OBJECT() in our C code nor
>> asS4() in our R code until this week. Obviously we were mistakenly
>> relying on the S4 method dispatch on S3 objects, which was withdrawn in 
>> 2.6.0...
>>
>> 2) I am surprised that 'class(a)' can read S4 class names, but 
>> 'class(a)<-' does not set the S4 object bit. I suppose the correct way 
>> would be to do new(...)? This needs to be written down somewhere...
>> The asymmetry is somewhat surprising though.
>>
>> 3) We have some C code which branches depending on the S4 class.
>> The R extension doc didn't explain that one needs to do R_data_class()
>> rather than classgets() (or 'getAttrib(x, R_ClassSymbol)') to retrieve
>> S4 classes; furthermore,
>> R_data_class() is not part of the public API, and I only found it by
>> looking at the C code of 'class()' (do_class()). But R_data_class()
>> is part of exposed binary interface and the methods package certainly
>> uses it; isn't it time to make it part of the public API? In any case, I 
>> think a way of retrieving the S4 class in C is needed.
>>   
> Yes, or at the least instructions to handle the case of a NULL class 
> attribute, but a macro would be good.
>> 4) The documentation is missing a fair part - specifically,
>> I need to be able to read and write the S4 class attribute...
>> so R_data_class() needs to be documented and exposed as part of the 
>> public API (and included in the Rinternals.h include),
>> and the recommended way of making an S4 object in C? I found
>> classgets() + SET_S4_OBJECT() seem to work, but I'd like an 
>> authoritative answer...
>>
>> 5) I am finding the 'class()<-' + asS4() combo in R and the classgets() +
>> SET_S4_OBJECT() combo in C a bit awkward. Is there any reason why
>> class<- or classgets() (or if there is a more 'correct' API to use for
>> S4) cannot automatically set the S4 bit if the name is a known S4 class?
>>
>> Thanks for reading so far...
>>
>> Hin-Tak
>>
>> ______________________________________________
>> R-devel at r-project.org mailing list
>> https://stat.ethz.ch/mailman/listinfo/r-devel
>>
>>

-- 
Martin Morgan
Computational Biology Shared Resource Director
Fred Hutchinson Cancer Research Center
1100 Fairview Ave. N.
PO Box 19024 Seattle, WA 98109

Location: Arnold Building M2 B169
Phone: (208) 667-2793


