[Rd] S4 class extending data.frame?

Oleg Sklyar osklyar at ebi.ac.uk
Thu Dec 13 16:32:23 CET 2007


Thanks for your comments. I cannot recall now when I had the situation
that I wanted to inherit from a data.frame, but the fact was that I
could not set the data. So now it just popped up and I thought it was
indeed unfortunate that data.frame structure did not follow the same
principles as other "standard" classes do.
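
For reference, option (a) from Martin's message quoted below (keeping
the data.frame in its own slot rather than extending it) could look
roughly like the following. This is only a minimal sketch: the class
name 'AnnotatedFrame', the slot name 'data' and the column names are
made up for illustration, and only a few methods are forwarded.

```r
library(methods)

# Hypothetical class: the data.frame lives in a slot instead of being extended.
setClass("AnnotatedFrame",
         representation(data = "data.frame", comment = "character"))

# Forward only the operations we choose to support to the 'data' slot.
setMethod("[[", "AnnotatedFrame", function(x, i, j, ...) x@data[[i]])
setMethod("dim", "AnnotatedFrame", function(x) dim(x@data))

z <- new("AnnotatedFrame",
         data = data.frame(a = 1:3, b = 2:4),
         comment = "goodbye")

z[["b"]]     # column access goes through the slot
dim(z)       # so does the dimension query
z@comment    # the extra slot is an ordinary S4 slot
```

The price is that every supported operation has to be forwarded by
hand, which is the trade-off Martin describes.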

Regarding named lists, modifying .Data directly can backfire unless one
has thought through every aspect of the object. I ran into a similar
situation myself and have been very careful about such things since
(in my case it was in C, when creating an object with a names
attribute). The thing is: names is an independent attribute, so when
working on .Data directly it is possible to end up with .Data of a
different length from names, etc. Thanks for pointing this out anyway.
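
One way to keep the two in step is to subset the data part and the
names with the same index and rebuild the object, instead of assigning
to .Data. A minimal sketch, assuming a hypothetical class 'Bag' that
extends list; it only handles numeric or logical indices and does not
carry over any extra slots:

```r
library(methods)

# 'Bag' is a hypothetical class name; it extends list as in Martin's example.
setClass("Bag", contains = "list")

# Subset the data part and the names with the same index, then rebuild
# the object, so the names cannot drift away from the elements.
setMethod("[", "Bag", function(x, i, j, ..., drop = TRUE) {
    nms <- names(x)        # NULL if the list is unnamed
    value <- x@.Data[i]    # elements only
    if (!is.null(nms)) names(value) <- nms[i]
    new("Bag", value)
})

b <- new("Bag", list(x = 1, y = 2))
names(b[2])
```

With this method names(b[2]) should be "y", matching the element that
was actually selected.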

Regards,
Oleg


On Thu, 2007-12-13 at 07:01 -0800, Martin Morgan wrote:
> Ben, Oleg --
> 
> Some solutions, which you've probably already thought of, are (a) move
> the data.frame into its own slot, instead of extending it, (b) manage
> the data.frame attributes yourself, or (c) reinvent the data.frame
> from scratch as a proper S4 class (e.g., extending 'list' with
> validity constraints on element length and homogeneity of element
> content).
> 
> (b) places a lot of dependence on understanding the data.frame
> implementation, and is probably too tricky (for me) to get right; (c)
> is probably also tricky, and probably carries significant performance
> overhead (e.g., object duplication during validity checking).
> 
> (a) means that you don't get automatic method inheritance. On the plus
> side, you still get the structure. It is trivial to implement methods
> like [, [[, etc to dispatch on your object and act on the appropriate
> slot. And in some sense you now know which methods (i.e., those you've
> implemented) are supported on your object.
> 
> Oleg, here's my cautionary tale for extending list, where manually
> subsetting the .Data slot mixes up the names (callNextMethod would
> have done the right thing, but was not appropriate). This was quite a
> subtle bug for me, because I hadn't been expecting named lists in my
> object; the problem surfaced when sapply used the (incorrectly subset)
> names attribute of the list. My solution in this case was to make sure
> 'names' were removed from lists used to construct objects. As a
> consequence I lose a nice little bit of sapply magic.
> 
> > setClass('A', 'list')
> [1] "A"
> > setMethod('[', 'A', function(x, i, j, ..., drop=TRUE) {
> +     x@.Data <- x@.Data[i]
> +     x
> + })
> [1] "["
> > names(new('A', list(x=1, y=2))[2])
> [1] "x"
> 
> Martin
> 
> Oleg Sklyar <osklyar at ebi.ac.uk> writes:
> 
> > I had the same problem. Generally data.frame's behave like lists, but
> > while you can extend list, there are problems extending a data.frame
> > class. This comes down to the internal representation of the object I
> > guess. Vectors, including list, contain their information in a (hidden)
> > slot .Data (see the example below). data.frame's do not seem to follow
> > this convention.
> >
> > Any idea how to work around this?
> >
> > The following example is exactly the same as Ben's for a data.frame, but
> > using a list. It works fine and one can see that the list structure is
> > stored in .Data
> >
> > * ~: R
> > R version 2.6.1 (2007-11-26) 
> >> setClass("c3",representation(comment="character"),contains="list")
> > [1] "c3"
> >> l = list(1:3,2:4)
> >> z3 = new("c3",l,comment="hello")
> >> z3
> > An object of class “c3”
> > [[1]]
> > [1] 1 2 3
> >
> > [[2]]
> > [1] 2 3 4
> >
> > Slot "comment":
> > [1] "hello"
> >
> >> z3@.Data
> > [[1]]
> > [1] 1 2 3
> >
> > [[2]]
> > [1] 2 3 4
> >
> > Regards,
> > Oleg
> >
> > On Thu, 2007-12-13 at 00:04 -0500, Ben Bolker wrote:
> >> I would like to build an S4 class that extends
> >> a data frame, but includes several more slots.
> >> 
> >> Here's an example using integer as the base
> >> class instead:
> >> 
> >> setClass("c1",representation(comment="character"),contains="integer")
> >> z1 = new("c1",55,comment="hello")
> >> z1
> >> z1+10
> >> z1[1]
> >> z1@comment
> >> 
> >>  -- in other words, it behaves exactly as an integer
> >> for access and operations but happens to have another slot.
> >> 
> >>  If I do this with a data frame instead, it doesn't seem to work
> >> at all.
> >> 
> >> setClass("c2",representation(comment="character"),contains="data.frame")
> >> d = data.frame(1:3,2:4)
> >> z2 = new("c2",d,comment="goodbye")
> >> z2  ## data all gone!!
> >> z2[,1]  ## Error ... object is not subsettable
> >> z2@comment  ## still there
> >> 
> >>   I can achieve approximately the same effect by
> >> adding attributes, but I was hoping for the structure
> >> of S4 classes ...
> >> 
> >>   Programming with Data and the R Language Definition
> >> contain 2 references each to data frames, and neither of
> >> them has allowed me to figure out this behavior.
> >> 
> >>  (While I'm at it: it would be wonderful to have
> >> a "rich data frame" that could include as a column
> >> any object that had an appropriate length and
> >> [ method ... has anyone done anything in this direction?
> >> ?data.frame says the allowable types are
> >>  "(numeric, logical, factor and character and so on)",
> >>  but I'm having trouble sorting out what the limitations
> >> are ...)
> >> 
> >>   hoping for enlightenment (it would be lovely to be
> >> shown how to make this work, but a definitive statement
> >> that it is impossible would be useful too).
> >> 
> >>   cheers
> >>     Ben Bolker
> >> 
> >> ______________________________________________
> >> R-devel at r-project.org mailing list
> >> https://stat.ethz.ch/mailman/listinfo/r-devel
> > -- 
> > Dr Oleg Sklyar * EBI-EMBL, Cambridge CB10 1SD, UK * +44-1223-494466
> 
-- 
Dr Oleg Sklyar * EBI-EMBL, Cambridge CB10 1SD, UK * +44-1223-494466


