[Rd] modifying a persistent (reference) class

Ross Boylan ross at biostat.ucsf.edu
Fri Aug 1 22:47:52 CEST 2014


On Fri, 2014-08-01 at 16:06 -0400, Brian Lee Yung Rowe wrote:
> Ross,
> 
> 
> Ah I didn't think about Smalltalk. Doesn't surprise me that they
> supported upgrades of this sort. That aside I think the question is
> whether it's realistic for a language like R to support such a
> mechanism automatically. Smalltalk and Erlang both have tight
> semantics that would be hard to establish in R (given the multiple
> object systems and dispatching systems). 
> 
> 
> I'm a functional guy so to me it's natural to separate the data from
> the functions/methods. Having spent years writing OOP code I walked
> away concluding that OOP makes things more complicated for the sake of
> being OOP (eg no first class functions). 
In Smalltalk everything is an object, and that includes functions,
including class methods.
> Obviously that's changing, and in a language like R it's less of an
> issue. However, something like object serialization smells
> suspiciously similar. If you know that serializing objects is brittle,
> why not look for an alternative approach as opposed to chasing that
> rainbow?
My immediate problem is/was that I have serialized objects representing
weeks of CPU time.  I have to work with them, not some other
representation they might have.  And it's much more natural to work with
R's native persistence than some other scheme I cook up.

I think persistence requires serialization.  The serialization can be
more or less brittle, but I don't think there is an alternative to
serialization.

Since I just worked around my immediate problem a few minutes ago (by
retaining the original class definitions and using setMethod to create
summary methods), my interests are a bit more theoretical.
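
A minimal sketch of that kind of workaround, assuming a made-up class
"Run" with a single field (the class, field, and "RunSummary" names
are hypothetical, not the ones from my actual code).  The reference
class definition is left exactly as it was when the objects were
saved, and summary is added from outside with setMethod:

```r
library(methods)

# Hypothetical stand-in for an original, unmodified reference class.
Run <- setRefClass("Run",
  fields = list(estimates = "numeric"),
  methods = list(
    show = function() cat("Run with", length(estimates), "estimates\n")
  )
)

# A summary method defined outside the class, so the class definition
# (and anything saved from it) is left alone.
setMethod("summary", "Run", function(object, ...) {
  structure(
    list(n = length(object$estimates), mean = mean(object$estimates)),
    class = "RunSummary"
  )
})

r <- Run$new(estimates = c(1, 2, 3))
s <- summary(r)
```

Because the method lives in the S4 method tables rather than in the
class definition, it can be changed later without touching the
reference class itself.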

First, I'd like to understand more about exactly what is saved to disk
for reference and other classes, in particular how much meta-information
they contain.  And my mental model for reference class persistence is
clearly wrong, because in that model instances based on old definitions
come back intact (albeit not with the new method definitions or other
new slots), whereas mine seemed to come back damaged.

Second, I'm still hoping for some elegant way around this problem (how
to redefine classes and still use saved versions from older definitions)
for the future, both with reference and regular classes.  Or at least
some rules about what changes, if any, are safe to make in class
definitions after an instance has been persisted.
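
Short of support in R itself, the export/import steps quoted below can
be approximated by hand.  The following is only a sketch under stated
assumptions: the class "Fit", its fields, and the helpers
export_fields and import_fields are all hypothetical, and it does
nothing for objects that were already saved in R's native format.  It
applies the "persist the slot names" rule: saved fields the new class
no longer has are dropped, and new fields are left at their default.

```r
library(methods)

# Hypothetical class standing in for the "old" definition.
Fit <- setRefClass("Fit", fields = list(a = "numeric", b = "character"))

# Export: copy every field into a plain named list, which does not
# depend on the class definition.
export_fields <- function(obj) {
  field_names <- names(obj$getRefClass()$fields())
  setNames(lapply(field_names, function(f) obj$field(f)), field_names)
}

old <- Fit$new(a = 1:3 + 0.5, b = "first run")
path <- tempfile(fileext = ".rds")
saveRDS(export_fields(old), path)

# Later: the class is redefined (field b dropped, field c added).
Fit <- setRefClass("Fit", fields = list(a = "numeric", c = "logical"))

# Import: keep only the saved fields that the new definition still has;
# fields absent from the file get whatever default the new class gives.
import_fields <- function(generator, saved) {
  keep <- intersect(names(saved), names(generator$fields()))
  do.call(generator$new, saved[keep])
}

migrated <- import_fields(Fit, readRDS(path))
```

A change in the type of a field, or a rule for filling a new field
from old ones, would still need hand-written code in import_fields.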
> 
Third, if changes to R could make things better, I'm hoping some
developers might take them up.  I realize that is unlikely to happen,
for many good reasons, but I can still hope :)

Ross
> 
> Warm regards,
> Brian
> 
> •••••
> Brian Lee Yung Rowe
> Founder, Zato Novo
> Professor, M.S. Data Analytics, CUNY
> 
> On Aug 1, 2014, at 3:33 PM, Ross Boylan <ross at biostat.ucsf.edu> wrote:
> 
> 
> > On Fri, 2014-08-01 at 14:42 -0400, Brian Lee Yung Rowe wrote:
> > > Ross,
> > > 
> > > 
> > > This is generally a hard problem in software systems. The only
> > > language I know that explicitly addresses it is Erlang. Ultimately
> > > you need a system upgrade process, which defines how to update the
> > > data in your system to match a new version of the system. You
> > > could do this by writing a script that
> > > 1) loads the old version of your library
> > > 2) loads your data/serialized reference classes
> > > 3) exports data to some intermediate format (eg a list)
> > > 4) loads new version of library
> > > 5) imports data from intermediate format
> > My recollection is that in Gemstone's Smalltalk database you can
> > define methods associated with a class that describe how to change
> > an instance from one version to another.  You also have the choice
> > of upgrading all persistent objects at once or doing so lazily,
> > i.e., as they are retrieved.
> > 
> > The brittleness of the representation depends partly on the
> > details.  If a class has 2 slots, a and b, and the only thing on
> > disk is the contents of a and the contents of b, almost any change
> > will screw things up.  However, if the slot name is persisted with
> > the instance it's much easier to reconstruct the instance if the
> > class changes (if slot c is added and not on disk, set it to nil;
> > if b is removed, throw it out when reading from disk).  One could
> > also persist the class definition, or key elements of it, with
> > individual instances referring to the definition.
> > 
> > I don't know which, if any, of these strategies R uses for
> > reference or other classes.
> > > 
> > > 
> > > Once you've gone through the upgrade process, arguably it's better
> > > to persist the data in a format that is decoupled from your
> > > objects since then future upgrades would simply be
> > > 1) load new library
> > > 2) import data from intermediate format
> > Arguably :)  As I said, some representations could do this
> > automatically.  And there are still issues such as a change in the
> > type of a slot, or rules for filling new slots, that would require
> > intervention.
> > 
> > In my experience with other object systems, usually methods are
> > attributes of the class.  For R reference classes they appear to be
> > attributes of the instance, potentially modifiable on a per-instance
> > basis.
> > 
> > Ross
> > > 
> > > 
> > > which is no different from day-to-day operation of your app/system
> > > (ie you're always writing to and reading from the intermediate
> > > format).
> > > 
> > > 
> > > Warm regards,
> > > Brian
> > > 
> > > •••••
> > > Brian Lee Yung Rowe
> > > Founder, Zato Novo
> > > Professor, M.S. Data Analytics, CUNY
> > > 
> > > On Aug 1, 2014, at 1:54 PM, Ross Boylan <ross at biostat.ucsf.edu>
> > > wrote:
> > > 
> > > 
> > > > I saved objects that were defined using several reference
> > > > classes.  Later I modified the definition of reference classes a
> > > > bit, creating new functions and deleting old ones.  The total
> > > > number of functions did not change.  When I read them back I
> > > > could only access some of the original data.
> > > > 
> > > > I asked on the user list and someone suggested sticking with the
> > > > old class definitions, creating new classes, reading in the old
> > > > data, and converting it to the new classes.  This would be
> > > > awkward (I want the "new" classes to have the same name as the
> > > > "old" ones), and I can probably just leave the old definitions
> > > > and define the new functions I need outside of the reference
> > > > classes.
> > > > 
> > > > Are there any better alternatives?
> > > > 
> > > > On reflection, it's a little surprising that changing the code
> > > > for a reference class makes any difference to an existing
> > > > instance, since all the function definitions seem to be attached
> > > > to the instance.  One problem I've had in the past was precisely
> > > > that redefining a method in a reference class did not change the
> > > > behavior of existing instances.  So I've tried to follow the
> > > > advice to keep the methods light-weight.
> > > > 
> > > > In this case I was trying to move from a show method (that just
> > > > printed) to a summary method that returned a summary object.  So
> > > > I wanted to add a summary method and redefine the show to call
> > > > summary in the base class, removing all the subclass definitions
> > > > of show.
> > > > 
> > > > Regular S4 classes are obviously not as sensitive since they
> > > > usually don't include the functions that operate on them, but I
> > > > suppose if you changed the slots you'd be in similar trouble.
> > > > 
> > > > Some systems keep track of versions of class definitions and
> > > > allow one to write code to migrate old to new forms
> > > > automatically when the data are read in.  Does R have anything
> > > > like that?
> > > > 
> > > > The system on which I encountered the problems was running R
> > > > 2.15.
> > > > 
> > > > ______________________________________________
> > > > R-devel at r-project.org mailing list
> > > > https://stat.ethz.ch/mailman/listinfo/r-devel
> > > > 
> > 
> > 
> > 