Persistent Classes and Objects - Elephant User Manual


4.3 Persistent Classes and Objects

Persistent classes are instances of the metaclass persistent-metaclass. All persistent classes keep track of which slots are :persistent, :transient and/or :indexed, and are used as specializers in the persistence meta-object protocols (initialization of slots, slot access, etc.).

All persistent classes create objects that inherit from the persistent class. The persistent class provides two slots that contain a unique object identifier (oid) and a reference to the store-controller specification they are associated with. Persistent slots do not take up any storage space in memory; instead, the persistent-metaclass slot access protocol redirects slot accesses into calls to the store controller. Typically, the underlying data store then performs the serialization and deserialization needed to read and write data on disk.

When a reference to a persistent instance itself is written to the database, for example as a key or value in a btree, only the unique ID and class of the instance are stored. When read, a persistent object instance is re-created (see below). This means that serialization of persistent objects is exceedingly cheap compared to standard objects. The subsection on instance creation below will discuss the lifecycle of a persistent object in more detail.

4.3.1 Persistent Class Definition

To create a persistent class, specify persistent-metaclass in the :metaclass class option.

     (defclass my-pclass ()
        ((slot1 :accessor slot1 :initarg :slot1 :initform 1))
        (:metaclass persistent-metaclass))

The only difference between the syntax of standard and persistent class definitions is the ability to specify a slot storage policy and an index policy. Both are specified by passing a boolean argument to the slot options :persistent, :transient and :index. Slots are persistent and not indexed by default.

The defpclass macro is provided as a convenience to hide the :metaclass class option.

     (defpclass my-pclass ()
        ((pslot1 :accessor pslot1 :initarg :pslot1 :initform 'one)
         (pslot2 :accessor pslot2 :initarg :pslot2 :initform 'two
                 :persistent t)
         (tslot1 :accessor tslot1 :initarg :tslot1 :initform 'three
                 :transient t)))

In the definition above the class my-pclass is an instance of the metaclass persistent-metaclass. According to this definition pslot1 and pslot2 are persistent while tslot1 is transient and stored in memory.

Slot storage class implications are straightforward. Persistent slot writes are durably stored to disk, reads are made from disk, and both can be part of an ACID-compliant transaction. Transient slots are initialized on instance creation according to initforms or initargs. Transient slot values are never stored to nor loaded from the database, and their accesses cannot be protected by transactions. (Ordinary multi-process synchronization would be required instead.)

The :index option tells Elephant whether to maintain an inverted index that maps slot values to their parent objects. The behavior of indexed classes and class slots is discussed in depth in Class Indices.

Persistent classes have their metaobject protocols modified through specializations on persistent-metaclass. These specializations include the creation of special slot metaobjects: transient-slot-definition, persistent-slot-definition and direct and effective versions of each. For the MOP aficionado the highlights of the new class initialization protocols are as follows:

shared-initialize :around ensures that the class inherits from persistent-object and persistent if it doesn't already, and that the class option :index results in instances of the class being indexed.
direct-slot-definition-class returns the appropriate slot metaobject class based on the values of the :transient and :persistent slot definition keywords. It also does some simple error checking for invalid combinations, for example, indexed transient slots.
effective-slot-definition-class performs the same role as the above for effective slots.
slot-definition-allocation returns the :database allocation for persistent slot definitions so that, under some lisps, the underlying lisp will not allocate instance or class storage for them.
compute-effective-slot-definition-initargs performs some error checking to ensure a subclass does not try to make an inherited persistent slot transient.
finalize-inheritance is called before the first instance is created in order to finalize the list of persistent slots, accounting for any forward-referenced classes in the inheritance list. Similarly, the list of indexed slots is computed. This function is also called by the class indexing code if any calls are made that depend on knowing which slots are indexed.
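
The keyword check performed when choosing a slot metaobject can be illustrated with a small sketch in plain Common Lisp. It is only a toy model, not Elephant's source: the function name choose-slot-kind is hypothetical, it returns a keyword instead of a slot-definition class, and treating a slot marked both :persistent and :transient as an error is an assumption made here for illustration.

```lisp
;; Toy model of the slot-kind decision; not Elephant's actual code.
;; CHOOSE-SLOT-KIND is a hypothetical name used only for this sketch.
(defun choose-slot-kind (&key persistent transient index)
  "Return :PERSISTENT or :TRANSIENT for the given slot keywords."
  (cond ((and persistent transient)
         ;; Assumption: contradictory storage keywords are rejected.
         (error "A slot cannot be both :persistent and :transient."))
        ((and transient index)
         ;; The manual names this combination as invalid.
         (error "A transient slot cannot be indexed."))
        (transient :transient)
        ;; Slots are persistent by default.
        (t :persistent)))
```

Calling (choose-slot-kind) with no keywords yields :PERSISTENT, matching the default described earlier, while (choose-slot-kind :transient t :index t) signals an error.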

Reinitialization is discussed in the section on class redefinition.

4.3.2 Instance Creation

Persistent objects are created just like standard objects, with a call to make-instance. Initforms and slot initargs behave as the user expects. The call to make-instance of a persistent class will fail unless there is a default store-controller instance in the variable *store-controller* or the :sc keyword argument is given a valid store controller object. The store controller is required to provide a unique object id, to initialize the specification pointer of the instance and to store the values of any initialized slots. The initialization process is as follows:

initialize-instance :before is called to initialize the oid slot and the data store specification slot dbcn-spc-pst. The oid is set by the argument :from-oid or by calling the store controller for a new oid.
shared-initialize :around is called to ensure that the underlying lisp does not bypass the metaobject protocol during slot initialization. It does this by initializing the persistent slots itself and passing only the transient slots on to the underlying lisp. Finally it adds the instance to the class index so that any inverted indices are updated appropriately.

Persistent slots are initialized only under the following conditions:

An initarg is provided to make-instance
The database slot value is unbound, an initform exists and :from-oid was not specified

After initialization the persistent instance is added to its host store controller's object cache. This cache is a weak hash table that maps oids to object instances. So after initialization the following state has been created:

Placeholder Instance: An instance of the class is in memory, containing storage for the oid, the specification reference, lisp instance data and any transient slot values. We call this the placeholder instance; it mediates access to persistent values, but does not itself persist.
Cached Reference: A weak reference to the instance is in the store controller object cache.
Memory References: A normal reference to the instance may be retained by the caller of make-instance.
Database Slot Values: The data store contains the persistent slot values that were initialized, indexed by the object id and slot name.
Database References: If the resulting placeholder instance was written to a persistent slot, added to a btree or the class is indexed, a reference to the instance was written into the data store. Today this reference consists of an oid and a class name. If this reference is reachable, then the persistent object can be reconstructed using the :from-oid argument.

If you manually create an object using an OID which already exists in the database, initargs to make-instance take precedence over existing values in the database, which in turn take precedence over any initforms defined in the class.
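
The precedence rules in this section can be summarized in a short sketch. This is a toy model in plain Common Lisp under stated assumptions: the function initial-slot-value and its keyword arguments are hypothetical names, and the real decision is made inside Elephant's initialization methods rather than in a single function.

```lisp
;; Toy model of how one persistent slot gets its initial value.
;; INITIAL-SLOT-VALUE and its keywords are hypothetical names.
(defun initial-slot-value (&key (initarg nil initarg-p)
                                (stored nil stored-p)
                                (initform nil initform-p)
                                from-oid)
  "Return the chosen value and its source as two values."
  (cond (initarg-p                        ; an initarg always wins
         (values initarg :initarg))
        (stored-p                         ; then a value already in the database
         (values stored :database))
        ((and initform-p (not from-oid))  ; initforms are skipped with :from-oid
         (values initform :initform))
        (t
         (values nil :unbound))))
```

For example, supplying an initarg, a stored value and an initform together selects the initarg, and supplying only an initform together with :from-oid leaves the slot unbound.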

4.3.3 Persistent Instance Lifecycle

The distributed nature of persistent instance storage results in some interesting behaviors, especially with respect to transient slots. The prior section detailed the state of the system after the original initialization of an object. The object can then be in a number of different states:

Resident: The canonical state of an in-use persistent object as described in the initialization section above.
Unreferenced, Unreclaimed: All memory references to the object have been dropped but the placeholder instance has not yet been garbage collected. The weak pointer still exists in the cache. If a database reference is fetched from the data store, the cached value will be used.
Non-resident: The object only exists as reachable database references and slot values. This is the state after garbage collection of the placeholder instance.
Recreated: An intermediary state where a non-resident object is fetched from the data store and its placeholder object must be recreated before the object enters the resident state.

The garbage collection of the placeholder instance is an important feature. This means that we can have more objects in our system than are currently resident in memory. If this were not the case, what would be the point of an object database?

The recreated state deserves to be discussed in more detail. We learned earlier that the database reference contains the oid and class of the object, and of course we know the store-controller the reference is stored into¹, so this information is sufficient to reconstruct the placeholder instance.

When the reference is deserialized, its oid is used to look up the object in the store controller's object cache. If this fails, then the instance is created with a call much like this:

     (make-instance 'pclass :from-oid 2000 :sc *store-controller*)

The :from-oid argument to make-instance overrides some of the normal make-instance behavior by inhibiting all initform initialization as the object's slots are assumed to be properly initialized from the original call to make-instance.
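
The lookup-then-recreate step can be sketched in plain Common Lisp. This is a toy model, not Elephant's implementation: the class placeholder, the variable *object-cache* and the function resolve-reference are hypothetical names, and an ordinary hash table stands in for the weak cache, so entries here are never collected automatically.

```lisp
;; Toy stand-in for a persistent class: it only remembers its oid.
(defclass placeholder ()
  ((oid :initarg :from-oid :reader placeholder-oid)))

;; Assumption: a plain hash table plays the role of the store
;; controller's object cache. The real cache holds weak references.
(defvar *object-cache* (make-hash-table))

(defun resolve-reference (oid class-name)
  "Return the cached instance for OID, recreating it if necessary."
  (or (gethash oid *object-cache*)
      (setf (gethash oid *object-cache*)
            (make-instance class-name :from-oid oid))))
```

Resolving the same oid twice returns the identical (eq) instance. Removing the entry with remhash imitates garbage collection of the placeholder, after which the next call builds a fresh instance.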

4.3.4 Using Transient Slots

What about transient slots? Transient slots are tied to the placeholder object where their storage is allocated. While the persistent slots are permanently stored in the data store, transient slots can be garbage collected when all memory references have been dropped, even if database references exist.

After collection, if you retrieve an object from the store, its transient slots will be reset to the slot initforms from the class definition. You can only reliably use :initargs to initialize transient or persistent slots during the initial call to make-instance or when manually creating the instance from an oid.

Here is an example illustrating the ephemeral nature of transient slots:

     (setf pobj1 (make-instance 'my-pclass :pslot1 1 :tslot1 3))
     => #<MY-PCLASS>
     
     (pslot1 pobj1) => 1
     (pslot2 pobj1) => TWO
     (tslot1 pobj1) => 3
     
     (add-to-root 'pobj1 pobj1)
     
     (setf pobj2 (get-from-root 'pobj1))
     => #<MY-PCLASS>
     
     (pslot1 pobj2) => 1
     (pslot2 pobj2) => TWO
     (tslot1 pobj2) => 3
     
     ;; Drop every in-memory reference, then force a collection.
     (setf pobj1 nil)
     (setf pobj2 nil)
     (gc)   ; implementation-specific, e.g. (sb-ext:gc :full t) on SBCL
     
     ;; The placeholder is recreated, so tslot1 falls back to its initform.
     (setf pobj3 (get-from-root 'pobj1))
     (pslot1 pobj3) => 1
     (pslot2 pobj3) => TWO
     (tslot1 pobj3) => THREE

The implication of this behavior is that you need to think carefully about how to employ transient values. Essentially, you cannot make assumptions about the state of transient values in objects loaded from the store unless you know that they were loaded at some point in time and cannot be GC'ed (i.e. they are stored in a list or hash table).

A good policy is to initialize transient values using an :after method on initialize-instance. This allows you to initialize transient values using either system defaults or persistent slot values. That way you can ensure that the transient slots are always in a consistent state when accessed by the application, regardless of when the placeholder object was recreated.

In general, transient slots are a good place for intermediate values in a computation or to cache frequently read items to avoid deserialization overhead. indexed-btree is an example of this approach: an in-memory hash is cached in the transient slot for reads, and writes are mirrored to a serialized hash in a persistent slot. The :after method just copies the persistent hash value to the transient slot.
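
The :after method policy can be sketched with ordinary CLOS. The class below is a standard class so that the example runs without a store; in Elephant the first slot would be persistent and the second would be declared :transient t. The names lookup-table, stored-pairs and lookup are hypothetical.

```lisp
(defclass lookup-table ()
  ;; Stand-in for a persistent slot holding an association list.
  ((stored-pairs :initarg :stored-pairs :initform '()
                 :accessor stored-pairs)
   ;; Stand-in for a transient slot caching a hash table built
   ;; from STORED-PAIRS.
   (lookup :accessor lookup)))

(defmethod initialize-instance :after ((obj lookup-table) &key)
  ;; Rebuild the in-memory cache every time a placeholder is created,
  ;; so the transient slot never depends on an earlier session.
  (let ((table (make-hash-table :test #'equal)))
    (loop for (key . value) in (stored-pairs obj)
          do (setf (gethash key table) value))
    (setf (lookup obj) table)))
```

Because make-instance runs initialize-instance both for a brand new object and when a placeholder is recreated, the cache slot is filled in either case.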

4.3.5 Using Persistent Slots

Persistent slot use is straightforward. You can read from them, write to them or make them unbound. Remember that every access goes to the data store. This makes reads relatively expensive, as they may result in a disk seek. Writes can be doubly expensive, especially outside a transaction, as each write results in a synchronous disk sync operation.

Reads and writes require the home store controller to be valid and open. The placeholder object's specification pointer is used to retrieve the store-controller object. If this object is closed or missing, the system will give you a restart option to reopen the controller and continue.

Persistent slot behavior is implemented by specializing the relevant MOP functions controlling slot access:

slot-value-using-class
(setf slot-value-using-class)
slot-boundp-using-class
slot-makunbound-using-class

Each of these functions retrieves the home store-controller for the instance and then calls a method specialized on the class of that store controller. This method is responsible for mapping the oid and slotname of the slot access to the appropriate value in the data store.
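
The second half of that dispatch, a method specialized on the store class that maps an oid and slot name to a value, can be sketched as follows. This is a toy model in plain Common Lisp: the class toy-store and the four generic functions are hypothetical names, and a hash table stands in for the data store.

```lisp
;; Toy store: a hash table keyed by (oid slot-name) lists.
(defclass toy-store ()
  ((table :initform (make-hash-table :test #'equal)
          :reader store-table)))

;; Hypothetical generic functions, one per slot operation.
(defgeneric read-slot (store oid slot-name))
(defgeneric write-slot (store oid slot-name new-value))
(defgeneric slot-bound-in-store-p (store oid slot-name))
(defgeneric unbind-slot (store oid slot-name))

(defmethod read-slot ((store toy-store) oid slot-name)
  (multiple-value-bind (value found-p)
      (gethash (list oid slot-name) (store-table store))
    (unless found-p
      (error "Slot ~S of object ~D is unbound." slot-name oid))
    value))

(defmethod write-slot ((store toy-store) oid slot-name new-value)
  (setf (gethash (list oid slot-name) (store-table store)) new-value))

(defmethod slot-bound-in-store-p ((store toy-store) oid slot-name)
  ;; The second value of GETHASH says whether the key is present.
  (nth-value 1 (gethash (list oid slot-name) (store-table store))))

(defmethod unbind-slot ((store toy-store) oid slot-name)
  (remhash (list oid slot-name) (store-table store)))
```

A different kind of store would supply its own methods on the same generic functions, which is the point of dispatching on the class of the store controller.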

4.3.6 Class Redefinition

Class redefinition is problematic in the current (0.9) version of Elephant. The usual CLOS mechanisms are properly implemented, but updating instances will only work for those instances that are in memory at the time. Instances that are non-resident will not be updated. This is usually not as big a problem as it seems, because the slot values are stored independently. An outline of the update procedure follows:

The function update-instance-for-redefined-class is called by CLOS whenever defclass is re-evaluated and results in a change in the list of slots.

For transient slots the behavior is the same as it is in CLOS for all in-memory slots.

Added slots: are added to the object and their initforms called just as if they were created without initargs
Discarded slots: are dropped and their values lost
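
Since transient slots follow ordinary CLOS rules here, the behavior can be shown with a standard class. This sketch does not use Elephant at all; the class point and its slots are hypothetical names chosen for the example.

```lisp
(defclass point ()
  ((x :initarg :x :accessor point-x)
   (y :initarg :y :accessor point-y)))

(defparameter *p* (make-instance 'point :x 1 :y 2))

;; Redefine the class: slot Y is discarded, slot LABEL is added.
(defclass point ()
  ((x :initarg :x :accessor point-x)
   (label :initform "origin" :accessor point-label)))

;; The existing instance is updated no later than its next slot access.
(point-label *p*)        ; => "origin", the added slot uses its initform
(point-x *p*)            ; => 1, retained slots keep their values
(slot-exists-p *p* 'y)   ; => NIL, the discarded slot and its value are gone
```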

Persistent slots have a slightly different behavior, as only resident objects (those with valid placeholder objects) are updated.

Added slots (resident): are added to the object and the initforms are called only on in-memory objects, as in an empty call to make-instance
Added slots (non-resident): the added slots will have unbound values
Discarded slots (resident): slots are dropped from the class and become inaccessible, but their values are not deleted from the database. This is a precautionary measure as losing persistent data because of an accidental re-evaluation while editing a defclass could be painful. If you add the slot back, the original value will be accessible regardless of the initform.
Discarded slots (non-resident): This has the same behavior as resident objects, as no side effects are made on the objects or their slots

There are additional considerations for matching class indexing options in the class object to the actual indices in the database. The following section will discuss synchronizing these if they diverge.

(Note: release 0.9.1 should fix this by providing an oid->class map that allows the system to cheaply iterate over all objects and update them appropriately. This hasn't been done yet due to performance implications. See the Trac system for the appropriate tickets.)

4.3.7 Support for change-class

Elephant also supports change-class by specializing update-instance-for-different-class. The handling of slots in this case is identical to the class redefinition above. Persistent and transient slot values are retained if their name matches a slot name in the new class, and initforms are called on newly added slots. Valid initargs for any slot will override this default behavior and set the slot value to the initarg value.

Because the instance is guaranteed to be resident, the operation has none of the resident/non-resident conflicts above.
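
The slot handling described above mirrors standard CLOS, so it can be illustrated with ordinary classes. This sketch does not use Elephant; the classes draft and article and their slots are hypothetical names.

```lisp
(defclass draft ()
  ((title :initarg :title :accessor title)
   (notes :initarg :notes :accessor notes)))

(defclass article ()
  ((title :initarg :title :accessor title)
   (status :initarg :status :initform :unreviewed :accessor status)))

;; Without initargs: TITLE is retained by name, STATUS uses its initform.
(defparameter *doc1* (make-instance 'draft :title "Intro" :notes "todo"))
(change-class *doc1* 'article)
(title *doc1*)    ; => "Intro"
(status *doc1*)   ; => :UNREVIEWED

;; With initargs: the supplied values override the defaults.
(defparameter *doc2* (make-instance 'draft :title "Old" :notes "n/a"))
(change-class *doc2* 'article :title "New" :status :published)
(title *doc2*)    ; => "New"
(status *doc2*)   ; => :PUBLISHED
```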

change-class cannot convert between persistent and non-persistent classes and will flag an error if you try to do so. (Note: this could be implemented in the future if users request it.)

Footnotes

[1] If you attempt to store an object from one store into another, the system will signal an error condition called cross-reference-error.