Next: , Previous: Rules about Persistent Classes, Up: Tutorial


2.7 Persistent collections

The remaining problem outlined in the section on Serialization is that operations which mutate collection types do not have persistent side effects. We have solved this problem for objects, but not for collections such as as arrays, hashes or lists. Elephant provides two solutions to this problem: the pset and btree classes. Each provides persistent addition, deletion and mutation of elements, but the pset is a simple data structure that may be more efficient in memory and time than the more general btree.

2.7.1 Using PSets

The persistent set maintains a persistent, unordered collection of objects. They inherit all the important properties of persistent objects: identity and fast serialization. They also resolve the mutated substructure and nested aggregates problem for collections. Every mutating write to a pset is an independent and persistent operation and you can serialize or deserialize a pset without serializing any of it's key-value pairs.

The pset is also a very convenient data structure for enabling a persistent slot contain a collection that can be updated without deserializing and/or reserializing a list, array or hash table on every access.

Let's explore this data structure through a (very) simple social networking example.

     (defpclass person ()
       ((name :accessor person-name :initarg :name))
       ((friends :accessor person-friends :initarg :friends)))

Our goal here is to store a list of friends that each person has, this simple graph structure enables analyses such as who are the friends of my friends, or do I know someone who knows X or what person has the minimum degree of separation from everyone else?

Without psets, we would have to do something like this:

     (defmethod add-friend ((me person) (them person))
       (let ((friends (person-friends me)))
         (pushnew them friends)
         (setf (person-friends me) friends)))
     
     (defmethod remove-friend ((me person) (them person))
       (let ((remaining-friends (delete them (person-friends me))))
         (setf (person-friends me) remaining-friends)))
     
     (defmethod map-friends (fn (me person))
       (mapc fn (person-friends me)))

Ouch! This results in a large amount of consing. We have to deserialize and generate a freshly consed list every time we call person-friends and then reserialize and discard it on every call to (setf person-friends).

Instead, we can simply use a pset as the value of friends and implement the add and remove friend operations as follows:

     (defpclass person ()
       ((name :accessor person-name :initarg :name))
       ((friends :accessor person-friends :initarg :friends
                 :initform (make-pset))))
     
     (defmethod add-friend ((me person) (them person))
       (insert-item them (person-friends me)))
     
     (defmethod remove-friend ((me person) (them person))
       (remove-item them (person-friends me)))
     
     (defmethod map-friends (fn (me person))
       (map-pset fn (person-friends me)))

If you want a list to be returned when the user calls person-friends themselves, you can simply rejigger things like this:

     (defpclass person ()
       ((name :accessor person-name :initarg :name))
       ((friends :accessor person-friends-set :initarg :friends
                 :initform (make-pset))))
     
     (defmethod person-friends ((me person))
       (pset-list (person-friends-set me)))

If you just change the person-friends calls in our prior functions, the new set of functions removes (setf person-friends), which doesn't make sense for a collection slot, allows users to get a list of the friends for easy list manipulations and avoids all the consing that plagued our earlier version.

You can use a pset in any way you like just like a persistent object. The only difference is the api used to manipulate it. Instead of slot accessors, we use insert, remove, map and find.

There is one drawback to persistent sets and that is that they are not garbage collected. Over time, orphaned sets will eat up alot of disk space. Therefore you need to explicitly free the space or resort to more frequent uses of the migrate procedure to compact your database. The pset supports the drop-pset

However, given that persistent objects have the same explicit storage property, using psets to create collection slots is a nice match.

2.7.2 Using BTrees

BTrees are collections of key-value pairs ordered by key with a log(N) random access time and a rich iteration mechanism. Like persistent sets, they solve all the collection problems of the prior sections. Every key-value pair is stored independently in Elephant just like persistent object slots.

The primary interface to btree objects is through get-value. You use setf get-value to store key-value pairs. This interface is very similar to gethash.

The following example creates a btree called *friends-birthdays* and adds it to the root so we can retrieve it during a later sessions. We then will add two key-value pairs consisting of the name of a friend and a universal time encoding their birthday.

     (defvar *friends-birthdays* (make-btree))
     => *FRIENDS-BIRTHDAYS*
     
     (add-to-root 'friends-birthdays *friends-birthdays*)
     => #<BTREE {4951CF6D}>
     
     (setf (get-value "Ben" *friends-birthdays*)
           (encode-universal-time 0 0 0 14 4 1973))
     => 2312600400
     
     (setf (get-value "Andrew" *friends-birthdays*)
           (encode-universal-time 0 0 0 22 12 1976))
     => 2429071200
     
     (get-value "Andrew" *friends-birthdays*)
     => 2429071200
     => T
     
     (decode-universal-time *)
     => 0
        0
        0
        22
        12
        1976
        2
        NIL
        6

In addition to the hash-table like interface, btree stores pairs sorted by the lisp value of the key, lowest to highest. This is works well for numbers, strings, symbols and persistent objects, but due to serialization semantics may be strange for other values like arrays, lists, standard-objects, etc.

Because elements are sorted by value, we can iterate over all the elements of the BTree in order. Notice that we entered the data in reverse alphabetic order, but will read it out in alphabetical order.

     (map-btree (lambda (k v)
                  (format t "name: ~A utime: ~A~%" k
                    (subseq (multiple-value-list
                              (decode-universal-time v)) 3 6)))
                *friends-birthdays*)
     "Andrew"
     "Ben"
     => NIL

But what if we want to read out our friends from oldest to youngest? One way is to employ another btree that maps birthdays to names, but this requires multiple get-value calls for each update, increasing the burden on the programmer. Elephant provides several better ways to do this.

The next section Indexing Persistent Classes shows you how to order and retrieve persistent classes by one or more slot values.