Next: Indexing Persistent Classes, Previous: Rules about Persistent Classes, Up: Tutorial
The remaining problem outlined in the section on Serialization
is that operations which mutate collection types do not have
persistent side effects. We have solved this problem for objects, but
not for collections such as as arrays, hashes or lists. Elephant
provides two solutions to this problem: the pset
and
btree
classes. Each provides persistent addition, deletion and
mutation of elements, but the pset is a simple data structure that may
be more efficient in memory and time than the more general btree.
The persistent set maintains a persistent, unordered collection of
objects. They inherit all the important properties of persistent
objects: identity and fast serialization. They also resolve the
mutated substructure and nested aggregates problem for collections.
Every mutating write to a pset
is an independent and persistent
operation and you can serialize or deserialize a pset
without
serializing any of it's key-value pairs.
The pset
is also a very convenient data structure for enabling
a persistent slot contain a collection that can be updated without
deserializing and/or reserializing a list, array or hash table on
every access.
Let's explore this data structure through a (very) simple social networking example.
(defpclass person () ((name :accessor person-name :initarg :name)) ((friends :accessor person-friends :initarg :friends)))
Our goal here is to store a list of friends that each person has, this simple graph structure enables analyses such as who are the friends of my friends, or do I know someone who knows X or what person has the minimum degree of separation from everyone else?
Without psets, we would have to do something like this:
(defmethod add-friend ((me person) (them person)) (let ((friends (person-friends me))) (pushnew them friends) (setf (person-friends me) friends))) (defmethod remove-friend ((me person) (them person)) (let ((remaining-friends (delete them (person-friends me)))) (setf (person-friends me) remaining-friends))) (defmethod map-friends (fn (me person)) (mapc fn (person-friends me)))
Ouch! This results in a large amount of consing. We have to
deserialize and generate a freshly consed list every time we call
person-friends
and then reserialize and discard it on every
call to (setf person-friends)
.
Instead, we can simply use a pset
as the value of friends and
implement the add and remove friend operations as follows:
(defpclass person () ((name :accessor person-name :initarg :name)) ((friends :accessor person-friends :initarg :friends :initform (make-pset)))) (defmethod add-friend ((me person) (them person)) (insert-item them (person-friends me))) (defmethod remove-friend ((me person) (them person)) (remove-item them (person-friends me))) (defmethod map-friends (fn (me person)) (map-pset fn (person-friends me)))
If you want a list to be returned when the user calls person-friends themselves, you can simply rejigger things like this:
(defpclass person () ((name :accessor person-name :initarg :name)) ((friends :accessor person-friends-set :initarg :friends :initform (make-pset)))) (defmethod person-friends ((me person)) (pset-list (person-friends-set me)))
If you just change the person-friends calls in our prior functions,
the new set of functions removes (setf person-friends)
, which
doesn't make sense for a collection slot, allows users to get a list
of the friends for easy list manipulations and avoids all the consing
that plagued our earlier version.
You can use a pset
in any way you like just like a persistent
object. The only difference is the api used to manipulate it.
Instead of slot accessors, we use insert, remove, map and find.
There is one drawback to persistent sets and that is that they are not
garbage collected. Over time, orphaned sets will eat up alot of disk
space. Therefore you need to explicitly free the space or resort to
more frequent uses of the migrate procedure to compact your database.
The pset supports the drop-pset
However, given that persistent objects have the same explicit storage property, using psets to create collection slots is a nice match.
BTrees are collections of key-value pairs ordered by key with a log(N) random access time and a rich iteration mechanism. Like persistent sets, they solve all the collection problems of the prior sections. Every key-value pair is stored independently in Elephant just like persistent object slots.
The primary interface to btree
objects is through
get-value
. You use setf
get-value
to store
key-value pairs. This interface is very similar to gethash
.
The following example creates a btree called
*friends-birthdays*
and adds it to the root so we can retrieve
it during a later sessions. We then will add two key-value pairs
consisting of the name of a friend and a universal time encoding their
birthday.
(defvar *friends-birthdays* (make-btree)) => *FRIENDS-BIRTHDAYS* (add-to-root 'friends-birthdays *friends-birthdays*) => #<BTREE {4951CF6D}> (setf (get-value "Ben" *friends-birthdays*) (encode-universal-time 0 0 0 14 4 1973)) => 2312600400 (setf (get-value "Andrew" *friends-birthdays*) (encode-universal-time 0 0 0 22 12 1976)) => 2429071200 (get-value "Andrew" *friends-birthdays*) => 2429071200 => T (decode-universal-time *) => 0 0 0 22 12 1976 2 NIL 6
In addition to the hash-table like interface, btree
stores
pairs sorted by the lisp value of the key, lowest to highest. This is
works well for numbers, strings, symbols and persistent objects, but
due to serialization semantics may be strange for other values like
arrays, lists, standard-objects, etc.
Because elements are sorted by value, we can iterate over all the elements of the BTree in order. Notice that we entered the data in reverse alphabetic order, but will read it out in alphabetical order.
(map-btree (lambda (k v) (format t "name: ~A utime: ~A~%" k (subseq (multiple-value-list (decode-universal-time v)) 3 6))) *friends-birthdays*) "Andrew" "Ben" => NIL
But what if we want to read out our friends from oldest to youngest?
One way is to employ another btree that maps birthdays to names, but
this requires multiple get-value
calls for each update,
increasing the burden on the programmer. Elephant provides several
better ways to do this.
The next section Indexing Persistent Classes shows you how to order and retrieve persistent classes by one or more slot values.