Elephant User Manual



Copyright

Elephant System
Original Version, Copyright © 2004 Ben Lee and Andrew Blumberg.
Version 0.5, Copyright © 2006 Robert L. Read.
Versions 0.6-0.9, Copyright © 2006-2007 Ian Eslick and Robert L. Read
Portions copyright respective contributors (see CREDITS).

Elephant Manual
Original Version, Copyright © 2004 Ben Lee.
Versions 0.5-0.6, Copyright © 2006 Robert L. Read.
Current Version, Copyright © 2006-2007 Ian Eslick and Robert L. Read

Permission is granted to copy, distribute and/or modify this document under the terms of the GNU Free Documentation License. See the Copyright and License chapter for details about copyright, license and warranty for this manual and the Elephant system.


1 Introduction

Elephant is a persistent object protocol and database for Common Lisp. The persistent protocol component of Elephant overrides class creation and standard slot accesses using the Meta-Object Protocol (MOP) to render slot values persistent. Database functionality includes the ability to persistently index and retrieve ordered sets of class instances and ordinary lisp values. Elephant has an extensive test suite and the core functionality is becoming quite mature.

The Elephant code base is available under the LLGPL. Data stores each come with their own, separate license and you will have to evaluate the implications of using them yourself.

1.1 History

Elephant was originally envisioned as a lightweight interface layer on top of the Berkeley DB library, a widely-distributed embedded database that many unix systems have installed by default. Berkeley DB is ACID compliant, transactional, process and thread safe, and fast relative to relational databases.

Elephant has been extended to provide support for multiple backends, specifically a relational database backend based on CL-SQL which has been tested with Postgres and SQLite 3, and can probably support other relational systems easily. It supports, with some care, multi-repository operation and enables convenient migration of data between repositories.

The support for relational backends and migration to the LLGPL was to allow for broader use of Elephant in both not-for-profit and commercial settings. Several additional backends are planned for future releases including a native Lisp implementation released under the LLGPL.

Elephant's current development focus is to enhance the feature set including a native lisp backend, a simple query language, and flexible persistence models that selectively break one or more of the ACID constraints to improve performance.

1.2 Elephant Goals

1.3 More Information

Join the Elephant mailing lists to ask your questions and receive updates. You can also review archives for past discussions and questions. Pointers can be found on the Elephant website at

http://www.common-lisp.net/project/elephant.

Installation instructions can be found in the Installation section. Bugs can be reported via the Elephant Trac system at

http://trac.common-lisp.net/elephant/.

This also serves as a good starting point for finding out what new features or capabilities you can contribute to Elephant. The Trac system also contains a wiki with design discussions and a FAQ.



2 Tutorial



2.1 Overview

Elephant is a Persistence Metaprotocol and Database for Common Lisp. It provides the ability for users to define and interact with persistent objects and to transparently store ordinary lisp values. Persistent objects are CLOS instances that overload the ordinary slot access semantics so that every write to a slot is passed through and written to disk. Non-persistent lisp objects and values can be written to slots and will be automatically persisted. In addition, Elephant provides a persistent index which maintains an ordered collection of lisp values or persistent object references.

The use of persistent objects makes coding concise, convenient, and powerful, and makes persistence almost invisible to the programmer. However, Elephant also allows the same basic data dictionary of key/value retrieval that BerkeleyDB provides.

When someone says "database," most people think of SQL Relational Database Management Systems (e.g. Oracle, PostgreSQL, MySQL). Those systems store data in statically typed tables with unique shared values to connect rows in separate tables. Objects can be mapped into these tables in an object-relational mapping that assigns objects to rows and slot values to columns in a row's table. If a slot references another type of object, a unique ID can be used to reference that object's table. CL-SQL, for example, provides facilities for this kind of object-relational mapping and there are many systems for other languages that do the same (e.g. Hibernate for Java).

While Elephant can use either RDBMSs or Berkeley DB as a data store, the model it supports is that of objects stored in persistent indices. Unlike systems such as Hibernate for Java, the user does not need to construct or worry about a mapping from the object space into the database. Elephant relies on Lisp rather than SQL for its data manipulation language. Elephant is designed to be a simple and convenient tool for the programmer.

Elephant consists of a small universe of basic concepts:

There is a set of more advanced concepts you will learn about later, but these basic concepts will serve to acquaint you with Elephant.

If you do not already have Elephant installed and building correctly, read the Installation section of this manual and then move on to Getting Started.



2.2 Getting Started

The first step in using Elephant is to open a store controller. A store controller is an object that coordinates lisp program access to the chosen data store.

To obtain a store controller, you call open-store with a store specification. A store specification is a list containing a backend specifier (:BDB or :CLSQL) and a backend-specific reference.

For :BDB, the second element is a string or pathname that references a local directory for the database files. This directory must be created prior to calling open-store.

     (open-store '(:BDB "/users/me/db/my-db/"))

For :CLSQL the second element is another list consisting of a specific SQL database type and the name of a database file or a connection record for the SQL server. Examples are:

     (open-store '(:CLSQL (:SQLITE "/users/me/db/sqlite.db")))
     (open-store '(:CLSQL (:POSTGRESQL "localhost.localdomain"
                                       "mydb" "myuser" "")))

We use Berkeley DB as our example backend. To open a BDB store-controller we can do the following:

     (asdf:operate 'asdf:load-op :elephant)
     (use-package :elephant)
     (defparameter *test-db-spec*
           '(:BDB "/home/me/db/testdb/"))
     (open-store *test-db-spec*)

We do not need to store the reference to the store just now as it is automatically assigned to the variable, *store-controller*. For a deeper discussion of store controller management see the User Guide.

When you're done with your session, release the store-controller's resources by calling close-store.

There is also a convenience macro, with-open-store, that opens and closes the store for you. However, opening the store is an expensive operation, so it is generally better to leave the store open until your application no longer needs it.
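
The two styles of store lifetime can be compared side by side. This is only a sketch: it assumes Elephant is loaded and its package is in use as in the session above, the directory and the *example-spec* variable are hypothetical, and the argument shape shown for with-open-store (the specification wrapped in its own list) is an assumption that should be checked against the reference section.

```lisp
;; Hypothetical store specification; the directory must already exist.
(defparameter *example-spec* '(:BDB "/home/me/db/testdb/"))

;; Long-lived style: open once, work, close when the application is done.
(open-store *example-spec*)
(format t "store open: ~A~%" *store-controller*)
(close-store)

;; Scoped style: the store is opened on entry and closed on exit.
;; The ((spec) &body body) argument shape is an assumption.
(with-open-store (*example-spec*)
  (format t "store open: ~A~%" *store-controller*))
```

The scoped form is convenient for short scripts and tests; a long-running application would normally use the first style so that it pays the cost of opening the store only once.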



2.3 The Store Root

Which values persist between lisp sessions is determined by liveness. A value in a store is live if it can be reached from the root of the store. The root is a special BTree in which other BTrees and lisp values can be stored. This BTree has a special interface through the store controller. (There is a second root BTree called the class root which will be discussed later.)

You can put something into the root object by

     (add-to-root "my key" "my value")
     => "my value"

and get things out via

     (get-from-root "my key")
     => "my value"
     => T

The second value indicates whether the key was found. This is important if your key-value pair can have nil as a value.
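
A short sketch of why the second value matters, assuming an open store as above. The helper describe-root-key is hypothetical and only uses add-to-root and get-from-root as shown in this section.

```lisp
;; Store NIL deliberately under a key.
(add-to-root "maybe" nil)

(defun describe-root-key (key)
  "Hypothetical helper: tell a stored NIL apart from a missing key."
  (multiple-value-bind (value found-p) (get-from-root key)
    (if found-p
        (format nil "~S is bound to ~S" key value)
        (format nil "~S is not in the root" key))))

(describe-root-key "maybe")        ; key present, value is NIL
(describe-root-key "no such key")  ; key absent
```

Looking only at the primary value would make these two cases indistinguishable, since both return NIL.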

You can perform other basic operations as well.

     (root-existsp "my key")
     => T
     (remove-from-root "my key")
     => T
     (get-from-root "my key")
     => NIL
     => NIL

To access all the objects in the root, the simplest way is to call map-root with a function to apply to each key-value pair.

     (map-root
       (lambda (k v)
          (format t "key: ~A value:~A~%" k v)))

You can also access the root object directly.

     (controller-root *store-controller*)
     => #<DB-BDB::BDB-BTREE  #x10e86042>

It is an instance of the btree class; see Persistent BTrees.



2.4 Serialization

What can you put into the store besides strings? Almost all lisp values and objects can be stored: numbers, symbols, strings, nil, characters, pathnames, conses, hash-tables, arrays, CLOS objects and structs. Nested and circular things are allowed. You can store basically anything except compiled functions, closures, class objects, packages and streams. Functions can be stored as uncompiled lambda expressions. (Compiled functions and other kinds of objects may eventually get supported too.)

Elephant needs to use a representation of data that is independent of a specific lisp or data store. Therefore all lisp values that are stored must be serialized into a canonical format. Because Berkeley DB supports variable length binary buffers, Elephant uses a binary serialization system. This process has some consequences that are important to understand:

  1. Lisp identity can't be preserved. Since this is a store which persists across invocations of Lisp, this probably doesn't even make sense. However if you get an object from the index, store it to a lisp variable, then get it again - they will not be eq:
              (setq foo (cons nil nil))
              => (NIL)
              (add-to-root "my key" foo)
              => (NIL)
              (add-to-root "my other key" foo)
              => (NIL)
              (eq (get-from-root "my key")
                    (get-from-root "my other key"))
              => NIL
         
  2. Nested aggregates are stored in one buffer. If you store a set of objects in a hash table and then store the hash table, all of those objects will be stored in one large binary buffer along with the hash keys. This is true for all other aggregates that can store type T (cons, array, standard object, etc).
  3. Mutated substructure does not persist.
              (setf (car foo) T)
              => T
              (get-from-root "my key")
              => (NIL)
         

    This will affect all aggregate types: objects, conses, hash-tables, et cetera. (You can of course manually re-store the cons.) In this sense elephant does not automatically provide persistent collections. If you want to persist every access, you have to use BTrees (see Persistent BTrees).

  4. Serialization and deserialization can be costly. While serialization is pretty fast, it is still expensive to store large objects wholesale. Also, since object identity is impossible to maintain, deserialization must re-cons or re-allocate the entire object every time, increasing the number of GCs the system does. This eager allocation is contrary to how most people want to use a database: one of the reasons to use a database is if your objects can't fit into main memory all at once.
  5. Merge-conflicts in heavily multi-process/threaded situations. This is the common read-modify-write problem in all databases. We will talk more about this in the Using Transactions section.

This may seem terribly restrictive, but don't despair, we'll solve most of these problems in the next section.
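
In the meantime, the manual re-store mentioned under item 3 can be sketched as follows. This assumes an open store; the variable *pair* and the key "pair key" are hypothetical names chosen for illustration.

```lisp
(defvar *pair* (cons nil nil))

(add-to-root "pair key" *pair*)   ; serializes a snapshot of the cons
(setf (car *pair*) t)             ; changes only the in-memory cons
(get-from-root "pair key")        ; still the stored snapshot, (NIL)

(add-to-root "pair key" *pair*)   ; serialize the mutated cons again
(get-from-root "pair key")        ; a fresh copy that reflects the change
```

Every re-store serializes the whole aggregate again, which is exactly the cost described under item 4.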



2.5 Persistent Classes

The Common Lisp Object System and the Metaobject Protocol give us the tools to solve these problems for objects:

     (defclass my-persistent-class ()
       ((slot1 :accessor slot1)
        (slot2 :accessor slot2))
       (:metaclass persistent-metaclass))
     
     (setq foo (make-instance 'my-persistent-class))
     => #<MY-PERSISTENT-CLASS {492F4F85}>
     
     (add-to-root "foo" foo)
     => #<MY-PERSISTENT-CLASS {492F4F85}>
     (add-to-root "bar" foo)
     => #<MY-PERSISTENT-CLASS {492F4F85}>
     (eq (get-from-root "foo")
         (get-from-root "bar"))
     => T

What's going on here? Persistent classes, that is, classes which use the persistent-metaclass metaclass, are given unique IDs (accessible through ele::oid). They are serialized simply by their OID and class. Slot values are stored separately (and invisible to the user) keyed by OID and slot. Loading (deserializing) a persistent class

     (get-from-root "foo")
     => #<MY-PERSISTENT-CLASS {492F4F85}>

instantiates the object or finds it in a memory cache if it already exists. (The cache is a weak hash-table, so gets flushed on GCs if no other references to the persistent object are kept in memory). The slot values are NOT loaded until you ask for them. In fact, the persisted slots don't have space allocated for them in the instances, because we're reading from the database.

     (setf (slot1 foo) "one")
     => "one"
     (setf (slot2 foo) "two")
     => "two"
     (slot1 foo)
     => "one"
     (slot2 foo)
     => "two"

Changes made to them propagate automatically:

     (setf (slot1 foo) "three")
     => "three"
     (slot1 (get-from-root "bar"))
     => "three"

You can also create persistent classes using the convenience macro defpclass.

     (defpclass my-persistent-class ()
       ((slot1 :accessor slot1)
        (slot2 :accessor slot2)))

Although it is hard to see here, serialization / deserialization of persistent classes is fast, much faster than ordinary CLOS objects. Finally, they do not suffer from merge-conflicts when accessed within a transaction (see below). In short: persistent classes solve the problems associated with storing ordinary CLOS objects. We'll see later that BTrees solve the problems associated with storing hash-tables.



2.6 Rules about Persistent Classes

Using the persistent-metaclass metaclass declares all slots to be persistent by default. To make a non-persistent slot use the :transient t flag. Class slots :allocation :class are never persisted, for either persistent or ordinary classes. (Someday, if we choose to store class objects, this policy may change).
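
As a minimal sketch of the :transient flag, assuming Elephant is loaded: the class page-counter and its slot names are hypothetical and serve only to show one persistent and one transient slot side by side.

```lisp
(defpclass page-counter ()
  (;; Persistent by default: every write goes to the store.
   (hits :accessor hits :initarg :hits :initform 0)
   ;; Transient: lives only in the in-memory instance.
   (scratch :accessor scratch :initform nil :transient t)))

(defvar *counter* (make-instance 'page-counter))
(incf (hits *counter*))              ; written through to the store
(setf (scratch *counter*) '(a b c))  ; never written to the store
```

Transient slots are a natural home for caches and other per-session state that would be wasteful or meaningless to persist.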

Persistent classes may inherit from other classes. Slots inherited from persistent classes remain persistent. Transient slots and slots inherited from ordinary classes remain transient. Ordinary classes cannot inherit from persistent classes – otherwise persistent slots could not be stored!

     (defclass stdclass1 ()
       ((slot1 :initarg :slot1 :accessor slot1)))
     
     (defclass stdclass2 (stdclass1)
       ((slot2 :initarg :slot2 :accessor slot2)))
     
     (defpclass pclass1 (stdclass2)
       ((slot1 :initarg :slot1 :accessor slot1)
        (slot3 :initarg :slot3 :accessor slot3)))
     
     (setf pinst (make-instance 'pclass1 :slot1 1 :slot2 2 :slot3 3))
     => #<PCLASS1 {x10deb88a}>
     
     (add-to-root 'pinst pinst)
     => #<PCLASS1 {x10deb88a}>
     
     (slot1 pinst)
     => 1
     
     (slot2 pinst)
     => 2
     
     (slot3 pinst)
     => 3

Now we can simulate a new lisp session by flushing the instance cache and reloading our object to see which slots remain. Here persistent slot1 should shadow the standard slot1 and thus be persistent. Slot3 is persistent by default, and slot2, since it is inherited from a standard class, should be transient.

     (elephant::flush-instance-cache *store-controller*)
     => #<EQL hash-table with weak values, 0 entries {x11198a02}>
     
     (setf pinst (get-from-root 'pinst))
     => #<PCLASS1 {x1119b652}>
     
     (slot1 pinst)
     => 1
     
     (slot-boundp pinst 'slot2)
     => nil
     
     (slot3 pinst)
     => 3

Using persistent objects has implications for the performance of your system. Note that the database is read every time you access a slot. This is a feature, not a bug, especially in concurrent situations: you want the most recent commits by other threads, right? This can be used as a weak form of IPC. But also note that in particular, if your slot value is not an immediate value or persistent object, reading will cons or freshly allocate storage for the value.

Gets are not an expensive operation; you can perform thousands to tens of thousands of primitive reads per second. However, if you're concerned, cache large values in memory and avoid writing them back to disk as long as you can.
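
The difference between re-reading a slot and caching its value can be sketched as follows. This assumes my-persistent-class from the Persistent Classes section with a vector of numbers stored in slot1; both function names are hypothetical.

```lisp
(defun sum-slot1-slowly (obj)
  "Reads and deserializes slot1 once for LENGTH and again on every iteration."
  (loop for i below (length (slot1 obj))
        sum (aref (slot1 obj) i)))

(defun sum-slot1-once (obj)
  "Reads and deserializes slot1 a single time, then works on the in-memory copy."
  (let ((samples (slot1 obj)))
    (reduce #'+ samples)))
```

Both functions return the same sum, but the first one allocates a fresh copy of the vector on each slot read, while the second pays that cost once.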



2.7 Persistent collections

The remaining problem outlined in the section on Serialization is that operations which mutate collection types do not have persistent side effects. We have solved this problem for objects, but not for collections such as arrays, hashes or lists. Elephant provides two solutions to this problem: the pset and btree classes. Each provides persistent addition, deletion and mutation of elements, but the pset is a simple data structure that may be more efficient in memory and time than the more general btree.

2.7.1 Using PSets

The persistent set maintains a persistent, unordered collection of objects. Psets inherit all the important properties of persistent objects: identity and fast serialization. They also resolve the mutated substructure and nested aggregates problems for collections. Every mutating write to a pset is an independent and persistent operation, and you can serialize or deserialize a pset without serializing any of its elements.

The pset is also a very convenient data structure for enabling a persistent slot to contain a collection that can be updated without deserializing and/or reserializing a list, array or hash table on every access.

Let's explore this data structure through a (very) simple social networking example.

     (defpclass person ()
       ((name :accessor person-name :initarg :name)
        (friends :accessor person-friends :initarg :friends
                 :initform nil)))

Our goal here is to store a list of friends that each person has. This simple graph structure enables analyses such as: who are the friends of my friends, do I know someone who knows X, or what person has the minimum degree of separation from everyone else?

Without psets, we would have to do something like this:

     (defmethod add-friend ((me person) (them person))
       (let ((friends (person-friends me)))
         (pushnew them friends)
         (setf (person-friends me) friends)))
     
     (defmethod remove-friend ((me person) (them person))
       (let ((remaining-friends (delete them (person-friends me))))
         (setf (person-friends me) remaining-friends)))
     
     (defmethod map-friends (fn (me person))
       (mapc fn (person-friends me)))

Ouch! This results in a large amount of consing. We have to deserialize and generate a freshly consed list every time we call person-friends and then reserialize and discard it on every call to (setf person-friends).

Instead, we can simply use a pset as the value of friends and implement the add and remove friend operations as follows:

     (defpclass person ()
       ((name :accessor person-name :initarg :name)
        (friends :accessor person-friends :initarg :friends
                 :initform (make-pset))))
     
     (defmethod add-friend ((me person) (them person))
       (insert-item them (person-friends me)))
     
     (defmethod remove-friend ((me person) (them person))
       (remove-item them (person-friends me)))
     
     (defmethod map-friends (fn (me person))
       (map-pset fn (person-friends me)))

If you want a list to be returned when the user calls person-friends themselves, you can simply rejigger things like this:

     (defpclass person ()
       ((name :accessor person-name :initarg :name)
        (friends :accessor person-friends-set :initarg :friends
                 :initform (make-pset))))
     
     (defmethod person-friends ((me person))
       (pset-list (person-friends-set me)))

If you change the person-friends calls in our prior functions to person-friends-set, the new definitions drop (setf person-friends), which doesn't make sense for a collection slot, let users get a list of friends for easy list manipulation, and avoid all the consing that plagued our earlier version.

You can use a pset in any way you like just like a persistent object. The only difference is the api used to manipulate it. Instead of slot accessors, we use insert, remove, map and find.

There is one drawback to persistent sets: they are not garbage collected. Over time, orphaned sets will eat up a lot of disk space. Therefore you need to explicitly free the space or resort to more frequent uses of the migrate procedure to compact your database. The pset supports the drop-pset operation for this purpose.

However, given that persistent objects have the same explicit storage property, using psets to create collection slots is a nice match.
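
A sketch of releasing an orphaned set explicitly. It assumes the pset-backed person class above in which person-friends returns the pset, and it assumes that drop-pset takes the pset as its only argument; the method reset-friends is hypothetical.

```lisp
(defmethod reset-friends ((me person))
  "Hypothetical helper: give ME an empty friend set and release the old one."
  (let ((old-set (person-friends me)))
    ;; Install the replacement first so the slot never points at a dropped set.
    (setf (person-friends me) (make-pset))
    ;; Assumption: DROP-PSET releases the storage held by OLD-SET.
    (drop-pset old-set)
    me))
```

Without the explicit drop, the old set would remain in the store with nothing referring to it.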

2.7.2 Using BTrees

BTrees are collections of key-value pairs ordered by key with O(log N) random access time and a rich iteration mechanism. Like persistent sets, they solve all the collection problems of the prior sections. Every key-value pair is stored independently in Elephant just like persistent object slots.

The primary interface to btree objects is through get-value. You use (setf get-value) to store key-value pairs. This interface is very similar to gethash.

The following example creates a btree called *friends-birthdays* and adds it to the root so we can retrieve it in a later session. We then add two key-value pairs consisting of the name of a friend and a universal time encoding their birthday.

     (defvar *friends-birthdays* (make-btree))
     => *FRIENDS-BIRTHDAYS*
     
     (add-to-root 'friends-birthdays *friends-birthdays*)
     => #<BTREE {4951CF6D}>
     
     (setf (get-value "Ben" *friends-birthdays*)
           (encode-universal-time 0 0 0 14 4 1973))
     => 2312600400
     
     (setf (get-value "Andrew" *friends-birthdays*)
           (encode-universal-time 0 0 0 22 12 1976))
     => 2429071200
     
     (get-value "Andrew" *friends-birthdays*)
     => 2429071200
     => T
     
     (decode-universal-time *)
     => 0
        0
        0
        22
        12
        1976
        2
        NIL
        6

In addition to the hash-table like interface, btree stores pairs sorted by the lisp value of the key, lowest to highest. This works well for numbers, strings, symbols and persistent objects, but due to serialization semantics may be strange for other values like arrays, lists, standard-objects, etc.

Because elements are sorted by value, we can iterate over all the elements of the BTree in order. Notice that we entered the data in reverse alphabetic order, but will read it out in alphabetical order.

     (map-btree (lambda (k v)
                  (format t "name: ~A utime: ~A~%" k
                    (subseq (multiple-value-list
                              (decode-universal-time v)) 3 6)))
                *friends-birthdays*)
     name: Andrew utime: (22 12 1976)
     name: Ben utime: (14 4 1973)
     => NIL

But what if we want to read out our friends from oldest to youngest? One way is to employ another btree that maps birthdays to names, but this requires multiple get-value calls for each update, increasing the burden on the programmer. Elephant provides several better ways to do this.
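
For comparison, the manual two-btree approach described above might look like the following sketch. It assumes the *friends-birthdays* btree from the earlier example; *birthdays-to-names* and record-birthday are hypothetical names.

```lisp
;; Second btree keyed by birthday (a universal time), mapping back to the name.
(defvar *birthdays-to-names* (make-btree))
(add-to-root 'birthdays-to-names *birthdays-to-names*)

(defun record-birthday (name utime)
  "Hypothetical helper: keep both btrees in step on every update."
  (setf (get-value name *friends-birthdays*) utime)
  (setf (get-value utime *birthdays-to-names*) name)
  utime)

(record-birthday "Ben" (encode-universal-time 0 0 0 14 4 1973))
(record-birthday "Andrew" (encode-universal-time 0 0 0 22 12 1976))

;; Keys are universal times, so iteration runs from the earliest
;; birthday (oldest friend) to the latest.
(map-btree (lambda (utime name)
             (format t "~A was born at universal time ~A~%" name utime))
           *birthdays-to-names*)
```

Besides the extra bookkeeping, this sketch silently overwrites an entry when two friends share a birthday, since each key in the second btree holds a single value.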

The next section Indexing Persistent Classes shows you how to order and retrieve persistent classes by one or more slot values.



2.8 Indexing Persistent Classes

Class indexing simplifies the storing and retrieval of persistent objects. An indexed class stores every instance of the class that is created, ensuring that every object is automatically persisted between sessions.

     (defpclass friend ()
       ((name :accessor name :initarg :name)
        (birthday :initarg :birthday))
       (:index t))
     => #<PERSISTENT-METACLASS FRIEND>
     
     (defmethod print-object ((f friend) stream)
       (format stream "#<~A>" (name f)))
     
     (defun encode-date (dmy)
       (apply #'encode-universal-time
         (append '(0 0 0) dmy)))
     
     (defmethod (setf birthday) (dmy (f friend))
       (setf (slot-value f 'birthday)
             (encode-date dmy))
       dmy)
     
     (defun decode-date (utime)
       (subseq (multiple-value-list (decode-universal-time utime)) 3 6))
     
     (defmethod birthday ((f friend))
       (decode-date (slot-value f 'birthday)))

Notice the class argument “:index t”. This tells Elephant to keep a reference to every instance of this class. Under the covers, there is a set of btrees that keeps track of classes, but we won't need to worry about that as all the functionality has been nicely packaged for you.

We also created our own birthday accessor for convenience so it accepts and returns birthdays in a list consisting of day, month and year such as (27 3 1972). The index key will be the encoded universal time, however.

Now we can easily manipulate all the instances of a class.

     (defun print-friend (friend)
       (format t " name: ~A birthdate: ~A~%"
               (name friend) (birthday friend)))
     
     (make-instance 'friend :name "Carlos"
                            :birthday (encode-date '(1 1 1972)))
     (make-instance 'friend :name "Adriana"
                            :birthday (encode-date '(24 4 1980)))
     (make-instance 'friend :name "Zaid"
                            :birthday (encode-date '(14 8 1976)))
     
     (get-instances-by-class 'friend)
     => (#<Carlos> #<Adriana> #<Zaid>)
     
     (mapc #'print-friend *)
      name: Carlos birthdate: (1 1 1972)
      name: Adriana birthdate: (24 4 1980)
      name: Zaid birthdate: (14 8 1976)
     => (#<Carlos> #<Adriana> #<Zaid>)

But what if we have thousands of friends? Aside from never getting work done, our get-instances-by-class will be doing a great deal of consing, eating up lots of memory and wasting our time. Fortunately there is a more efficient way of dealing with all the instances of a class.

     (map-class #'print-friend 'friend)
      name: Carlos birthdate: (1 1 1972)
      name: Adriana birthdate: (24 4 1980)
      name: Zaid birthdate: (14 8 1976)
     => NIL

map-class has the advantage that it does not keep references to objects after they are processed. The garbage collector can come along, clear references from the weak instance cache so that your working set is finite. The list version above conses all objects into memory before you can do anything with them. The deserialization costs are very low in both cases.

Notice that the records are not printed in order of either name or birthdate. Elephant makes no guarantee about the ordering of class elements, so you cannot depend on the insertion ordering shown here.

So what if we want ordered elements? How do we access our friends according to name and birthdate? This is where slot indices come into play.

     (defpclass friend ()
       ((name :accessor name :initarg :name :index t)
        (birthday :initarg :birthday :index t)))

Notice the :index argument to the slots and that we dropped the class :index argument. Specifying that a slot is indexed automatically registers the class as indexed. While slot indices increase the cost of writes and disk storage, each entry is only slightly larger than the size of the slot value. Numbers, small strings and symbols are good candidate types for indexed slots, but any value may be used, even different types. Once a slot is indexed, we can use the index to retrieve objects by slot values.

get-instances-by-value will retrieve all instances whose indexed slot value is equal to the value argument.

     (get-instances-by-value 'friend 'name "Carlos")
     => (#<Carlos>)

But more interestingly, we can retrieve objects for a range of values.

     (get-instances-by-range 'friend 'name "Adam" "Devin")
     => (#<Adriana> #<Carlos>)
     
     (get-instances-by-range 'friend 'birthday
                             (encode-date '(1 1 1974))
                             (encode-date '(31 12 1984)))
     => (#<Zaid> #<Adriana>)
     
     (mapc #'print-friend *)
      name: Zaid birthdate: (14 8 1976)
      name: Adriana birthdate: (24 4 1980)
     => (#<Zaid> #<Adriana>)

To retrieve all instances of a class in the order of the index instead of the arbitrary order returned by get-instances-by-class, you can use nil in place of the start and end values to indicate the first or last element. (Note: to retrieve instances with null values, use get-instances-by-value with nil as the argument.)

     (get-instances-by-range 'friend 'name nil "Sandra")
     => (#<Adriana> #<Carlos>)
     
     (get-instances-by-range 'friend 'name nil nil)
     => (#<Adriana> #<Carlos> #<Zaid>)

There are also functions for mapping over instances of a slot index. To map over duplicate values, use the :value keyword argument. To map by range, use the :start and :end arguments.

     (map-inverted-index #'print-friend 'friend 'name :value "Carlos")
      name: Carlos birthdate: (1 1 1972)
     => NIL
     
     (map-inverted-index #'print-friend 'friend 'name
                      :start "Adam" :end "Devin")
      name: Adriana birthdate: (24 4 1980)
      name: Carlos birthdate: (1 1 1972)
     => NIL
     
     (map-inverted-index #'print-friend 'friend 'birthday
                      :start (encode-date '(1 1 1974))
                      :end (encode-date '(31 12 1984)))
      name: Zaid birthdate: (14 8 1976)
      name: Adriana birthdate: (24 4 1980)
     => NIL
     
     (map-inverted-index #'print-friend 'friend 'birthday
                      :start nil
                      :end (encode-date '(10 10 1978)))
      name: Carlos birthdate: (1 1 1972)
      name: Zaid birthdate: (14 8 1976)
     => NIL
     
     (map-inverted-index #'print-friend 'friend 'birthday
                      :start (encode-date '(10 10 1975))
                      :end nil)
      name: Zaid birthdate: (14 8 1976)
      name: Adriana birthdate: (24 4 1980)
     => NIL

The User Guide contains descriptions of the advanced features of Class Indices such as “derived indices” that allow you to order classes according to an arbitrary function, a dynamic API for adding and removing slots, and how to set a policy for resolving conflicts between the code image and a database where the indexing specification differs.

This same facility is also available for your own use. For more information see BTree Indexing.



2.9 Using Transactions

One of the most important features of a database is that operations enforce the ACID properties: Atomic, Consistent, Isolated, and Durable. In plainspeak, this means that a set of changes is made all at once, that the database is never partially updated, that each set of changes happens sequentially and that a change, once made, is not lost.

Elephant provides this protection for all primitive operations. For example, when you write a value to an indexed slot, the update to the persistent slot record as well as the slot index is protected by a transaction that performs all the updates atomically, thus enforcing consistency.

2.9.1 Why do we need Transactions?

Most real applications will need to use explicit transactions rather than relying on the primitives alone because you will want multiple read-modify-update operations to act as an atomic unit. A good example of this is a banking system. If a thread is going to modify a balance, we don't want another thread modifying it in the middle of the operation or one of the modifications may be lost.

     (defvar *accounts* (make-btree))
     
     (defun add-account (account)
       (setf (get-value account *accounts*) 0))
     
     (defun balance (account)
       (get-value account *accounts*))
     
     (defun (setf balance) (amount account)
       (setf (get-value account *accounts*) amount))
     
     (defun deposit (account amount)
       "This shows a read and a write function call to
        get then set the balance"
       (let ((balance (balance account)))
         (setf (balance account)
               (+ balance amount))))
     
     (defun withdraw (account amount)
       "A nice concise lisp version for withdraw"
       (decf (balance account) amount))
     
     (add-account 'me)
     => 0
     (deposit 'me 100)
     => 100
     (balance 'me)
     => 100
     (withdraw 'me 25)
     => 75
     (balance 'me)
     => 75

This simple bank example has a significant vulnerability. If two threads read the same balance and then each writes a new balance, the second write is computed without seeing the first, so the first update is lost.
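
To make the lost update concrete, here is a minimal sketch in plain Common Lisp. It is not Elephant code: a hash table stands in for the btree, the names are hypothetical, and the two "threads" are interleaved by hand so that the outcome is repeatable.

```lisp
;; Illustrative only: a hash table stands in for the persistent btree,
;; and the two "threads" are interleaved by hand.
(defvar *demo-accounts* (make-hash-table))

(defun demo-balance (account)
  (gethash account *demo-accounts* 0))

(defun (setf demo-balance) (amount account)
  (setf (gethash account *demo-accounts*) amount))

(defun lost-update-demo ()
  "Deposits of 100 and 50 that both read the balance before either writes."
  (setf (demo-balance 'me) 0)
  (let ((read-by-a (demo-balance 'me))            ; thread A reads 0
        (read-by-b (demo-balance 'me)))           ; thread B reads 0
    (setf (demo-balance 'me) (+ read-by-a 100))   ; A writes 100
    (setf (demo-balance 'me) (+ read-by-b 50))    ; B writes 50; A's deposit is lost
    (demo-balance 'me)))

(lost-update-demo)
;; => 50, although 150 was deposited
```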

The way to avoid this is to group a set of operations together, such as the read and write in deposit and withdraw. We accomplish this by establishing a dynamic context called a transaction.

During a transaction, all changes are cached until the transaction is committed. The changes made by a committed transaction happen all at once. Transactions can also be aborted due to errors that happen while they are active or because of contention. Contention occurs when another thread writes to a value that the current transaction is reading. As in the bank example above, if one transaction writes the balance after the current one has read it, then the current one should start over so it has an accurate balance to work with. A transaction aborted due to contention is usually restarted, and gives up only after it has failed too many times.

The simplest and best way to use transactions in Elephant is to wrap all the operations in the with-transaction macro. Any statements in the body of the macro are executed within the same transaction. Thus we would modify our example above as follows:

     (defun deposit (account amount)
       (with-transaction ()
         (let ((balance (balance account)))
           (setf (balance account)
                 (+ balance amount)))))
     
     (defun withdraw (account amount)
       (with-transaction ()
         (decf (balance account) amount)))

And presto, we have an ACID compliant, thread-safe, persistent banking system!

2.9.2 Using with-transaction

What is with-transaction really doing for us? It first starts a new transaction, attempts to execute the body, and commits the transaction if successful. If at any time during the dynamic extent of this process there is a conflict with another thread's transaction, an error, or another non-local transfer of control, the transaction is aborted. If it was aborted due to contention or deadlock, with-transaction retries the transaction a fixed number of times by re-executing the whole body.
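
The retry behavior can be pictured with a small sketch in plain Common Lisp. This is not Elephant's implementation: the condition demo-contention, the function call-with-retries and the default of three attempts are all assumptions made for illustration.

```lisp
;; Hypothetical names throughout; this is not Elephant's implementation.
(define-condition demo-contention (error) ()
  (:documentation "Stands in for a data store reporting a conflict or deadlock."))

(defun call-with-retries (thunk &key (retries 3))
  "Call THUNK; if it signals DEMO-CONTENTION, call it again, up to RETRIES attempts."
  (loop repeat retries
        do (handler-case (return (funcall thunk))
             (demo-contention ()
               ;; A real store would abort the transaction here before retrying.
               nil))
        finally (error "Gave up after ~D attempts" retries)))

(defvar *demo-attempts* 0)

(defun flaky-body ()
  "Fails with contention on the first two calls, then succeeds."
  (incf *demo-attempts*)
  (if (< *demo-attempts* 3)
      (error 'demo-contention)
      :committed))

(call-with-retries #'flaky-body)
;; => :COMMITTED, after the body has run three times
```

Note that the whole body runs once per attempt, which is why the constraints below matter.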

And this brings us to two important constraints on transaction bodies: no dynamic nesting and idempotent side-effects.

2.9.3 Nesting Transactions

In general, you want to avoid nested uses of with-transaction statements over multiple functions. Nested transactions are valid for some data stores (namely Berkeley DB), but typically only a single transaction can be active at a time. The purpose of a nested transaction in data stores that support them is to break a long transaction into subsets. This way if there is contention on a given subset of variables, only the inner transaction is restarted while the larger transaction can continue. When the inner transaction commits its results, those results become part of the outer transaction but are not written to disk until the outer transaction commits.

If you have transaction protected primitive operations (such as deposit and withdraw) and you want to perform a group of such transactions, for example a transfer between accounts, you can use the macro ensure-transaction instead of with-transaction.

     (defun deposit (account amount)
       "Wrap the balance read and the setf with the new balance"
       (ensure-transaction ()
         (let ((balance (balance account)))
           (setf (balance account)
                 (+ balance amount)))))
     
     (defun deposit (account amount)
       "A more concise version with incf doing both read and write"
       (ensure-transaction ()
         (incf (balance account) amount)))
     
     (defun withdraw (account amount)
       (ensure-transaction ()
         (decf (balance account) amount)))
     
     (defun transfer (src dst amount)
       "There are four primitive read/write operations
        grouped together in this transaction"
       (with-transaction ()
         (withdraw src amount)
         (deposit dst amount)))

ensure-transaction is exactly like with-transaction except it will reuse an existing transaction, if there is one, or create a new one. There is no harm, in fact, in using this macro all the time.
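
The difference between the two macros can be sketched with a special variable that records whether a transaction is already active. The macros below are hypothetical stand-ins written in plain Common Lisp, not Elephant's definitions, and they omit the empty argument list that the real macros take.

```lisp
;; Hypothetical stand-ins, not Elephant's macros: a special variable
;; records which transaction is active in the current dynamic extent.
(defvar *demo-txn* nil)
(defvar *demo-txn-count* 0 "How many transactions have been started.")

(defmacro demo-with-transaction (&body body)
  "Always start a new transaction around BODY."
  `(let ((*demo-txn* (incf *demo-txn-count*)))
     ,@body))

(defmacro demo-ensure-transaction (&body body)
  "Reuse the active transaction if there is one, otherwise start a new one."
  `(if *demo-txn*
       (progn ,@body)
       (demo-with-transaction ,@body)))

;; Each primitive returns the number of the transaction it ran in.
(defun demo-withdraw () (demo-ensure-transaction *demo-txn*))
(defun demo-deposit  () (demo-ensure-transaction *demo-txn*))

(defun demo-transfer ()
  "Both primitives run inside the single transaction started here."
  (demo-with-transaction
    (list (demo-withdraw) (demo-deposit))))
```

Starting from a count of zero, (demo-transfer) returns (1 1) because both primitives share the outer transaction, while a later standalone (demo-deposit) returns 2 because it has to start its own.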

Notice the use of decf and incf above. The primary reason to use Lisp is that it is good at hiding complexity using shorthand constructs just like this. This also means it is going to be good at hiding data dependencies that should be captured in a transaction!

2.9.4 Idempotent Side Effects

Within the body of a with-transaction, any non-database operations need to be idempotent. That is, the side effects of the body must be the same no matter how many times the body is executed. This is done automatically for side effects on the database, but not for side effects like pushing a value onto a lisp list or creating a new standard object.

     (defparameter *transient-objects* nil)
     
     (defun load-transients (n)
        "This is the wrong way!"
        (with-transaction ()
           (loop for i from 0 below n do
              (push (get-from-root i) *transient-objects*))))

In this contrived example we are pulling a set of standard objects from the database using an integer key and pushing them onto a list for later use. However, if there is a conflict where some other process writes a key-value pair to a matching key, the whole transaction will abort and the loop will be run again. In a heavily contended system you might see results like the following.

     (defun test-list (n)
        (setf *transient-objects* nil)
        (load-transients n)
        (length *transient-objects*))
     
     (test-list 3)
     => 3
     
     (test-list 3)
     => 5
     
     (test-list 3)
     => 4

So the solution is to make sure that the operation on the lisp parameter happens exactly once, and only if the transaction completes.

     (defun load-transients (n)
       "This is a better way"
       (setq *transient-objects*
             (with-transaction ()
                 (loop for i from 0 below n collect
                       (get-from-root i)))))

(Note that collect returns the instances in key order, whereas the push version above built the list in reverse order.)

The best rule-of-thumb is to ensure that transaction bodies are purely functional as above, except for side effects to persistent objects and btrees.

If you really do need to execute side-effects into lisp memory, such as writes to transient slots, make sure they are idempotent and that other processes cannot read the written values until the transaction completes.
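
The effect of a retried body on lisp memory can be reproduced without a database. In this sketch, which is plain Common Lisp with hypothetical names, run-with-one-abort simulates a transaction whose first attempt is aborted, so the body runs twice.

```lisp
;; Hypothetical helper: simulates a transaction whose first attempt is
;; aborted, so the body runs twice before its result is kept.
(defun run-with-one-abort (thunk)
  (funcall thunk)          ; first attempt, result discarded as if aborted
  (funcall thunk))         ; second attempt "commits"

(defvar *seen* nil)

(defun unsafe-load (keys)
  "Pushes inside the body, so the aborted attempt leaves values behind."
  (setf *seen* nil)
  (run-with-one-abort
   (lambda () (dolist (k keys) (push k *seen*))))
  (length *seen*))

(defun safe-load (keys)
  "Builds a fresh list in the body and assigns it once, after completion."
  (setf *seen* (run-with-one-abort (lambda () (copy-list keys))))
  (length *seen*))
```

With three keys, (unsafe-load '(a b c)) returns 6 while (safe-load '(a b c)) returns 3.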

2.9.5 Transactions and Performance

By now transactions almost look like more work than they are worth! Fortunately, there are also performance benefits to explicit use of transactions. Transactions gather together all the writes that are supposed to be made to the database and store them in memory until the transaction commits, and only then write them to the disk.

The most time-intensive component of a transaction is waiting while flushing newly written data to disk. Using the default auto-committing behavior requires a disk flush for every primitive write operation. This is very, very expensive! Because all the values read or written are cached in memory until the transaction completes, the number of flushes can be dramatically reduced.

But don't take my word for it, run the following statements and see for yourself the visceral impact transactions can have on system performance.

     (defpclass test ()
       ((slot1 :accessor slot1 :initarg :slot1)))
     
     (time (loop for i from 0 upto 100 do
              (make-instance 'test :slot1 i)))

This can take a long time, well over a minute on the CLSQL data store. Here each new object that is created has to independently write its value to disk and accept a disk flush cost.

     (time (with-transaction ()
              (loop for i from 0 upto 100 do
                 (make-instance 'test :slot1 i))))

Wrapping this operation in a transaction dramatically reduces the time from tens of seconds to a second or less.

     (time (with-transaction ()
              (loop for i from 0 upto 1000 do
                 (make-instance 'test :slot1 i))))

When we increase the number of objects within the transaction, the time cost does not go up linearly. This is because the total time to write a hundred simple objects within a transaction is dominated by the disk flush at commit, not by the number of objects.

These are huge differences in performance! However we cannot have infinitely sized transactions due to the finite size of the data store's memory cache. Large operations (such as loading data into a database) need to be split into a sequential set of smaller transactions. When dealing with persistent objects a good rule of thumb is to keep the number of objects touched in a transaction well under 1000.
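
One way to split a large load into smaller transactions is to process the data in fixed-size batches. The helper below is a sketch in plain Common Lisp; the name map-in-batches and the default batch size of 500 are assumptions chosen to stay under the rule of thumb above.

```lisp
;; Plain Common Lisp; PROCESS-BATCH is a caller-supplied function that
;; would typically wrap its work in a single transaction.
(defun map-in-batches (process-batch items &key (batch-size 500))
  "Call PROCESS-BATCH on successive sublists of ITEMS, each at most BATCH-SIZE long."
  (loop while items
        do (let ((batch (loop repeat batch-size
                              while items
                              collect (pop items))))
             (funcall process-batch batch))))
```

In an Elephant application, the function passed as process-batch would wrap its loop over the batch in with-transaction, so that each batch commits separately.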

2.9.6 Transactions and Applications

Designing and tuning a transactional architecture can become quite complex. Moreover, bugs in your system can be very difficult to find as they only show up when transactions are interleaved within a larger, multi-threaded application.

In many cases you can simply ignore transactions, for example when you don't have any other concurrent processes running. In this case all operations are sequential and there is no chance of conflicts. You would only want to use transactions to improve performance on repeated sets of operations.

You can also ignore transactions if your application can guarantee that concurrency won't generate any conflicts. For example, a web app that guarantees only one thread will write to objects in a particular session can avoid transactions altogether. However, it is good to be careful about making these assumptions. In the above example, a reporting function that iterates over sessions, users or other objects may still see partial updates (e.g. a user's id was written prior to the query, but not the name). However, if you don't care about these infrequent glitches, you can still do without transactions.

If these cases don't apply to your application, or you aren't sure, you will fare best by programming defensively. Break your system into the smallest logical sets of primitive operations (e.g. withdraw and deposit) using ensure-transaction and then wrap the highest level calls made to your system in with-transaction when the operations absolutely have to commit together or you need the extra performance. Try not to have more than two levels of transactional accesses, with the top using with-transaction and the bottom using ensure-transaction.

See Transaction Details for more details and Design Patterns for examples of how systems can be designed and tuned using transactions.



2.10 Advanced Topics

The tutorial covers the essential topics and concepts for using Elephant. Many people will find that these features are the ones that are most often needed and used in ordinary applications.

More sophisticated uses of Elephant may require additional features that are covered in the user guide. The following is a list of major features in the user guide that were not covered in this tutorial.

Further, see Design Patterns for information about Elephant design patterns, solutions to common problems and other scenarios with multiple possible solutions.



3 Installation



3.1 Requirements

Elephant is a multi-platform, multi-lisp and multi-backend system. As such there is a great deal of complexity in testing. The system has tried to minimize external dependencies as much as possible to ease installation, but it still requires some patience and care to bring Elephant up on any given platform. This section attempts to simplify this for new users as much as possible. Patches and suggestions will be gladly accepted.

3.1.1 Supported Lisp, Platform and Data store combinations

Elephant supports SBCL, Allegro, Lispworks, OpenMCL and CMUCL. Each lisp is supported on each of the platforms it runs on: Mac OS X, Linux and Windows. As of release 0.6.1, both 32-bit and 64-bit systems should be supported.

Due to the small number of developers and the large number of configurations, providing full test coverage is problematic. There are:

  1. Five lisp environments
  2. Three Operating System platforms
  3. 32-bit or 64-bit OS/compilation configuration
  4. Three data store configurations: Berkeley DB, SQLite3 and Postgresql

which means that the total number of combinations to be tested could be as much as:

lisps * os * radix * dstore = 5 * 3 * 2 * 3 = 90 configurations

Not all of these combinations are valid, but the implication is that not every combination will be tested in any given release. The developers and user base regularly use the following platforms

The CLSQL backend is used predominantly under SBCL on Linux and Mac OS X at the time of writing. The developers will do their best to accommodate users who are keen to test other combinations, but the above configurations will be the most stable and reliable.

Elephant is now quite stable in general, so don't be afraid to try an unemphasized combination - chances are it is just a little more work to bring it up. In particular, Elephant can probably work with MySQL or Oracle with just a little work, but nobody has asked for this yet.

3.1.2 Library dependencies

The Elephant core system requires:

  1. asdf – http://www.cliki.net/asdf
  2. uffi – requires version 1.5.18 or later, http://uffi.b9.com/ or http://www.cliki.net/UFFI
  3. cl-base64 – http://www.cliki.net/cl-base64
  4. gcc – Your system needs GCC (or Cygwin) to build the Elephant C-based serializer library. (Precompiled DLLs are available for Windows platforms on the download page.)
  5. rt – The RT regression test system is required to run the test suite: http://www.cliki.net/RT

Follow the instructions at these URLs to download and set up the libraries. (Note: uffi and cl-base64 are asdf-installable for those of you with asdf-install on your system.) Elephant, however, is not asdf-installable today.

In addition to these libraries, each data store has its own dependencies, as discussed in Berkeley DB and CL-SQL.



3.2 Configuring Elephant

Before you can load the elephant packages into your running lisp, you need to set up the configuration file. Copy the reference file config.sexp from the root directory to my-config.sexp in the root directory. my-config.sexp contains a lisp reader-formatted list of key-value pairs that tells elephant where to find various libraries and how to build them.

For example:

     #+(and (or sbcl allegro) macosx)
     ((:berkeley-db-include-dir . "/opt/local/include/db45/")
      (:berkeley-db-lib-dir . "/opt/local/lib/db45/")
      (:berkeley-db-lib . "/opt/local/lib/db45/libdb-4.5.dylib")
      (:berkeley-db-deadlock . "/opt/local/bin/db45_deadlock")
      (:compiler . :gcc))
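
Because the file is ordinary reader syntax, an entry like the one above is simply an association list. The following sketch, in plain Common Lisp with a hypothetical function name, shows how such a list can be read and queried; it reads from a string and ignores the `#+` feature expressions that select an entry in the real file.

```lisp
;; Illustrative only: parses a config-style association list from a string.
(defun demo-config-value (key config-text)
  "Return the value stored under KEY in the alist printed in CONFIG-TEXT."
  (let* ((*read-eval* nil)                       ; never evaluate #. forms in data
         (entries (read-from-string config-text)))
    (cdr (assoc key entries))))

(demo-config-value
 :compiler
 "((:berkeley-db-lib-dir . \"/opt/local/lib/db45/\") (:compiler . :gcc))")
;; => :GCC
```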

The following is a guide to the various parameters. For simplicity, we include all the parameters here, although we will go into more detail in each of the data store sections.

The config.sexp file contains a set of example configurations to start from, but you will most likely need to modify it for your system.

Elephant has one small C library that it uses for binary serialization. This means that you need to have gcc in your path (see Elephant on Windows for exceptions on the Windows platform).



3.3 Loading Elephant

3.3.1 Loading Elephant via ASDF

Now that you have loaded all the dependencies and created your configuration file you can load the Elephant packages and definitions:

     (asdf:operate 'asdf:load-op :elephant)

This will load the cl-base64 and uffi libraries. It will also automatically compile and load the C library. The build process no longer depends on a Makefile and has been verified on most platforms, but if you have a problem please report it, and any output you can capture, to the developers at elephant-devel@common-lisp.net. We will update the FAQ at http://trac.common-lisp.net/elephant with common problems users run into.

3.3.2 Two-Phase Load Process

Elephant uses a two-phase load process. The core code is loaded and the code for a given data store is loaded on demand when you call open-store with a specification referencing that data store. The second phase of the load process requires ASDF to be installed on your system.

(NOTE: There are some good reasons and not so good reasons for this process. One reason is that you cannot load ele-bdb.asd directly, as it depends on lisp code defined in elephant.asd. We decided not to fix this in the 0.9 release, although later releases may improve on this.)

3.3.3 Packages

Now that Elephant has been loaded, you can call use-package in the cl-user package,

     CL-USER> (use-package :elephant)
     => T

use a predefined user package,

     CL-USER> (in-package :elephant-user)
     => #<PACKAGE "ELEPHANT-USER">
     
     ELE-USER>

or import the symbols into your own project package from :elephant.

     (defpackage :my-project
       (:use :common-lisp :elephant))

The imported symbols are all that is needed to control Elephant databases and are documented in detail in User API Reference.

3.3.4 Opening a Store

As discussed in the tutorial, you need to open a store to begin using Elephant:

     (open-store '(:BDB "/Users/owner/db/my-bdb/"))
     ...
     ASDF loading messages
     ...
     => #<BDB-STORE-CONTROLLER>
     
     (open-store '(:CLSQL (:POSTGRESQL "localhost.localdomain"
                                       "mydb" "myuser" "")))
     ...
     ASDF loading messages
     ...
     => #<SQL-STORE-CONTROLLER>

The first time you load a specific data store, Elephant will call ASDF to load all the specified data store's dependencies, connect to a database and return the store-controller subclass instance for that data store.



3.4 Berkeley DB

The Berkeley DB Data Store started out as a very simple data dictionary in the Berkeley Unix operating system. There are many “Xdb” systems that use the same API, or a similar one. A version of Berkeley DB that is free for non-commercial use is provided by Oracle Corporation, with commercial licenses available. Please follow the download and installation procedures defined here:

http://www.oracle.com/technology/products/berkeley-db/db/index.html

Elephant only works with version 4.5 of BerkeleyDB.



3.5 Setting up Berkeley DB

We recommend that you download and build a distribution from Oracle. Some problems have been reported with linking to Debian, Cygwin or other packages. This is especially true for Windows users.

Beyond ensuring that the file “my-config.sexp” points to your BDB installation directories and files, nothing else should be required to configure the example that uses a local “testdb” directory as a database (under “tests”) in the top-level Elephant directory.

On one Fedora based system, the “my-config.sexp” file looked like this:

     ((:berkeley-db-include-dir . "/usr/local/BerkeleyDB.4.5/include")
      (:berkeley-db-lib-dir . "/usr/local/BerkeleyDB.4.5/lib")
      (:berkeley-db-lib . "/usr/local/BerkeleyDB.4.5/lib/libdb.so")
      (:berkeley-db-deadlock . "/usr/local/BerkeleyDB.4.5/bin/db_deadlock")
      (:pthread-lib . nil)
      (:clsql-lib . "/usr/local/share/common-lisp/")
      (:compiler . :gcc))

The Test Suites give a nice example of using BDB by running the test using the specification:

     '(:BDB "<elephant-root>/tests/testdb/")

Once you start working on an application, you will want to change the path to a directory that is appropriate for your application, and use that as the specification passed to open-store on application startup.



3.6 Upgrading Berkeley DB Databases

When there is a new release of Elephant, it may depend on a new version of Berkeley DB. If so, you must upgrade your BDB databases to use the new version of Elephant. This forced upgrade is a consequence of Elephant not parsing the BDB header files, which tend to change various important constants with each release. These changes are usually minor. Upgrading also happens because Elephant tries to leverage new features of Berkeley DB.

The rest of this section talks about how to upgrade your existing Berkeley DB databases, opening them in the new Elephant version and migrating them to a newly created Elephant database.

3.6.1 Upgrading to 0.9

This section outlines how to upgrade from Elephant version 0.6.0 and Berkeley DB 4.3.

  1. Install BDB 4.5 (keep 4.3 around for now)
  2. Set up my-config.sexp to point to the appropriate BDB 4.5 directories
  3. Upgrade your existing database directory to 4.5
  4. Upgrade 0.6 data to a fresh 0.9 database
  5. Test your new application and report any bugs that arise to elephant-devel@common-lisp.net

(NOTE: close-store may fail when closing the old 0.6 database, this is OK.)

(NOTE: 64-bit lisps will not successfully upgrade 32-bit 0.6 databases. Use a 32-bit version of your lisp to update to 0.9 and then open that database in your 64-bit lisp. There should be no compatibility problems. Best to test your application on a 32-bit lisp if you can, just to be sure.)

3.6.2 Upgrade from Elephant 0.5

Follow the upgrade procedures outlined in the Elephant 0.6.0 INSTALL file to upgrade your database from 0.5 to 0.6.0. Then follow the above procedures for upgrading to 0.9.

(NOTE: It may not take much work to make 0.9 upgrade directly from 0.5. However, there are so few (none?) 0.5 users that it wasn't deemed worth the work given that there's an upgrade path available.)



3.7 CL-SQL

Although originally designed as an interface to the BerkeleyDB system, the original Elephant system has been extended to support the use of relational database management systems as the implementation of the persistent store. This relies on Kevin Rosenberg's CL-SQL interface, which provides access to a large number of relational systems.

A major motivation of this extension is that one might prefer the licensing of a different system. For example, at the time of this writing, it is our interpretation that one cannot use the BerkeleyDB system behind a public website (see http://www.sleepycat.com/download/licensinginfo.shtml#redistribute) unless one releases the entire web application as open source.

Neither the Postgres DBMS nor SQLite 3, nor Elephant itself, imposes any such restriction.

Other reasons to use a relational database system might include: familiarity with those systems, the fact that some part of your application needs to use the truly relational aspects of those systems, preference for the tools associated with those systems, etc.

Elephant provides functions for migrating data seamlessly between data stores. One can quite easily move data from a BerkeleyDB repository to a Postgres repository, and vice versa. This offers at least the possibility that one can develop using one data store, for example BerkeleyDB, and then later move to Postgres. One could even operate simultaneously out of multiple repositories, if there were a good reason to do so.

The SQL implementation shares the serializer with the BDB data store, but base64 encodes the resulting binary stream. This data is placed into a single table in the SQL data store.

All functionality except for nested transaction support and cursor-puts supported by the BerkeleyDB data store is supported by the CL-SQL data store. CL-SQL transaction integrity under concurrent operation has not been extensively stress tested.

Additionally, it is NOT the case that the Elephant system currently provides transaction support across multiple repositories; it provides transaction support on a per-repository basis.

The Postgres backend is currently about 5 times slower than the BerkeleyDB backend. As of the time of this writing, only Postgres and SQLite 3 have been tested as CL-SQL backends.



3.8 CL-SQL Example

To set up a Postgres based back end, you should:

  1. Install postgres and make sure postmaster is running. Postgres may already be installed on your system; otherwise you may be able to use a package manager to install it, or you can install it from the PostgreSQL site directly (http://www.postgresql.org/).
  2. Create a database called “test” and set its permissions to be reached by whatever connection specification you intend to use. The tests use:
              (defvar *testpg-path*
              '(:postgresql "localhost.localdomain" "test" "postgres" ""))
         

    which means that connections must be allowed to the database test, user “postgres”, no password, connected from the same machine “localhost.localdomain”. (This would be changed to something more secure in a real application.) Typically you edit the file pg_hba.conf to enable various kinds of connections in postgres.

  3. Be sure to enable socket connection to postgres when you invoke the postmaster.
  4. Test that you can connect to the database with these credentials by running: psql -h 127.0.0.1 -U postgres test before you attempt to connect with Elephant.

Furthermore, you must grant practically all creation/read/write privileges to the user postgres on this schema, so that it can construct the tables it needs.

Upon first opening a CL-SQL based store controller, the tables, indexes, sequences, and so on needed by the Elephant system will be created in the schema named “test” automatically.



3.9 Elephant on Windows

The build process on Windows currently only works with GCC under Cygwin. The process can be a bit tricky, so if it doesn't work out of the box or you don't want to install Cygwin, we recommend that you download the DLLs from the Elephant website download page (http://www.common-lisp.net/project/elephant/downloads.html).

Unpack the .zip file into the elephant root directory. Ensure that your my-config.sexp file configuration for Windows has :prebuilt-binaries set to “t” so it will know to look in the elephant root during the asdf loading process.

For Berkeley DB users we recommend downloading the Windows binary distribution of Berkeley DB 4.5 to minimize any potential linking issues.



3.10 Test Suites

Elephant has matured quite a bit over the past year or two. Hopefully, it will work out-of-the-box for you.

However, if you are using a Lisp implementation different from the ones on which it is developed and maintained (see Requirements), or you have a problem that you think may be a bug, you may want to run the test suites. If you report a bug, we will ask you to run these tests and report the output. Running them when you first install the system may give you a sense of confidence and understanding that makes it worth the trouble.

There are three files that execute the tests. You should choose one as a starting point based on what backend(s) you are using. If using Berkeley DB, use

     BerkeleyDB-tests.lisp

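If using a CL-SQL backend, use the SQL test file that the listing below refers to as SQLDB-test.lisp. If using both, use both of the above and also use: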

     MigrationTests.lisp

The text of this file is included here to give the casual reader an idea of how Elephant tests can be run in general:

     ;; If you are only using one back-end, you may prefer:
     ;; SQLDB-test.lisp or BerkeleyDB-tests.lisp
     (asdf:operate 'asdf:load-op :elephant)
     (asdf:operate 'asdf:load-op :ele-clsql)
     (asdf:operate 'asdf:load-op :ele-bdb)
     (asdf:operate 'asdf:load-op :ele-sqlite3)
     
     (asdf:operate 'asdf:load-op :elephant-tests)
     
     (in-package "ELEPHANT-TESTS")
     
     ;; Test Postgres backend
     (setq *default-spec* *testpg-spec*)
     (do-backend-tests)
     
     ;; Test BDB backend
     (setq *default-spec* *testbdb-spec*)
     (do-backend-tests)
     
     ;; Test SQLite 3
     (setq *default-spec* *testsqlite3-spec*)
     (do-backend-tests)
     
     ;; Test a Migration of data from BDB to postgres
     (do-migration-tests *testbdb-spec* *testpg-spec*)
     
     ;; An example usage.
     (open-store *testpg-spec*)
     (add-to-root "x1" "y1")
     (get-from-root "x1")
     
     (add-to-root "x2" '(a 4 "spud"))
     (get-from-root "x2")

The appropriate test should execute for you with no errors. If you get errors, you may wish to report them to the elephant-devel@common-lisp.net mailing list.

Setting up SQLite3 is even easier. Install SQLite3 (I had to use the source rather than the binary install, in order to get the dynamic libraries constructed.)

An example use of SQLite3 would be:

     (asdf:operate 'asdf:load-op :elephant)
     (asdf:operate 'asdf:load-op :ele-clsql)
     (asdf:operate 'asdf:load-op :ele-sqlite3)
     (in-package "ELEPHANT-TESTS")
     (setq *test-path-primary* '(:sqlite3 "testdb"))
     (do-all-tests-spec *test-path-primary*)

The file RUNTESTS.lisp, although possibly not exactly what you want, contains useful example code.

You can of course migrate between the three currently supported repository strategies in any combination: BDB, Postgresql, and SQLite3.

In all probability, other relational databases would be very easy to support but have not yet been tested. The basic pattern of the “path” specifiers is (cons clsql-database-type-symbol (normal-clsql-connection-specifier)).
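
As a small illustration of that pattern, the following plain Common Lisp sketch builds a specifier from a database type symbol and a connection specifier. The function name demo-make-spec is hypothetical; the connection values are the ones used for the Postgres test database earlier in this chapter.

```lisp
;; Hypothetical helper: conses a database type symbol onto a
;; CLSQL-style connection specifier (a list of strings).
(defun demo-make-spec (database-type &rest connection-spec)
  (cons database-type connection-spec))

(demo-make-spec :postgresql "localhost.localdomain" "test" "postgres" "")
;; => (:POSTGRESQL "localhost.localdomain" "test" "postgres" "")
```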



3.11 Documentation

If you are getting the documentation as a released tar file, you will probably find the documentation in .html or .pdf form in the release, or you can find it at the Elephant website.

If you want to compile the documentation yourself, for example because you can think of a way to improve this manual, then you will do something similar to this at a shell or command-line prompt:

     cd doc
     make
     make pdf

This process will populate the “./includes” directory with references automatically extracted from the lisp code. Currently this docstring extraction process relies on SBCL, but with minor modifications the scripts should work with other lisp environments.

The Makefile will then compile the texinfo documentation source into an HTML file and a PDF file which will be left in the “doc/” directory. An info style HTML tree is also created in the “doc/elephant” directory. This tree contains one node per HTML file.

Don't edit anything in the “doc/elephant” directory or the “doc/includes” directories, as everything in these directories is generated. Instead, edit the “.texinfo” files in the doc directory.



4 User Guide



4.1 The Store Controller

An instance of the store-controller class mediates interactions between Lisp and a data store. All elephant operations are performed in the context of a store controller. To be more specific, a data store provides a subclass of store-controller specialized to that data store. Typically this object contains pointers to the disk files, foreign memory regions and any other necessary bookkeeping information to support Elephant operations such as slot writes and btree operations. The store also contains the root objects and other bookkeeping common to all data stores.

To obtain a store-controller object, call the function open-store with a store controller specification. The current data store specification formats are:

Valid CLSQL database tags for <sql-db-name> are :SQLITE and :POSTGRESQL. The <sql-connect-command> is what you would pass to CLSQL's connect command.

The open-store function uses the first symbol in the specification (i.e. :BDB or :CLSQL) to dispatch instance creation to the specified data store, which returns a specialized instance of store-controller. open-store then initializes the store using an internal call to open-controller.

The final step of open-store is to set the global variable *store-controller*. This special variable is used as a default value in the optional or keyword arguments to a number of operations such as:
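A minimal sketch of opening a store follows. It assumes Elephant is already loaded and its symbols are accessible from the current package; the variable names, directory and connection details are placeholders chosen for this illustration, and the exact specification layouts should be checked against the data store sections.

```lisp
;; Sketch only: paths and credentials below are placeholders.

;; A Berkeley DB specification: the tag :BDB followed by a directory
;; path (assumed here to exist already).
(defparameter *my-spec* '(:BDB "/tmp/elephant-example/"))

;; A CLSQL specification: the tag :CLSQL followed by a database tag
;; and whatever CLSQL's connect command expects for that database.
(defparameter *my-sql-spec*
  '(:CLSQL (:POSTGRESQL "localhost" "example_db"
            "example_user" "example_password")))

;; open-store dispatches on the first symbol of the specification and
;; returns the specialized store controller.
(defparameter *my-store* (open-store *my-spec*))
```

Only the Berkeley DB specification is opened here; the CLSQL one is shown for its shape alone.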

Each of these functions also accepts an explicit store controller argument for use in multiple store environments. Normal applications should only be aware that this global parameter is used. For further discussion of *store-controller* see Multi-repository Operation.
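The difference between the default and an explicit store controller can be sketched as follows. The example assumes that Elephant is loaded, that the placeholder directory exists, and that add-to-root and get-from-root accept the store controller through a :sc keyword argument; the variable name *store-a* is invented for this illustration.

```lisp
;; Sketch only: the directory is a placeholder.
(defparameter *store-a* (open-store '(:BDB "/tmp/elephant-store-a/")))

;; Relies on the default held in *store-controller*:
(add-to-root "greeting" "hello")

;; Names the store explicitly, as a multi-store application would:
(add-to-root "greeting" "hello" :sc *store-a*)
(get-from-root "greeting" :sc *store-a*)
```

With a single open store the two forms of add-to-root are equivalent, which is why normal applications rarely need the explicit argument.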

Additionally, open-store accepts data store specific keyword arguments. For example, you can force recovery to be run on Berkeley DB data stores:

     (open-store *my-spec* :recover t)

The data store sections of the user guide (Berkeley DB Data Store and CLSQL Data Store) list all the data-store specific options to various elephant functions.

When you finish your application, close-store will close the store controller. Failing to do this properly may lead to a need to run recovery on the data store during the next session. Again, see the relevant data store sections for more detail.
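One way to make sure the store is closed even when an error unwinds the stack is to wrap the work in unwind-protect. This is a sketch under the assumption that Elephant is loaded; call-with-example-store is a hypothetical helper, not part of Elephant.

```lisp
;; Hypothetical helper: open SPEC, run THUNK, and always close the store.
(defun call-with-example-store (spec thunk)
  "Open the store named by SPEC, call THUNK with no arguments, and
close the default store controller afterwards, even on a non-local exit."
  (open-store spec)
  (unwind-protect
       (funcall thunk)
    ;; With no argument, close-store closes *store-controller*.
    (close-store)))

;; Usage, with a placeholder directory:
;; (call-with-example-store '(:BDB "/tmp/elephant-example/")
;;                          (lambda () (add-to-root "k" 1)))
```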



4.2 Serialization details

There are consequences to moving values from lisp memory onto disk in order to persist them. The first is that pointers cannot be guaranteed to remain valid, so references to lisp objects cannot be maintained. This is very similar to the problems with passing references in foreign function interfaces. The second, and more frustrating, limitation is that lisp operations that perform side effects on aggregate objects (objects, arrays, etc.) cannot be trapped and replicated on the disk representation. This leads to a very important consequence: all lisp objects are stored by value. This policy has a number of consequences, which are detailed below.

4.2.1 Restrictions of Store-by-Value

  1. Lisp identity can't be preserved. Since this is a store which persists across invocations of Lisp, this probably doesn't even make sense. However, if you get an object from the store, bind it to a lisp variable, and then get it again, the two results will not be eq:
              (setq foo (cons nil nil))
              => (NIL)
              (add-to-root "my key" foo)
              => (NIL)
              (add-to-root "my other key" foo)
              => (NIL)
              (eq (get-from-root "my key")
                    (get-from-root "my other key"))
              => NIL
         
  2. Nested aggregates are serialized recursively into a single buffer. If you store a set of objects in a hash table and then store the hash table, all of those objects will be stored in one large binary buffer along with the hash keys. This is true for all aggregates that can store type T (cons, array, standard object, etc).
  3. Circular References. One benefit provided by the serializer is that the recursive serialization process does not fall into an infinite loop when it encounters circular references among aggregate types. It accomplishes this by assigning an ID to any non-atomic object and keeping a mapping between previously serialized objects and these IDs. This same mapping is used to reconstruct references in lisp memory on deserialization such that the original structure is properly reproduced.
  4. Storage limitations. The serializer writes sequentially into a contiguous foreign byte array before passing that array to a given data store's API. There are practical limits to the size of the foreign buffer that lisp can allocate (usually somewhere on the order of 10-100MB due to address space fragmentation). Moreover, most data stores will have a practical limit to the size of a transaction or the size of key or value they will store. Either of these considerations should encourage you to limit the size of objects that you serialize to disk. A good rule of thumb is to stay under a handful of megabytes. We have successfully serialized arrays over 100MB in the past, but have not tested the robustness of these large values over time.
  5. Mutated substructure does not persist.
              (setf (car foo) T)
              => T
              (get-from-root "my key")
              => (NIL)
         

    This will affect all aggregate types: objects, conses, hash-tables, et cetera. (You can of course manually re-store the cons.) In this sense elephant does not automatically provide persistent collections. If you want to persist every access, you have to use Persistent Sets (see Persistent Sets) or BTrees (see Persistent BTrees).

  6. Serialization and deserialization can be costly. While serialization is pretty fast, it is still expensive to store large objects wholesale. Also, since object identity is impossible to maintain, deserialization must re-cons or re-allocate the entire object every time, increasing the number of GCs the system does. This eager allocation is contrary to how most people want to use a database: one of the reasons to use a database is that your objects can't fit into main memory all at once.
  7. Merge-conflicts in heavily multi-process/threaded situations. This is the common read-modify-write problem in all databases. We will talk more about this in the Transaction Details section.
  8. Byte Ordering. The primitive elements such as integers are written to disk in the native byte ordering of the machine on which the lisp runs. This means that little endian machines cannot read values written by big endian machines and vice versa.
  9. Unicode codes and Serialized Strings. The characters and strings stored to disk can store and recover lisp character codes that implement unicode, but the character maps are the lisp character maps (produced by char-code) and not strict unicode codes. Different lisps may therefore be unable to read each other's characters unless they have identical character code maps for the character sets you are reading and writing. All standard ASCII strings should be portable. Here is what we know about specific lisps, but this should not be taken as gospel.
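Returning to restriction 5 above, the following sketch shows the two usual ways around mutated substructure: re-storing the value by hand, or keeping the data in a persistent BTree so that each write goes to the store. It assumes Elephant is loaded and a store is already open in *store-controller*; the variable names and keys are invented for this illustration.

```lisp
;; Sketch only: assumes an open store in *store-controller*.

;; Option 1: re-store the aggregate after mutating it.
(defparameter *foo* (cons nil nil))
(add-to-root "my key" *foo*)
(setf (car *foo*) t)            ; changes only the in-memory cons
(add-to-root "my key" *foo*)    ; serializes the cons again, replacing the old copy
(get-from-root "my key")        ; the stored copy now reflects the change

;; Option 2: use a persistent BTree, whose writes go to the store.
(defparameter *phone-book* (make-btree))
(add-to-root "phone book" *phone-book*)
(setf (get-value "alice" *phone-book*) "555-0100")
(get-value "alice" (get-from-root "phone book"))
```

The first option still rewrites the whole aggregate on every change, so it suits small values; the second keeps each entry separately addressable in the store.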

4.2.2 Atomic Types

Atomic types have no recursive substructure. That is, they cannot contain arbitrary objects and are of a bounded size. (Bignums are an exception, but they have a predictable structure and cannot reference or otherwise encapsulate other objects.) The following is a list of atoms and a discussion of how they are serialized.

4.2.3 Aggregate Types

The next list covers aggregate types, meaning that elements of these types can contain references to elements of type T. That means, in theory, that storing an aggregate to disk that refers to other objects can copy every reachable object! This is a direct and dire consequence of the “store-by-value” restriction. (See Persistent Classes and Objects for how to design around the store-by-value restriction.)

This list describes how aggregates are handled by the serializer.

One final strategic consideration is whether you plan on sharing the binary database between machines or between different lisp platforms on the same machine. This is almost possible today, but there are some restrictions. In the section Repository Migration and Upgrade we will discuss possible ways of migrating an existing database across platforms and lisps.



4.3 Persistent Classes and Objects

Persistent classes are instances of the persistent-metaclass metaclass. All persistent classes keep track of which slots are :persistent, :transient and/or :indexed and are used as specializers in the persistence meta-object protocols (initialization of slots, slot-access, etc).

All persistent classes create objects that inherit from the persistent class. The persistent class provides two slots that contain a unique object identifier (oid) and a reference to the store-controller specification they are associated with. Persistent slots do not take up any storage space in memory; instead, the persistent-metaclass slot access protocol redirects slot accesses into calls to the store controller. Typically, the underlying data store will then perform the necessary serialization and deserialization to read and write data to disk.

When a reference to a persistent instance itself is written to the database, for example as a key or value in a btree, only the unique ID and class of the instance are stored. When read, a persistent object instance is re-created (see below). This means that serialization of persistent objects is exceedingly cheap compared to standard objects. The subsection on instance creation below will discuss the lifecycle of a persistent object in more detail.

4.3.1 Persistent Class Definition

To create a persistent class, specify persistent-metaclass in the :metaclass class option.

     (defclass my-pclass ()
        ((slot1 :accessor slot1 :initarg :slot1 :initform 1))
        (:metaclass persistent-metaclass))

The only difference between the syntax of standard and persistent class definitions is the ability to specify a slot storage policy and an index policy. Slot value storage policies are specified by a boolean argument to the slot initargs :persistent, :transient and :indexed. Slots are :persistent and not :indexed by default.

The defpclass macro is provided as a convenience to hide the :metaclass class option.

     (defpclass my-pclass ()
        ((pslot1 :accessor pslot1 :initarg :pslot1 :initform 'one)
         (pslot2 :accessor pslot2 :initarg :pslot2 :initform 'two
                 :persistent t)
         (tslot1 :accessor tslot1 :initarg :tslot1 :initform 'three
                 :transient t)))

In the definition above the class my-pclass is an instance of the metaclass persistent-metaclass. According to this definition pslot1 and pslot2 are persistent while tslot1 is transient and stored in memory.

Slot storage class implications are straightforward. Persistent slot writes are durably stored to disk, reads are made from disk, and both can be part of an ACID-compliant transaction. Transient slots are initialized on instance creation according to initforms or initargs. Transient slot values are never stored to nor loaded from the database, and their accesses cannot be protected by transactions. (Ordinary multi-process synchronization would be required instead.)

The :index option tells Elephant whether to maintain an inverted index that maps slot values to their parent objects. The behavior of indexed classes and class slots is discussed in depth in Class Indices.
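The contrast between the two storage classes can be sketched as follows, assuming Elephant is loaded and a store is open in *store-controller*. The class example-pclass and the variable *obj* are invented for this illustration.

```lisp
;; Sketch only: assumes an open store in *store-controller*.
(defpclass example-pclass ()
  ((pslot :accessor pslot :initarg :pslot :initform 'one)
   (tslot :accessor tslot :initarg :tslot :initform 'three
          :transient t)))

(defparameter *obj* (make-instance 'example-pclass :pslot 'hello))

;; A persistent slot write goes through the store controller.
(setf (pslot *obj*) 'goodbye)

;; A transient slot write stays in lisp memory and is lost when the
;; in-memory instance is discarded.
(setf (tslot *obj*) 'scratch)

;; Several persistent slot accesses grouped into one transaction.
(with-transaction ()
  (when (eq (pslot *obj*) 'goodbye)
    (setf (pslot *obj*) 'farewell)))
```

Only the pslot accesses inside with-transaction are protected; a tslot access in the same body would be an ordinary in-memory operation.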
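A brief sketch of an indexed slot follows. It assumes Elephant is loaded, a store is open in *store-controller*, that the slot option is written :index t, and that get-instances-by-value takes a class name, a slot name and a value; see Class Indices for the authoritative description. The class example-person and the variable *ian* are invented for this illustration.

```lisp
;; Sketch only: assumes an open store in *store-controller*.
(defpclass example-person ()
  ((name :accessor name :initarg :name :index t)))

(defparameter *ian* (make-instance 'example-person :name "Ian"))

;; Look up instances through the inverted index on the NAME slot.
(get-instances-by-value 'example-person 'name "Ian")
```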

Persistent classes have their metaobject protocols modified through specializations on persistent-metaclass. These specializations include the creation of special slot metaobjects: transient-slot-definition, persistent-slot-definition and direct and effective versions of each. For the MOP aficionado the highlights of the new class initialization protocols are as follows:

Reinitialization is discussed in the section on class redefinition.

4.3.2 Instance Creation

Persistent objects are created just like standard objects, with a call to make-instance. Initforms and slot initargs behave as the user expects. The call to make-instance of a persistent class will fail unless there is a default store-controller instance in the variable *store-controller* or the :sc keyword argument is given a valid store controller object. The store controller is required to provide a unique object id, to initialize the specification pointer of the instance and to store the values of any initialized slots. The initialization process is as follows:

Persistent slots are initialized only under the following conditions:

After initialization the persistent instance is added to its host store controller's object cache. This cache is a weak hash table that maps oids to object instances. So after initialization the following state has been created:

If you manually create an object using an OID which already exists in the database, initargs to make-instance take precedence over existing values in the database, which in turn take precedence over any initforms defined in the class.

4.3.3 Persistent Instance Lifecycle

The distributed nat