Elephant System
Original Version, Copyright © 2004 Ben Lee and Andrew Blumberg.
Version 0.5, Copyright © 2006 Robert L. Read.
Versions 0.6-0.9, Copyright © 2006-2007 Ian Eslick and Robert L. Read
Portions copyright respective contributors (see CREDITS).
Elephant Manual
Original Version, Copyright © 2004 Ben Lee.
Versions 0.5-0.6, Copyright © 2006 Robert L. Read.
Current Version, Copyright © 2006-2007 Ian Eslick and Robert L. Read
Permission is granted to copy, distribute and/or modify this document under the terms of the GNU Free Documentation License. See the Copyright and License chapter for details about copyright, license and warranty for this manual and the Elephant system.
Elephant is a persistent object protocol and database for Common Lisp. The persistent protocol component of elephant overrides class creation and standard slot accesses using the Meta-Object Protocol (MOP) to render slot values persistent. Database functionality includes the ability to persistently index and retrieve ordered sets of class instances and ordinary lisp values. Elephant has an extensive test suite and the core functionality is becoming quite mature.
The Elephant code base is available under the LLGPL license. Data stores each come with their own, separate license and you will have to evaluate the implications of using them yourself.
Elephant was originally envisioned as a lightweight interface layer on top of the Berkeley DB library, a widely-distributed embedded database that many unix systems have installed by default. Berkeley DB is ACID compliant, transactional, process and thread safe, and fast relative to relational databases.
Elephant has been extended to provide support for multiple backends, specifically a relational database backend based on CL-SQL which has been tested with Postgres and SQLite 3, and can probably support other relational systems easily. It supports, with some care, multi-repository operation and enables convenient migration of data between repositories.
The support for relational backends and migration to the LLGPL was to allow for broader use of Elephant in both not-for-profit and commercial settings. Several additional backends are planned for future releases including a native Lisp implementation released under the LLGPL.
Elephant's current development focus is to enhance the feature set including a native lisp backend, a simple query language, and flexible persistence models that selectively break one or more of the ACID constraints to improve performance.
Join the Elephant mailing lists to ask your questions and receive updates. You can also review archives for past discussions and questions. Pointers can be found on the Elephant website at
http://www.common-lisp.net/project/elephant.
Installation instructions can be found in the Installation section. Bugs can be reported via the Elephant Trac system at
http://trac.common-lisp.net/elephant/.
This also serves as a good starting point for finding out what new features or capabilities you can contribute to Elephant. The Trac system also contains a wiki with design discussions and a FAQ.
Elephant is a Persistence Metaprotocol and Database for Common Lisp. It provides the ability for users to define and interact with persistent objects and to transparently store ordinary lisp values. Persistent objects are CLOS instances that overload the ordinary slot access semantics so that every write to a slot is passed through and written to disk. Non-persistent lisp objects and values can be written to slots and will be automatically persisted. In addition, Elephant provides a persistent index which maintains an ordered collection of lisp values or persistent object references.
The use of persistent objects makes coding concise, convenient, and powerful, and makes persistence almost invisible to the programmer. However, Elephant also allows the same basic data dictionary of key/value retrieval that BerkeleyDB provides.
When someone says "database," most people think of SQL Relational Data Base Management Systems (e.g. Oracle, PostgreSQL, MySQL). Those systems store data in statically typed tables with unique shared values to connect rows in separate tables. Objects can be mapped into these tables in an object-relational mapping that assigns objects to rows and slot values to columns in a row's table. If a slot references another type of object, a unique ID can be used to reference that object's table. CL-SQL, for example, provides facilities for this kind of object-relational mapping and there are many systems for other languages that do the same (e.g. Hibernate for Java).
While Elephant can use either RDBMSs or Berkeley DB as a data store, the model it supports is that of objects stored in persistent indices. Unlike systems such as Hibernate for Java, the user does not need to construct or worry about a mapping from the object space into the database. Elephant relies on LISP rather than SQL for its data manipulation language. Elephant is designed to be a simple and convenient tool for the programmer.
Elephant consists of a small universe of basic concepts:
*store-controller* to function.
There are a set of more advanced concepts you will learn about later, but these basic concepts will serve to acquaint you with Elephant.
If you do not already have Elephant installed and building correctly, read the Installation section of this manual and then move on to Getting Started.
The first step in using elephant is to open a store controller. A store controller is an object that coordinates lisp program access to the chosen data store.
To obtain a store controller, you call open-store with a store
specification. A store specification is a list containing a backend
specifier (:BDB or :CLSQL) and a backend-specific
reference.
For :BDB, the second element is a string or pathname that references a local directory for the database files. This directory must be created prior to calling open-store.
(open-store '(:BDB "/users/me/db/my-db/"))
For :CLSQL the second argument is another list consisting of a specific SQL database and the name of a database file or connection record to the SQL server. Examples are:
(open-store '(:CLSQL (:SQLITE "/users/me/db/sqlite.db")))
(open-store '(:CLSQL (:POSTGRESQL "localhost.localdomain"
                                  "mydb" "myuser" "")))
We use Berkeley DB as our example backend. To open a BDB store-controller we can do the following:
(asdf:operate 'asdf:load-op :elephant)
(use-package :elephant)
(setf *test-db-spec*
'(:BDB "/home/me/db/testdb/"))
(open-store *test-db-spec*)
We do not need to store the reference to the store just now as it is
automatically assigned to the variable, *store-controller*.
For a deeper discussion of store controller management see the
User Guide.
When you're done with your session, release the store-controller's
resources by calling close-store.
Also there is a convenience macro with-open-store that will
open and close the store, but opening the store is an expensive
operation so it is generally better to leave the store open until your
application no longer needs it.
Which values persist between lisp sessions is a question of liveness. Liveness in a store is determined by whether the value can be reached from the root of the store. The root is a special BTree in which other BTrees and lisp values can be stored. This BTree has a special interface through the store controller. (There is a second root BTree called the class root which will be discussed later.)
You can put something into the root object by
(add-to-root "my key" "my value")
=> "my value"
and get things out via
(get-from-root "my key")
=> "my value"
=> T
The second value indicates whether the key was found. This is important if your key-value pair can have nil as a value.
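The two-value convention can be consumed with multiple-value-bind, in the same way as the values returned by gethash. The following is a minimal sketch, not part of Elephant: fetch-or-default, toy-lookup and *toy-root* are hypothetical names, and a plain hash table stands in for the root so that the code runs without an open store.

```lisp
;; Hypothetical helper (not part of Elephant): LOOKUP is any function that
;; returns (values value found-p), as get-from-root is shown to do above.
(defun fetch-or-default (lookup key default)
  "Return the stored value for KEY, or DEFAULT only if KEY is truly absent."
  (multiple-value-bind (value found-p) (funcall lookup key)
    (if found-p value default)))

;; A hash table stands in for the store root in this sketch.
(defparameter *toy-root* (make-hash-table :test #'equal))
(setf (gethash "flag" *toy-root*) nil)   ; a key whose stored value is NIL

(defun toy-lookup (key)
  (gethash key *toy-root*))

(fetch-or-default #'toy-lookup "flag" :missing)    ; the stored NIL is returned
(fetch-or-default #'toy-lookup "absent" :missing)  ; the default is returned
```

With an open store, (fetch-or-default #'get-from-root "my key" :missing) should behave the same way, assuming get-from-root returns the two values shown above.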
You can perform other basic operations as well.
(root-existsp "my key")
=> T
(remove-from-root "my key")
=> T
(get-from-root "my key")
=> NIL
=> NIL
To access all the objects in the root, the simplest way is to
call map-root with a function to apply to each
key-value pair.
(map-root
(lambda (k v)
(format t "key: ~A value:~A~%" k v)))
You can also access the root object directly.
(controller-root *store-controller*)
=> #<DB-BDB::BDB-BTREE #x10e86042>
It is an instance of a class "btree"; see Persistent BTrees.
What can you put into the store besides strings? Almost all lisp values and objects can be stored: numbers, symbols, strings, nil, characters, pathnames, conses, hash-tables, arrays, CLOS objects and structs. Nested and circular things are allowed. You can store basically anything except compiled functions, closures, class objects, packages and streams. Functions can be stored as uncompiled lambda expressions. (Compiled functions and other kinds of objects may eventually get supported too.)
Elephant needs to use a representation of data that is independent of a specific lisp or data store. Therefore all lisp values that are stored must be serialized into a canonical format. Because Berkeley DB supports variable length binary buffers, Elephant uses a binary serialization system. This process has some consequences that are important to understand:
(setq foo (cons nil nil))
=> (NIL)
(add-to-root "my key" foo)
=> (NIL)
(add-to-root "my other key" foo)
=> (NIL)
(eq (get-from-root "my key")
(get-from-root "my other key"))
=> NIL
(setf (car foo) T)
=> T
(get-from-root "my key")
=> (NIL)
This will affect all aggregate types: objects, conses, hash-tables, et cetera. (You can of course manually re-store the cons.) In this sense elephant does not automatically provide persistent collections. If you want to persist every access, you have to use BTrees (see Persistent BTrees).
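The effect of serialization on aggregates can be imitated without a store. The sketch below is a toy model only, not Elephant's implementation: toy-put and toy-get are hypothetical functions that copy a value on the way in and on the way out, which is roughly what serializing and deserializing a cons does. It also shows the manual re-store mentioned above.

```lisp
;; Toy stand-in for a serializing store (an assumption for illustration):
;; values are copied when written and when read, mimicking serialization.
(defparameter *toy-store* (make-hash-table :test #'equal))

(defun toy-put (key value)
  (setf (gethash key *toy-store*) (copy-tree value))
  value)

(defun toy-get (key)
  (multiple-value-bind (value found-p) (gethash key *toy-store*)
    (values (copy-tree value) found-p)))

(defparameter *foo* (cons nil nil))
(toy-put "my key" *foo*)

(setf (car *foo*) t)      ; mutates the in-memory cons only
(toy-get "my key")        ; still (NIL): the stored copy is unchanged

(toy-put "my key" *foo*)  ; manually re-store the mutated cons
(toy-get "my key")        ; now (T)
```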
This may seem terribly restrictive, but don't despair, we'll solve most of these problems in the next section.
The Common Lisp Object System and the Metaobject Protocol give us the tools to solve these problems for objects:
(defclass my-persistent-class ()
((slot1 :accessor slot1)
(slot2 :accessor slot2))
(:metaclass persistent-metaclass))
(setq foo (make-instance 'my-persistent-class))
=> #<MY-PERSISTENT-CLASS {492F4F85}>
(add-to-root "foo" foo)
=> #<MY-PERSISTENT-CLASS {492F4F85}>
(add-to-root "bar" foo)
=> #<MY-PERSISTENT-CLASS {492F4F85}>
(eq (get-from-root "foo")
(get-from-root "bar"))
=> T
What's going on here? Persistent classes, that is, classes which use
the persistent-metaclass metaclass, are given unique IDs
(accessible through ele::oid). They are serialized simply by
their OID and class. Slot values are stored separately (and invisible
to the user) keyed by OID and slot. Loading (deserializing) a
persistent class
(get-from-root "foo")
=> #<MY-PERSISTENT-CLASS {492F4F85}>
instantiates the object or finds it in a memory cache if it already exists. (The cache is a weak hash-table, so gets flushed on GCs if no other references to the persistent object are kept in memory). The slot values are NOT loaded until you ask for them. In fact, the persisted slots don't have space allocated for them in the instances, because we're reading from the database.
(setf (slot1 foo) "one")
=> "one"
(setf (slot2 foo) "two")
=> "two"
(slot1 foo)
=> "one"
(slot2 foo)
=> "two"
Changes made to them propagate automatically:
(setf (slot1 foo) "three")
=> "three"
(slot1 (get-from-root "bar"))
=> "three"
You can also create persistent classes using the convenience macro
defpclass.
(defpclass my-persistent-class ()
((slot1 :accessor slot1)
(slot2 :accessor slot2)))
Although it is hard to see here, serialization / deserialization of persistent classes is fast, much faster than ordinary CLOS objects. Finally, they do not suffer from merge-conflicts when accessed within a transaction (see below). In short: persistent classes solve the problems associated with storing ordinary CLOS objects. We'll see later that BTrees solve the problems associated with storing hash-tables.
Using the persistent-metaclass metaclass declares all slots to
be persistent by default. To make a non-persistent slot use the
:transient t flag. Class slots :allocation :class are
never persisted, for either persistent or ordinary classes. (Someday,
if we choose to store class objects, this policy may change).
Persistent classes may inherit from other classes. Slots inherited from persistent classes remain persistent. Transient slots and slots inherited from ordinary classes remain transient. Ordinary classes cannot inherit from persistent classes – otherwise persistent slots could not be stored!
(defclass stdclass1 ()
((slot1 :initarg :slot1 :accessor slot1)))
(defclass stdclass2 (stdclass1)
((slot2 :initarg :slot2 :accessor slot2)))
(defpclass pclass1 (stdclass2)
((slot1 :initarg :slot1 :accessor slot1)
(slot3 :initarg :slot3 :accessor slot3)))
(setf pinst (make-instance 'pclass1 :slot1 1 :slot2 2 :slot3 3))
=> #<PCLASS1 {x10deb88a}>
(add-to-root 'pinst pinst)
=> #<PCLASS1 {x10deb88a}>
(slot1 pinst)
=> 1
(slot2 pinst)
=> 2
(slot3 pinst)
=> 3
Now we can simulate a new lisp session by flushing the instance cache, reloading our object and then seeing what slots remain. Here persistent slot1 should shadow the standard slot1 and thus be persistent. Slot3 is persistent by default and slot2, since it is inherited from a standard class, should be transient.
(elephant::flush-instance-cache *store-controller*)
=> #<EQL hash-table with weak values, 0 entries {x11198a02}>
(setf pinst (get-from-root 'pinst))
=> #<PCLASS1 {x1119b652}>
(slot1 pinst)
=> 1
(slot-boundp pinst 'slot2)
=> nil
(slot3 pinst)
=> 3
Using persistent objects has implications for the performance of your system. Note that the database is read every time you access a slot. This is a feature, not a bug, especially in concurrent situations: you want the most recent commits by other threads, right? This can be used as a weak form of IPC. But also note that in particular, if your slot value is not an immediate value or persistent object, reading will cons or freshly allocate storage for the value.
Gets are not an expensive operation; you can perform thousands to tens of thousands of primitive reads per second. However, if you're concerned, cache large values in memory and avoid writing them back to disk as long as you can.
The remaining problem outlined in the section on Serialization
is that operations which mutate collection types do not have
persistent side effects. We have solved this problem for objects, but
not for collections such as arrays, hashes or lists. Elephant
provides two solutions to this problem: the pset and
btree classes. Each provides persistent addition, deletion and
mutation of elements, but the pset is a simple data structure that may
be more efficient in memory and time than the more general btree.
The persistent set maintains a persistent, unordered collection of
objects. They inherit all the important properties of persistent
objects: identity and fast serialization. They also resolve the
mutated substructure and nested aggregates problem for collections.
Every mutating write to a pset is an independent and persistent
operation and you can serialize or deserialize a pset without
serializing any of its elements.
The pset is also a very convenient data structure for enabling
a persistent slot to contain a collection that can be updated without
deserializing and/or reserializing a list, array or hash table on
every access.
Let's explore this data structure through a (very) simple social networking example.
(defpclass person ()
  ((name :accessor person-name :initarg :name)
   (friends :accessor person-friends :initarg :friends)))
Our goal here is to store a list of friends that each person has. This simple graph structure enables analyses such as: who are the friends of my friends, do I know someone who knows X, or which person has the minimum degree of separation from everyone else?
Without psets, we would have to do something like this:
(defmethod add-friend ((me person) (them person))
(let ((friends (person-friends me)))
(pushnew them friends)
(setf (person-friends me) friends)))
(defmethod remove-friend ((me person) (them person))
(let ((remaining-friends (delete them (person-friends me))))
(setf (person-friends me) remaining-friends)))
(defmethod map-friends (fn (me person))
(mapc fn (person-friends me)))
Ouch! This results in a large amount of consing. We have to
deserialize and generate a freshly consed list every time we call
person-friends and then reserialize and discard it on every
call to (setf person-friends).
Instead, we can simply use a pset as the value of friends and
implement the add and remove friend operations as follows:
(defpclass person ()
  ((name :accessor person-name :initarg :name)
   (friends :accessor person-friends :initarg :friends
            :initform (make-pset))))
(defmethod add-friend ((me person) (them person))
(insert-item them (person-friends me)))
(defmethod remove-friend ((me person) (them person))
(remove-item them (person-friends me)))
(defmethod map-friends (fn (me person))
(map-pset fn (person-friends me)))
If you want a list to be returned when the user calls person-friends themselves, you can simply rejigger things like this:
(defpclass person ()
  ((name :accessor person-name :initarg :name)
   (friends :accessor person-friends-set :initarg :friends
            :initform (make-pset))))
(defmethod person-friends ((me person))
(pset-list (person-friends-set me)))
If you change the person-friends calls in our prior functions to
person-friends-set, the new set of functions removes
(setf person-friends), which doesn't make sense for a collection slot,
lets users get a list of the friends for easy list manipulation, and
avoids all the consing that plagued our earlier version.
You can use a pset in any way you like just like a persistent
object. The only difference is the api used to manipulate it.
Instead of slot accessors, we use insert, remove, map and find.
There is one drawback to persistent sets and that is that they are not
garbage collected. Over time, orphaned sets will eat up a lot of disk
space. Therefore you need to explicitly free the space or resort to
more frequent uses of the migrate procedure to compact your database.
The pset supports the drop-pset function for this purpose.
However, given that persistent objects have the same explicit storage property, using psets to create collection slots is a nice match.
BTrees are collections of key-value pairs ordered by key with a log(N) random access time and a rich iteration mechanism. Like persistent sets, they solve all the collection problems of the prior sections. Every key-value pair is stored independently in Elephant just like persistent object slots.
The primary interface to btree objects is through
get-value. You use setf get-value to store
key-value pairs. This interface is very similar to gethash.
The following example creates a btree called
*friends-birthdays* and adds it to the root so we can retrieve
it during a later session. We then will add two key-value pairs
consisting of the name of a friend and a universal time encoding their
birthday.
(defvar *friends-birthdays* (make-btree))
=> *FRIENDS-BIRTHDAYS*
(add-to-root 'friends-birthdays *friends-birthdays*)
=> #<BTREE {4951CF6D}>
(setf (get-value "Ben" *friends-birthdays*)
(encode-universal-time 0 0 0 14 4 1973))
=> 2312600400
(setf (get-value "Andrew" *friends-birthdays*)
(encode-universal-time 0 0 0 22 12 1976))
=> 2429071200
(get-value "Andrew" *friends-birthdays*)
=> 2429071200
=> T
(decode-universal-time *)
=> 0
0
0
22
12
1976
2
NIL
6
In addition to the hash-table like interface, btree stores
pairs sorted by the lisp value of the key, lowest to highest. This
works well for numbers, strings, symbols and persistent objects, but
due to serialization semantics may be strange for other values like
arrays, lists, standard-objects, etc.
Because elements are sorted by value, we can iterate over all the elements of the BTree in order. Notice that we entered the data in reverse alphabetic order, but will read it out in alphabetical order.
(map-btree (lambda (k v)
(format t "name: ~A utime: ~A~%" k
(subseq (multiple-value-list
(decode-universal-time v)) 3 6)))
*friends-birthdays*)
name: Andrew utime: (22 12 1976)
name: Ben utime: (14 4 1973)
=> NIL
But what if we want to read out our friends from oldest to youngest?
One way is to employ another btree that maps birthdays to names, but
this requires multiple get-value calls for each update,
increasing the burden on the programmer. Elephant provides several
better ways to do this.
The next section Indexing Persistent Classes shows you how to order and retrieve persistent classes by one or more slot values.
Class indexing simplifies the storing and retrieval of persistent objects. An indexed class stores every instance of the class that is created, ensuring that every object is automatically persisted between sessions.
(defpclass friend ()
((name :accessor name :initarg :name)
(birthday :initarg :birthday))
(:index t))
=> #<PERSISTENT-METACLASS FRIEND>
(defmethod print-object ((f friend) stream)
(format stream "#<~A>" (name f)))
(defun encode-date (dmy)
(apply #'encode-universal-time
(append '(0 0 0) dmy)))
(defmethod (setf birthday) (dmy (f friend))
(setf (slot-value f 'birthday)
(encode-date dmy))
dmy)
(defun decode-date (utime)
(subseq (multiple-value-list (decode-universal-time utime)) 3 6))
(defmethod birthday ((f friend))
(decode-date (slot-value f 'birthday)))
Notice the class argument “:index t”. This tells Elephant to store a reference to this class. Under the covers, there is a set of btrees that keeps track of classes, but we won't need to worry about that as all the functionality has been nicely packaged for you.
We also created our own birthday accessor for convenience so it
accepts and returns birthdays in a list consisting of day, month and
year such as (27 3 1972). The index key will be the encoded
universal time, however.
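The date helpers can be checked in isolation, since they use only standard Common Lisp. The sketch below uses variants that pin the time zone to 0 (GMT) so the result does not depend on the local zone; the -utc names are hypothetical and are not the functions defined above. A birthday is passed as a (day month year) list, as in (27 3 1972).

```lisp
;; Variants of the date helpers that pin the time zone to 0 (GMT).
;; The -utc names are hypothetical; the manual's versions use the local zone.
(defun encode-date-utc (dmy)
  "DMY is a list (day month year); returns a universal time at midnight GMT."
  (destructuring-bind (day month year) dmy
    (encode-universal-time 0 0 0 day month year 0)))

(defun decode-date-utc (utime)
  "Inverse of ENCODE-DATE-UTC: returns the list (day month year)."
  (subseq (multiple-value-list (decode-universal-time utime 0)) 3 6))

;; Round trip: the list comes back unchanged.
(decode-date-utc (encode-date-utc '(27 3 1972)))

;; Encoded birthdays compare as integers, so an earlier date is a smaller key.
(< (encode-date-utc '(1 1 1972)) (encode-date-utc '(14 8 1976)))
```

Because the encoded value is a plain integer, ordering by it puts the oldest friend first.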
Now we can easily manipulate all the instances of a class.
(defun print-friend (friend)
(format t " name: ~A birthdate: ~A~%"
(name friend) (birthday friend)))
(make-instance 'friend :name "Carlos"
:birthday (encode-date '(1 1 1972)))
(make-instance 'friend :name "Adriana"
:birthday (encode-date '(24 4 1980)))
(make-instance 'friend :name "Zaid"
:birthday (encode-date '(14 8 1976)))
(get-instances-by-class 'friend)
=> (#<Carlos> #<Adriana> #<Zaid>)
(mapc #'print-friend *)
name: Carlos birthdate: (1 1 1972)
name: Adriana birthdate: (24 4 1980)
name: Zaid birthdate: (14 8 1976)
=> (#<Carlos> #<Adriana> #<Zaid>)
But what if we have thousands of friends? Aside from never getting work done, our get-instances-by-class will be doing a great deal of consing, eating up lots of memory and wasting our time. Fortunately there is a more efficient way of dealing with all the instances of a class.
(map-class #'print-friend 'friend)
name: Carlos birthdate: (1 1 1972)
name: Adriana birthdate: (24 4 1980)
name: Zaid birthdate: (14 8 1976)
=> NIL
map-class has the advantage that it does not keep references to
objects after they are processed. The garbage collector can come
along, clear references from the weak instance cache so that your
working set is finite. The list version above conses all objects into
memory before you can do anything with them. The deserialization
costs are very low in both cases.
Notice that the records are not printed in order of either name or birthdate. Elephant makes no guarantee about the ordering of class elements, so you cannot depend on the insertion ordering shown here.
So what if we want ordered elements? How do we access our friends according to name and birthdate? This is where slot indices come into play.
(defpclass friend ()
((name :accessor name :initarg :name :index t)
(birthday :initarg :birthday :index t)))
Notice the :index argument to the slots and that we dropped the class :index argument. Specifying that a slot is indexed automatically registers the class as indexed. While slot indices increase the cost of writes and disk storage, each entry is only slightly larger than the size of the slot value. Numbers, small strings and symbols are good candidate types for indexed slots, but any value may be used, even different types. Once a slot is indexed, we can use the index to retrieve objects by slot values.
get-instances-by-value will retrieve all instances that are
equal to the value argument.
(get-instances-by-value 'friend 'name "Carlos")
=> (#<Carlos>)
But more interestingly, we can retrieve objects for a range of values.
(get-instances-by-range 'friend 'name "Adam" "Devin")
=> (#<Adriana> #<Carlos>)
(get-instances-by-range 'friend 'birthday
(encode-date '(1 1 1974))
(encode-date '(31 12 1984)))
=> (#<Zaid> #<Adriana>)
(mapc #'print-friend *)
name: Zaid birthdate: (14 8 1976)
name: Adriana birthdate: (24 4 1980)
=> (#<Zaid> #<Adriana>)
To retrieve all instances of a class in the order of the index instead
of the arbitrary order returned by get-instances-by-class you
can use nil in the place of the start and end values to indicate the
first or last element. (Note: to retrieve instances with null values, use
get-instances-by-value with nil as the argument).
(get-instances-by-range 'friend 'name nil "Sandra")
=> (#<Adriana> #<Carlos>)
(get-instances-by-range 'friend 'name nil nil)
=> (#<Adriana> #<Carlos> #<Zaid>)
There are also functions for mapping over instances of a slot index. To map over duplicate values, use the :value keyword argument. To map by range, use the :start and :end arguments.
(map-inverted-index #'print-friend 'friend 'name :value "Carlos")
name: Carlos birthdate: (1 1 1972)
=> NIL
(map-inverted-index #'print-friend 'friend 'name
:start "Adam" :end "Devin")
name: Adriana birthdate: (24 4 1980)
name: Carlos birthdate: (1 1 1972)
=> NIL
(map-inverted-index #'print-friend 'friend 'birthday
:start (encode-date '(1 1 1974))
:end (encode-date '(31 12 1984)))
name: Zaid birthdate: (14 8 1976)
name: Adriana birthdate: (24 4 1980)
=> NIL
(map-inverted-index #'print-friend 'friend 'birthday
:start nil
:end (encode-date '(10 10 1978)))
name: Carlos birthdate: (1 1 1972)
name: Zaid birthdate: (14 8 1976)
=> NIL
(map-inverted-index #'print-friend 'friend 'birthday
:start (encode-date '(10 10 1975))
:end nil)
name: Zaid birthdate: (14 8 1976)
name: Adriana birthdate: (24 4 1980)
=> NIL
The User Guide contains descriptions of the advanced features of Class Indices such as “derived indices” that allow you to order classes according to an arbitrary function, a dynamic API for adding and removing slots, and how to set a policy for resolving conflicts between the code image and a database where the indexing specification differs.
This same facility is also available for your own use. For more information see BTree Indexing.
One of the most important features of a database is that operations enforce the ACID properties: Atomic, Consistent, Isolated, and Durable. In plainspeak, this means that a set of changes is made all at once, that the database is never partially updated, that each set of changes happens sequentially and that a change, once made, is not lost.
Elephant provides this protection for all primitive operations. For example, when you write a value to an indexed slot, the update to the persistent slot record as well as the slot index is protected by a transaction that performs all the updates atomically, thus enforcing consistency.
Most real applications will need to use explicit transactions rather than relying on the primitives alone because you will want multiple read-modify-update operations to act as an atomic unit. A good example for this is a banking system. If a thread is going to modify a balance, we don't want another thread modifying it in the middle of the operation or one of the modifications may be lost.
(defvar *accounts* (make-btree))
(defun add-account (account)
  (setf (get-value account *accounts*) 0))
(defun balance (account)
  (get-value account *accounts*))
(defun (setf balance) (amount account)
  (setf (get-value account *accounts*) amount))
(defun deposit (account amount)
"This shows a read and a write function call to
get then set the balance"
(let ((balance (balance account)))
(setf (balance account)
(+ balance amount))))
(defun withdraw (account amount)
"A nice concise lisp version for withdraw"
(decf (balance account) amount))
(add-account 'me)
=> 0
(deposit 'me 100)
=> 100
(balance 'me)
=> 100
(withdraw 'me 25)
=> 75
(balance 'me)
=> 75
This simple bank example has a significant vulnerability. If two threads read the same balance and one writes a new balance followed by the other, the second write is computed from a stale balance and overwrites the first, so the first update is lost.
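The race can be replayed deterministically in a single thread. The sketch below is an illustration only: it uses a plain hash table in place of the *accounts* btree so that it runs without a store, and lost-update-demo and *toy-accounts* are hypothetical names.

```lisp
;; Deterministic replay of the race, with a hash table standing in for the
;; *accounts* btree (an assumption so the sketch runs without a store).
(defparameter *toy-accounts* (make-hash-table))

(defun lost-update-demo ()
  "Interleave a deposit of 10 and a deposit of 20; return the final balance."
  (setf (gethash 'me *toy-accounts*) 100)           ; starting balance
  (let ((seen-by-a (gethash 'me *toy-accounts*))    ; thread A reads 100
        (seen-by-b (gethash 'me *toy-accounts*)))   ; thread B reads 100
    (setf (gethash 'me *toy-accounts*) (+ seen-by-a 10))  ; A writes 110
    (setf (gethash 'me *toy-accounts*) (+ seen-by-b 20))  ; B writes 120
    (gethash 'me *toy-accounts*)))

(lost-update-demo)   ; 120, not the 130 that two serial deposits would give
```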
The way to avoid this is to group a set of operations together, such
as the read and write in deposit and withdraw. We
accomplish this by establishing a dynamic context called a
transaction.
During a transaction, all changes are cached until the transaction is committed. The changes made by a committed transaction happen all at once. Transactions can also be aborted due to errors that happen while they are active or because of contention. Contention is when another thread writes to a variable that the current transaction is reading. As in the bank example above, if one transaction writes the balance after the current one has read it, then the current one should start over so it has an accurate balance to work with. A transaction aborted due to contention is usually restarted until it has failed too many times.
The simplest and best way to use transactions in Elephant is to simply
wrap all the operations in the with-transaction macro. Any
statements in the body of the macro are executed within the same
transaction. Thus we would modify our example above as follows:
(defun deposit (account amount)
(with-transaction ()
(let ((balance (balance account)))
(setf (balance account)
(+ balance amount)))))
(defun withdraw (account amount)
(with-transaction ()
(decf (balance account) amount)))
And presto, we have an ACID compliant, thread-safe, persistent banking system!
What is with-transaction really doing for us? It first starts
a new transaction, attempts to execute the body, and commits the
transaction if successful. If at any time during the dynamic extent of
this process there is a conflict with another thread's transaction, an
error, or other non-local transfer of control, the transaction is
aborted. If it was aborted due to contention or deadlock, it attempts
to retry the transaction a fixed number of times by re-executing the
whole body.
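The retry behaviour described above can be modelled in a few lines of plain Common Lisp. This is a toy sketch under stated assumptions, not Elephant's implementation: the condition toy-conflict, the function call-with-retries and the default of three attempts are all hypothetical, and nothing here touches a database.

```lisp
;; A toy model of retry-on-conflict. TOY-CONFLICT, CALL-WITH-RETRIES and the
;; default of 3 attempts are hypothetical; Elephant's macro is with-transaction.
(define-condition toy-conflict (error) ())

(defun call-with-retries (thunk &key (max-attempts 3))
  "Call THUNK; if it signals TOY-CONFLICT, run it again from the start.
Re-signals the conflict once MAX-ATTEMPTS attempts have failed."
  (loop for attempt from 1
        do (handler-case (return (funcall thunk))
             (toy-conflict (c)
               (when (>= attempt max-attempts)
                 (error c))))))

;; A body that conflicts twice and succeeds on the third attempt.
(defparameter *attempts* 0)

(defun flaky-body ()
  (incf *attempts*)
  (if (< *attempts* 3)
      (error 'toy-conflict)
      :committed))

(call-with-retries #'flaky-body)   ; the whole body ran three times
```

The point of the model is that the entire body is executed again on each retry, which is what motivates the constraints on transaction bodies.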
And this brings us to two important constraints on transaction bodies: no dynamic nesting and idempotent side-effects.
In general, you want to avoid nested uses of with-transaction
statements over multiple functions. Nested transactions are valid for
some data stores (namely Berkeley DB), but typically only a single
transaction can be active at a time. The purpose of a nested
transaction in data stores that support them is to break a long
transaction into subsets. This way if there is contention on a given
subset of variables, only the inner transaction is restarted while the
larger transaction can continue. When the inner transaction commits
its results, those results become part of the outer transaction but
are not written to disk until the outer transaction commits.
If you have transaction protected primitive operations (such as
deposit and withdraw) and you want to perform a group of
such transactions, for example a transfer between accounts, you can
use the macro ensure-transaction instead of with-transaction.
(defun deposit (account amount)
"Wrap the balance read and the setf with the new balance"
(ensure-transaction ()
(let ((balance (balance account)))
(setf (balance account)
(+ balance amount)))))
(defun deposit (account amount)
  "A more concise version with incf doing both read and write"
  (ensure-transaction ()
    (incf (balance account) amount)))
(defun withdraw (account amount)
(ensure-transaction ()
(decf (balance account) amount)))
(defun transfer (src dst amount)
"There are four primitive read/write operations
grouped together in this transaction"
(with-transaction ()
(withdraw src amount)
(deposit dst amount)))
ensure-transaction is exactly like with-transaction
except it will reuse an existing transaction, if there is one, or
create a new one. There is no harm, in fact, in using this macro all
the time.
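The difference between the two macros can be pictured with a special variable that holds the current transaction. This is a minimal sketch, not Elephant's source: *txn*, with-txn and ensure-txn are hypothetical names, and a transaction is represented by a fresh list.

```lisp
(defvar *txn* nil
  "Hypothetical: the transaction active in the current dynamic extent, or NIL.")

(defmacro with-txn (&body body)
  "Always start a new (possibly nested) transaction around BODY."
  `(let ((*txn* (list :txn)))
     ,@body))

(defmacro ensure-txn (&body body)
  "Reuse the active transaction if there is one, otherwise start a new one."
  `(if *txn*
       (progn ,@body)
       (with-txn ,@body)))
```

Inside with-txn, a call to ensure-txn sees the same value of *txn*, while a second with-txn binds a new one.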
Notice the use of decf and incf above. The primary
reason to use Lisp is that it is good at hiding complexity using
shorthand constructs just like this. This means it is also going
to be good at hiding data dependencies that should be captured in a
transaction!
Within the body of a with-transaction, any non-database operations need to be idempotent. That is, the side effects of the body must be the same no matter how many times the body is executed. This is done automatically for side effects on the database, but not for side effects like pushing a value onto a lisp list or creating a new standard object.
(defparameter *transient-objects* nil)
(defun load-transients (n)
  "This is the wrong way!"
  (with-transaction ()
    (loop for i from 0 below n do
      (push (get-from-root i) *transient-objects*))))
In this contrived example we are pulling a set of standard objects from the database using an integer key and pushing them onto a list for later use. However, if there is a conflict where some other process writes a key-value pair to a matching key, the whole transaction will abort and the loop will be run again. In a heavily contended system you might see results like the following.
(defun test-list (n)
  (setf *transient-objects* nil)
  (load-transients n)
  (length *transient-objects*))
(test-list 3)
=> 3
(test-list 3)
=> 5
(test-list 3)
=> 4
So the solution is to make sure that the operation on the lisp parameters happens atomically, and only once, after the transaction completes.
(defun load-transients (n)
  "This is a better way"
  (setq *transient-objects*
        (with-transaction ()
          (loop for i from 0 below n collect
            (get-from-root i)))))
(Note that collect returns the instances in key order, whereas the
version using push built the list in reverse. Use nreverse if you
need the original ordering of *transient-objects*.)
The best rule-of-thumb is to ensure that transaction bodies are purely functional as above, except for side effects to persistent objects and btrees.
If you really do need to execute side-effects into lisp memory, such as writes to transient slots, make sure they are idempotent and that other processes cannot read the written values until the transaction completes.
By now transactions almost look like more work than they are worth! Fortunately, there are also performance benefits to explicit use of transactions. Transactions gather together all the writes that are supposed to be made to the database, store them in memory until the transaction commits, and only then write them to disk.
The most time-intensive component of a transaction is waiting while flushing newly written data to disk. Using the default auto-committing behavior requires a disk flush for every primitive write operation. This is very, very expensive! Because all the values read or written are cached in memory until the transaction completes, the number of flushes can be dramatically reduced.
But don't take my word for it, run the following statements and see for yourself the visceral impact transactions can have on system performance.
(defpclass test ()
((slot1 :accessor slot1 :initarg :slot1)))
(time (loop for i from 0 upto 100 do
(make-instance 'test :slot1 i)))
This can take a long time, well over a minute on the CLSQL data store. Here each new object that is created has to independently write its value to disk and accept a disk flush cost.
(time (with-transaction ()
(loop for i from 0 upto 100 do
(make-instance 'test :slot1 i))))
Wrapping this operation in a transaction dramatically decreases the time from tens of seconds to a second or less.
(time (with-transaction ()
(loop for i from 0 upto 1000 do
(make-instance 'test :slot1 i))))
When we increase the number of objects within the transaction, the time cost does not go up linearly. This is because the total time to write a thousand simple objects is still dominated by the disk flush at commit rather than by the number of objects.
These are huge differences in performance! However we cannot have infinitely sized transactions due to the finite size of the data store's memory cache. Large operations (such as loading data into a database) need to be split into a sequential set of smaller transactions. When dealing with persistent objects a good rule of thumb is to keep the number of objects touched in a transaction well under 1000.
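One way to split a large load into smaller transactions is to cut the input into batches first and then run one transaction per batch. The helper below is a sketch in plain Common Lisp; map-in-batches and the default batch size of 500 are assumptions made for illustration and are not part of Elephant.

```lisp
(defun map-in-batches (function items &key (batch-size 500))
  "Call FUNCTION once per consecutive sublist of ITEMS holding at most
BATCH-SIZE elements. ITEMS itself is not modified, so FUNCTION can be
re-run safely on the same batch."
  (loop with remaining = items
        while remaining
        do (let ((batch (loop repeat batch-size
                              while remaining
                              collect (pop remaining))))
             (funcall function batch))))

;; With Elephant loaded, each batch would get its own transaction:
;; (map-in-batches (lambda (batch)
;;                   (with-transaction ()
;;                     (dolist (i batch)
;;                       (make-instance 'test :slot1 i))))
;;                 (loop for i below 10000 collect i))
```

Building each batch outside the transaction body keeps the body free of side effects on lisp memory, in line with the idempotency rule above.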
Designing and tuning a transactional architecture can become quite complex. Moreover, bugs in your system can be very difficult to find as they only show up when transactions are interleaved within a larger, multi-threaded application.
In many cases you can simply ignore transactions, for example when you don't have any other concurrent processes running. In this case all operations are sequential and there is no chance of conflicts. You would only want to use transactions to improve performance on repeated sets of operations.
You can also ignore transactions if your application can guarantee that concurrency won't generate any conflicts. For example, a web app that guarantees only one thread will write to objects in a particular session can avoid transactions altogether. However, it is good to be careful about making these assumptions. In the above example, a reporting function that iterates over sessions, users or other objects may still see partial updates (i.e. a user's id was written prior to the query, but not the name). However, if you don't care about these infrequent glitches, this case would still hold.
If these cases don't apply to your application, or you aren't sure,
you will fare best by programming defensively. Break your system into
the smallest logical sets of primitive operations
(e.g. withdraw and deposit) using
ensure-transaction and then wrap the highest level calls made
to your system in with-transaction when the operations absolutely have
to commit together or you need the extra performance. Try not to have
more than two levels of transactional accesses with the top using
with-transaction and the bottom using ensure-transaction.
See Transaction Details for more details and Design Patterns for examples of how systems can be designed and tuned using transactions.
The tutorial covers the essential topics and concepts for using Elephant. Many people will find that these features are the ones that are most often needed and used in ordinary applications.
More sophisticated uses of Elephant may require additional features that are covered in the user guide. The following is a list of major features in the user guide that were not covered in this tutorial.
indexed-btree. This allows for multiple ordering and groupings
of the values of a BTree.
with-transaction using the
underlying controller methods for starting, aborting and committing
transactions. You had better know what you are doing, however!
Further, see Design Patterns for information about Elephant design patterns, solutions to common problems and other scenarios with multiple possible solutions.
Elephant is a multi-platform, multi-lisp and multi-backend system. As such there is a great deal of complexity in testing. The system has tried to minimize external dependencies as much as possible to ease installation, but it still requires some patience and care to bring Elephant up on any given platform. This section attempts to simplify this for new users as much as possible. Patches and suggestions will be gladly accepted.
Elephant supports SBCL, Allegro, Lispworks, OpenMCL and CMUCL. Each lisp is supported on each of the platforms it runs on: Mac OS X, Linux and Windows. As of release 0.6.1, both 32-bit and 64-bit systems should be supported.
Due to the small number of developers and the large number of configurations, providing full test coverage is problematic. There are five lisps, three operating systems, two word sizes (32-bit and 64-bit) and three data stores,
which means that the total number of combinations to be tested could be as much as:
lisps * os * radix * dstore = 5 * 3 * 2 * 3 = 90 configurations
Not all of these combinations are valid, but the implication is that not every combination will be tested in any given release. The developers and user base regularly use the following platforms
The CLSQL backend is used predominantly under SBCL on Linux and Mac OS X at the time of writing. The developers will do their best to accommodate users who are keen to test other combinations, but the above configurations will be the most stable and reliable.
Elephant is now quite stable in general, so don't be afraid to try an unemphasized combination - chances are it is just a little more work to bring it up. In particular, Elephant can probably work with MySQL or Oracle with just a little work, but nobody has asked for this yet.
The Elephant core system requires:
Follow the instructions at these URLs to download and set up the libraries. (Note: uffi and cl-base64 are asdf-installable for those of you with asdf-install on your system). Elephant, however, is not asdf-installable today.
In addition to these libraries, each data store has its own dependencies as discussed in Berkeley DB and CL-SQL.
Before you can load the elephant packages into your running lisp, you need to set up the configuration file. Copy the reference file config.sexp from the root directory to my-config.sexp in the root directory. my-config.sexp contains a lisp reader-formatted list of key-value pairs that tells elephant where to find various libraries and how to build them.
For example:
#+(and (or sbcl allegro) macosx)
((:berkeley-db-include-dir . "/opt/local/include/db45/")
(:berkeley-db-lib-dir . "/opt/local/lib/db45/")
(:berkeley-db-lib . "/opt/local/lib/db45/libdb-4.5.dylib")
(:berkeley-db-deadlock . "/opt/local/bin/db45_deadlock")
(:compiler . :gcc))
The following is a guide to the various parameters. For simplicity, we include all the parameters here, although we will go into more detail in each of the data store sections.
The config.sexp file contains a set of example configurations to start from, but you will most likely need to modify it for your system.
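Because the configuration is just an association list in reader syntax, you can inspect one with standard list functions. This is a sketch only: config-value is a hypothetical helper and the string below stands in for the contents of my-config.sexp. Elephant's own loader is not shown here.

```lisp
(defparameter *example-config*
  (let ((*read-eval* nil))
    (read-from-string
     "((:berkeley-db-lib-dir . \"/opt/local/lib/db45/\")
       (:compiler . :gcc))"))
  "Stand-in for the list read from my-config.sexp.")

(defun config-value (key &optional (config *example-config*))
  "Return the value stored under KEY in CONFIG, or NIL if KEY is absent."
  (cdr (assoc key config)))
```

For example, (config-value :compiler) looks up the :compiler entry of the list above.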
Elephant has one small C library that it uses for binary serialization. This means that you need to have gcc in your path (see Elephant on Windows for exceptions on the Windows platform).
Now that you have loaded all the dependencies and created your configuration file you can load the Elephant packages and definitions:
(asdf:operate 'asdf:load-op :elephant)
This will load the cl-base64 and uffi libraries. It will also automatically compile and load the C library. The build process no longer depends on a Makefile and has been verified on most platforms, but if you have a problem please report it, and any output you can capture, to the developers at elephant-devel@common-lisp.net. We will update the FAQ at http://trac.common-lisp.net/elephant with common problems users run into.
Elephant uses a two-phase load process. The core code is loaded and
the code for a given data store is loaded on demand when you call
open-store with a specification referencing that data store.
The second phase of the load process requires ASDF to be installed on
your system.
(NOTE: There are some good reasons and some not so good reasons for this process. One reason is that you cannot load ele-bdb.asd directly, as it depends on lisp code defined in elephant.asd. We decided not to fix this in the 0.9 release, although later releases may improve on this.)
Now that Elephant has been loaded, you can call use-package in
the cl-user package,
CL-USER> (use-package :elephant)
=> T
use a predefined user package,
CL-USER> (in-package :elephant-user)
=> T
ELE-USER>
or import the symbols into your own project package from :elephant.
(defpackage :my-project
(:use :common-lisp :elephant))
The imported symbols are all that is needed to control Elephant databases and are documented in detail in User API Reference.
As discussed in the tutorial, you need to open a store to begin using Elephant:
(open-store '(:BDB "/Users/owner/db/my-bdb/"))
...
ASDF loading messages
...
=> #<BDB-STORE-CONTROLLER>
(open-store '(:CLSQL (:POSTGRESQL "localhost.localdomain"
                      "mydb" "myuser" "")))
...
ASDF loading messages
...
=> #<SQL-STORE-CONTROLLER>
The first time you load a specific data store, Elephant will call ASDF
to load all the specified data store's dependencies, connect to a
database and return the store-controller subclass instance for
that data store.
Berkeley DB started out as a very simple data dictionary in the Berkeley Unix operating system. There are many “Xdb” systems that use the same API, or a similar one. A version of Berkeley DB that is free for non-commercial use is provided by Oracle Corporation, with commercial licenses available. Please follow the download and installation procedures defined here:
http://www.oracle.com/technology/products/berkeley-db/db/index.html
Elephant only works with version 4.5 of BerkeleyDB.
We recommend that you download and build a distribution from Oracle. Some problems have been reported with linking to Debian, Cygwin or other packages. This is especially true for Windows users.
Beyond ensuring that the file “my-config.sexp” points to your BDB installation directories and files, nothing else should be required to configure the example that uses a local “testdb” directory as a database (under “tests”) in the top-level Elephant directory.
On one Fedora based system, the “my-config.sexp” file looked like this:
((:berkeley-db-include-dir . "/usr/local/BerkeleyDB.4.5/include")
(:berkeley-db-lib-dir . "/usr/local/BerkeleyDB.4.5/lib")
(:berkeley-db-lib . "/usr/local/BerkeleyDB.4.5/lib/libdb.so")
(:berkeley-db-deadlock . "/usr/local/BerkeleyDB.4.5/bin/db_deadlock")
(:pthread-lib . nil)
(:clsql-lib . "/usr/local/share/common-lisp/")
(:compiler . :gcc))
The Test Suites give a nice example of using BDB by running the test using the specification:
'(:BDB "<elephant-root>/tests/testdb/")
Once you start working on an application, you will want to change the
path to a directory that is appropriate for your application, and use
that as the specification passed to open-store on application
startup.
A new release of Elephant may depend on a new version of Berkeley DB. If so, you must upgrade your BDB databases to use the new version of Elephant. This forced upgrade is a consequence of Elephant not parsing the BDB header files, which tend to change various important constants with each release. These patches are usually minor. Upgrading also happens because Elephant tries to leverage new features of Berkeley DB.
The rest of this section talks about how to upgrade your existing Berkeley DB databases, opening them in the new Elephant version and migrating them to a newly created Elephant database.
This section outlines how to upgrade from Elephant version 0.6.0 and Berkeley DB 4.3.
(setf sc (open-store '(:BDB "/Users/me/db/ele060/")))
(upgrade sc '(:BDB "/Users/me/db/ele090/"))
(NOTE: close-store may fail when closing the old 0.6 database, this is OK.)
(NOTE: 64-bit lisps will not successfully upgrade 32-bit 0.6 databases. Use a 32-bit version of your lisp to update to 0.9 and then open that database in your 64-bit lisp. There should be no compatibility problems. Best to test your application on a 32-bit lisp if you can, just to be sure.)
Follow the upgrade procedures outlined in the Elephant 0.6.0 INSTALL file to upgrade your database from 0.5 to 0.6.0. Then follow the above procedures for upgrading to 0.9.
(NOTE: It may not take much work to make 0.9 upgrade directly from 0.5. However, there are so few (none?) 0.5 users that it wasn't deemed worth the work given that there's an upgrade path available.)
Although originally designed as an interface to the BerkeleyDB system, the original Elephant system has been extended to support the use of relational database management systems as the implementation of the persistent store. This relies on Kevin Rosenberg's CL-SQL interface, which provides access to a large number of relational systems.
A major motivation for this extension is that one might prefer the licensing of a different system. For example, at the time of this writing, it is our interpretation that one cannot use the BerkeleyDB system behind a public website (see http://www.sleepycat.com/download/licensinginfo.shtml#redistribute) unless one releases the entire web application as open source.
Neither the PostGres DBMS nor SQLite 3, nor Elephant itself, imposes any such restriction.
Other reasons to use a relational database system might include: familiarity with those systems, the fact that some part of your application needs to use the truly relational aspects of those systems, preference for the tools associated with those systems, etc.
Elephant provides functions for migrating data seamlessly between data stores. One can quite easily move data from a BerkeleyDB repository to a PostGres repository, and vice versa. This offers at least the possibility that one can develop using one data store, for example BerkeleyDB, and then later move to Postgres. One could even operate simultaneously out of multiple repositories, if there were a good reason to do so.
The SQL implementation shares the serializer with the BDB data store, but base64 encodes the resulting binary stream. This data is placed into a single table in the SQL data store.
All functionality except for nested transaction support and cursor-puts supported by the BerkeleyDB data store is supported by the CL-SQL data store. CL-SQL transaction integrity under concurrent operation has not been extensively stress tested.
Additionally, it is NOT the case that the Elephant system currently provides transaction support across multiple repositories; it provides transaction support on a per-repository basis.
The PostGres backend is currently about 5 times slower than the BerkeleyDB backend. As of the time of this writing, only PostGres and SQLite 3 have been tested as CL-SQL backends.
To set up a PostGres based back end, you should:
(defvar *testpg-path*
  '(:postgresql "localhost.localdomain" "test" "postgres" ""))
which means that connections must be allowed to the database “test” as user “postgres” with no password, from the same machine “localhost.localdomain”. (This would be changed to something more secure in a real application.) Typically you edit the file pg_hba.conf to enable various kinds of connections in postgres.
psql -h 127.0.0.1 -U postgres test before you attempt to connect with Elephant.
Furthermore, you must grant practically all creation/read/write privileges to the user postgres on this schema, so that it can construct the tables it needs.
Upon first opening a CL-SQL based store controller, the tables, indexes, sequences, and so on needed by the Elephant system will be created in the schema named “test” automatically.
The build process on Windows currently only works with GCC under Cygwin. The process can be a bit tricky, so if it doesn't work out of the box or you don't want to install Cygwin, we recommend that you download the DLLs from the Elephant website download page (http://www.common-lisp.net/project/elephant/downloads.html).
Unpack the .zip file into the elephant root directory. Ensure that
your my-config.sexp file configuration for Windows has
:prebuilt-binaries set to “t” so it will know to look in
the elephant root during the asdf loading process.
For Berkeley DB users we recommend downloading the Windows binary distribution of Berkeley DB 4.5 to minimize any potential linking issues.
Elephant has matured quite a bit over the past year or two. Hopefully, it will work out-of-the-box for you.
However, if you are using a Lisp implementation different from the ones on which it is developed and maintained (see Requirements) or you have a problem that you think may be a bug, you may want to run the test suites. If you report a bug, we will ask you to run these tests and report the output. Running them when you first install the system may give you a sense of confidence and understanding that makes it worth the trouble.
There are three files that execute the tests. You should choose one as a starting point based on what backend(s) you are using. If using BerkeleyDB, use
BerkeleyDB-tests.lisp
If using a CL-SQL backend, use
SQLDB-test.lisp
If using both, use both of the above and also use:
MigrationTests.lisp
The text of this file is included here to give the casual reader an idea of how Elephant tests can be run in general:
;; If you are only using one back-end, you may prefer:
;; SQLDB-test.lisp or BerkeleyDB-tests.lisp
(asdf:operate 'asdf:load-op :elephant)
(asdf:operate 'asdf:load-op :ele-clsql)
(asdf:operate 'asdf:load-op :ele-bdb)
(asdf:operate 'asdf:load-op :ele-sqlite3)
(asdf:operate 'asdf:load-op :elephant-tests)
(in-package "ELEPHANT-TESTS")
;; Test Postgres backend
(setq *default-spec* *testpg-spec*)
(do-backend-tests)
;; Test BDB backend
(setq *default-spec* *testbdb-spec*)
(do-backend-tests)
;; Test SQLite 3
(setq *default-spec* *testsqlite3-spec*)
(do-backend-tests)
;; Test a Migration of data from BDB to postgres
(do-migration-tests *testbdb-spec* *testpg-spec*)
;; An example usage.
(open-store *testpg-spec*)
(add-to-root "x1" "y1")
(get-from-root "x1")
(add-to-root "x2" '(a 4 "spud"))
(get-from-root "x2")
The appropriate test should execute for you with no errors.
If you get errors, you may wish to report them to the
elephant-devel@common-lisp.net mailing list.
Setting up SQLite3 is even easier. Install SQLite3 (I had to use the source rather than the binary install, in order to get the dynamic libraries constructed.)
An example use of SQLite3 would be:
(asdf:operate 'asdf:load-op :elephant)
(asdf:operate 'asdf:load-op :ele-clsql)
(asdf:operate 'asdf:load-op :ele-sqlite3)
(in-package "ELEPHANT-TESTS")
(setq *test-path-primary* '(:sqlite3 "testdb"))
(do-all-tests-spec *test-path-primary*)
The file RUNTESTS.lisp, although possibly not exactly what you want, contains useful example code.
You can of course migrate between the three currently supported repository strategies in any combination: BDB, Postgresql, and SQLite3.
In all probability, other relational databases would be very easy to support but have not yet been tested. The basic pattern of the “path” specifiers is (cons clsql-database-type-symbol (normal-clsql-connection-specifier)).
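As a small illustration of that pattern, the specifiers used earlier in this section can be built by consing a database type onto a connection specifier. The function name make-sql-path is hypothetical and used only for this sketch.

```lisp
(defun make-sql-path (database-type connection-spec)
  "Build a path specifier: DATABASE-TYPE consed onto the CLSQL
connection specifier CONNECTION-SPEC (a list)."
  (cons database-type connection-spec))

;; Reproduces the two forms shown earlier in this section.
(defparameter *sqlite-path*
  (make-sql-path :sqlite3 '("testdb")))

(defparameter *postgres-path*
  (make-sql-path :postgresql
                 '("localhost.localdomain" "test" "postgres" "")))
```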
If you are getting the documentation as a released tar file, you will probably find the documentation in .html or .pdf form in the release, or can find it at the Elephant website.
If you want to compile the documentation yourself, for example because you can think of a way to improve this manual, then you will do something similar to this in a shell or command-line prompt:
cd doc
make
make pdf
This process will populate the “./includes” directory with references automatically extracted from the lisp code. Currently this docstring extraction process relies on SBCL, but with minor modifications the scripts should work with other lisp environments.
The Makefile will then compile the texinfo documentation source into an HTML file and a PDF file which will be left in the “doc/” directory. An info style HTML tree is also created in the “doc/elephant” directory. This tree contains one node per HTML file.
Don't edit anything in the “doc/elephant” directory or the “doc/includes” directories, as everything in these directories is generated. Instead, edit the “.texinfo” files in the doc directory.
An instance of the store-controller class mediates interactions
between Lisp and a data store. All elephant operations are performed
in the context of a store controller. To be more specific, a data
store provides a subclass of store-controller specialized to
that data store. Typically this object contains pointers to the disk
files, foreign memory regions and any other necessary bookkeeping
information to support Elephant operations such as slot writes and
btree operations. The store also contains the root objects and other
bookkeeping common to all data stores.
To obtain a store-controller object, call the function
open-store with a store controller specification. The current
data store specification formats are:
Valid CLSQL database tags for <sql-db-name> are
:SQLITE and :POSTGRESQL. The <sql-connect-command> is
what you would pass to CLSQL's connect command.
The open store function uses the first symbol in the specification
(i.e. :BDB or :CLSQL) to dispatch instance creation to the specified
data store which returns a specialized instance of
store-controller. open-store then initializes the store
using an internal call to open-controller.
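Dispatch on the leading symbol of a specification can be sketched with a generic function and eql specializers. This is an illustration under assumptions, not Elephant's internals: build-controller-sketch, open-store-sketch and the lists they return are hypothetical stand-ins for the real controller constructors.

```lisp
(defgeneric build-controller-sketch (tag spec)
  (:documentation
   "Hypothetical: describe the controller to create for SPEC, chosen by TAG."))

(defmethod build-controller-sketch ((tag (eql :bdb)) spec)
  ;; A :BDB spec carries a directory path as its second element.
  (list :bdb-controller :path (second spec)))

(defmethod build-controller-sketch ((tag (eql :clsql)) spec)
  ;; A :CLSQL spec carries (<sql-db-name> . <connection arguments>).
  (destructuring-bind (sql-db-name &rest connect-args) (second spec)
    (list :sql-controller :database sql-db-name :connect connect-args)))

(defun open-store-sketch (spec)
  "Dispatch on the first element of SPEC, as open-store is described as doing."
  (build-controller-sketch (first spec) spec))
```

Adding support for another kind of store in this sketch only requires one more method on a new tag.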
The final step of open-store is to set the global variable
*store-controller*. This special variable is used as a default
value in the optional or keyword arguments to a number of operations
such as:
make-instance for persistent objects
get-from-root and add-to-root for accessing a store's root
make-btree for creating persistent index instances
Each of these functions also accepts an explicit store controller
argument for use in multiple store environments. Normal applications
should only be aware that this global parameter is used. For further
discussion of *store-controller* see Multi-repository Operation.
Additionally, open-store accepts data store specific keyword
arguments. For example, you can force recovery to be run on Berkeley
DB data stores:
(open-store *my-spec* :recover t)
The data store sections of the user guide (Berkeley DB Data Store and CLSQL Data Store) list all the data-store specific options to various elephant functions.
When you finish your application, close-store will close the
store controller. Failing to do this properly may lead to a need to
run recovery on the data store during the next session. Again, see
the relevant data store sections for more detail.
There are consequences to trying to move values from lisp memory onto disk in order to persist them. The first consequence is that pointers cannot be guaranteed to be valid, and so references to lisp objects cannot be maintained. This is very similar to the problems with passing references in foreign function interfaces. The second, and more frustrating, limitation is that lisp operations that commit side effects on aggregate objects, such as objects, arrays, etc., cannot be trapped and replicated on the disk representation. This leads to a very important consequence: all lisp objects are stored by value. This policy has a number of consequences which are detailed below.
(setq foo (cons nil nil))
=> (NIL)
(add-to-root "my key" foo)
=> (NIL)
(add-to-root "my other key" foo)
=> (NIL)
(eq (get-from-root "my key")
(get-from-root "my other key"))
=> NIL
(setf (car foo) T)
=> T
(get-from-root "my key")
=> (NIL)
This will affect all aggregate types: objects, conses, hash-tables, et cetera. (You can of course manually re-store the cons.) In this sense elephant does not automatically provide persistent collections. If you want to persist every access, you have to use Persistent Sets (see Persistent Sets) or BTrees (see Persistent BTrees).
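The same effect can be reproduced without a database by round-tripping a value through its printed representation, which is also a store-by-value scheme. This is a sketch in plain Common Lisp; *fake-store*, store-value and fetch-value are hypothetical names and say nothing about how Elephant's binary serializer works.

```lisp
(defvar *fake-store* (make-hash-table :test #'equal)
  "Hypothetical stand-in for the disk: maps keys to printed representations.")

(defun store-value (key value)
  "Serialize VALUE and save the serialized form under KEY."
  (setf (gethash key *fake-store*)
        (with-standard-io-syntax (prin1-to-string value)))
  value)

(defun fetch-value (key)
  "Deserialize and return a fresh copy of whatever was stored under KEY."
  (with-standard-io-syntax
    (let ((*read-eval* nil))
      (values (read-from-string (gethash key *fake-store*))))))
```

Every fetch-value call builds a new object, so two fetches are never eq, and a later change to the original cons is invisible until the cons is stored again.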
char-code) and not
strict unicode codes so lisps may not be able to interoperably read
characters unless they have identical character code maps for the
character sets you are reading and writing. All standard ASCII
strings should be portable. Here is what we know about specific
lisps, but this should not be taken as gospel.
char-code
produces proper Unicode codes
char-code produces proper Unicode codes for codes < 2^16
Atomic types have no recursive substructure. That is they cannot contain arbitrary objects and are of a bounded size. (Bignums are an exception, but they have a predictable structure and cannot reference or otherwise encapsulate other objects). The following is a list of atoms and a discussion of how they are serialized.
nil:
nil has its own special tag in the serializer so it is easily
identifiable. nil is an awkward value as it is also a boolean.
The boolean value t is stored as the symbol 'T.
small-float is
not equivalent to type single-float as it is on all other
supported platforms. Written to disk and deserialized as a single
float so any memory footprint savings of small-float is lost.
char-code. The size of
the first character dictates the word width used for encoding. If a
character violates the word width, the string encoding is aborted and
the next larger width is chosen. The rationale here is that many
strings consist of Latin characters with codes less than 256. Strings
stored in other character sets tend to all be of codes > 256.
Therefore it is likely that the first character will properly
determine the word size of the string. (On request, we can easily make
a configuration option to fix the word width for encoding)
namestring of the path object
stored as a string. The path object is reconstructed from the
namestring using parse-namestring during deserialization.
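The word-width rule described above for strings (the first character picks the width, and a violating character forces the next larger width) can be sketched as follows. This is an illustration, not the serializer's code: string-word-width is a hypothetical name, and the widths of 1, 2 and 4 bytes are an assumption made for the example.

```lisp
(defun string-word-width (string)
  "Return the number of bytes per character (1, 2 or 4) chosen for STRING."
  (flet ((width-for (char)
           (let ((code (char-code char)))
             (cond ((< code #x100) 1)
                   ((< code #x10000) 2)
                   (t 4)))))
    (if (zerop (length string))
        1
        ;; Start with the width dictated by the first character.
        (let ((width (width-for (char string 0))))
          (loop
            (if (find-if (lambda (c) (> (width-for c) width)) string)
                ;; A character does not fit: abort and try the next width.
                (setf width (* 2 width))
                (return width)))))))
```

A string of Latin characters settles on the narrowest width in a single pass, which is the common case the rationale above relies on.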
The next list are aggregate types, meaning that elements of
that type can contain references to elements of type T. That
means, in theory, that storing an aggregate type to disk that refers
to other objects can copy every reachable object! This is a direct
and dire consequence of the “store-by-value” restriction.
(see Persistent Classes and Objects for how to design around the
store-by-value restriction).
This list describes how aggregates are handled by the serializer.
(* (/ size rehash-threshold) rehash-size).
struct-constructor method so that a new, empty instance of the
structure can be created and then populated by the stored keys and
values.
One final strategic consideration is to whether you plan on sharing the binary database between machines or between different lisp platforms on the same machine. This is almost possible today, but there are some restrictions. In the section Repository Migration and Upgrade we will discuss possible ways of migrating an existing database across platforms and lisps.
Persistent classes are instances of the persistent-metaclass
metaclass. All persistent classes keep track of which slots are
:persistent, :transient and/or :indexed and are
used as specializers in the persistence meta-object protocols
(initialization of slots, slot-access, etc).
All persistent classes create objects that inherit from the
persistent class. The persistent class provides two
slots that contain a unique object identifier (oid) and a reference to
the store-controller specification they are associated with.
Persistent slots do not take up any storage space in memory, instead
the persistent-metaclass slot access protocol redirects slot
accesses into calls to the store controller. Typically, the
underlying data store will then perform the necessary serialization
and deserialization to read and write data to disk.
When a reference to a persistent instance itself is written to
the database, for example as a key or value in a btree, only
the unique ID and class of the instance are stored. When read, a
persistent object instance is re-created (see below). This means that
serialization of persistent objects is exceedingly cheap compared to
standard objects. The subsection on instance creation below will
discuss the lifecycle of a persistent object in more detail.
To create persistent classes, the user needs to pass
persistent-metaclass as the :metaclass class option.
(defclass my-pclass ()
((slot1 :accessor slot1 :initarg :slot1 :initform 1))
(:metaclass persistent-metaclass))
The only difference between the syntax of standard and persistent
class definitions is the ability to specify a slot storage policy and
an index policy. Slot value storage policies are specified by a
boolean argument to the slot initargs :persistent,
:transient and :indexed. Slots are :persistent
and not :indexed by default.
The defpclass macro is provided as a convenience to hide the
:metaclass class option.
(defpclass my-pclass ()
((pslot1 :accessor pslot1 :initarg :pslot1 :initform 'one)
(pslot2 :accessor pslot2 :initarg :pslot2 :initform 'two
:persistent t)
(tslot1 :accessor tslot1 :initarg :tslot1 :initform 'three
:transient t)))
In the definition above the class my-pclass is an instance of
the metaclass persistent-metaclass. According to this
definition pslot1 and pslot2 are persistent while
tslot1 is transient and stored in memory.
The implications of each slot storage class are straightforward. Writes to persistent slots are durably stored to disk, reads are made from disk, and both can be part of an ACID-compliant transaction. Transient slots are initialized on instance creation according to initforms or initargs. Transient slot values are never stored to or loaded from the database, and accesses to them cannot be protected by transactions; ordinary multi-process synchronization is required instead.
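The difference between the two storage classes can be sketched as follows. The example assumes Elephant is loadable through ASDF and that a Berkeley DB store can be opened at the hypothetical directory shown; the counter class and its slot names are invented for illustration.

```lisp
;; Assumption: Elephant is installed and the BDB backend is configured.
(asdf:load-system :elephant)

(defpackage :ele-slot-demo (:use :cl :elephant))  ; hypothetical package
(in-package :ele-slot-demo)

;; Hypothetical store location; the directory must already exist.
(open-store '(:BDB "/tmp/ele-slot-demo/"))

(defpclass counter ()                             ; hypothetical class
  ((hits  :accessor hits  :initform 0)            ; persistent by default
   (label :accessor label :initform "new" :transient t)))

(defvar *c* (make-instance 'counter))

(with-transaction ()
  ;; The read and the write of the persistent slot both go to the store
  ;; and are covered by the enclosing transaction.
  (incf (hits *c*))
  ;; The transient slot lives only in memory; the transaction gives it no
  ;; protection, so an abort would not undo this assignment.
  (setf (label *c*) "seen"))
```

If several threads touch the label slot, they would need an ordinary lock around those accesses, since the transaction does not cover them.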
The :index option tells Elephant whether to maintain an
inverted index that maps slot values to their parent objects. The
behavior of indexed classes and class slots is discussed in depth in
Class Indices.
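A brief sketch of an indexed slot follows. It assumes the slot option is spelled :index t and that instances are looked up with get-instances-by-value from Elephant's class indexing interface; the store path, package and friend class are hypothetical, and the exact indexing behavior is the subject of Class Indices rather than something this example establishes.

```lisp
;; Assumption: Elephant is installed and the BDB backend is configured.
(asdf:load-system :elephant)

(defpackage :ele-index-demo (:use :cl :elephant)) ; hypothetical package
(in-package :ele-index-demo)

;; Hypothetical store location; the directory must already exist.
(open-store '(:BDB "/tmp/ele-index-demo/"))

(defpclass friend ()                              ; hypothetical class
  ((name :accessor name :initarg :name :index t)  ; inverted index on name
   (city :accessor city :initarg :city)))         ; persistent, not indexed

(make-instance 'friend :name "Carlos" :city "Lima")
(make-instance 'friend :name "Adriana" :city "Quito")

;; Look up the parent objects whose indexed slot holds the given value.
(defvar *matches* (get-instances-by-value 'friend 'name "Carlos"))
```

Note that rerunning these forms against the same store adds further instances, so the lookup may return more than one object.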
Persistent classes have their metaobject protocols modified through
specializations on persistent-metaclass. These specializations
include the creation of special slot metaobjects:
transient-slot-definition, persistent-slot-definition
and direct and effective versions of each. For the MOP aficionado the
highlights of the new class initialization protocols are as follows:
shared-initialize :around ensures that this class inherits from
persistent-object and persistent if it doesn't
already, and that the class option :index results in instances
of the class being indexed.
direct-slot-definition-class returns the appropriate slot
metaobject based on the values of the :transient and :persistent
slot definition keywords. It also does some simple error checking for invalid
combinations, for example, indexed transient slots.
effective-slot-definition-class performs the same role as the above for
effective slots.
slot-definition-allocation returns the :database allocation for
persistent slot definitions so that the underlying lisp does not allocate
instance or class storage for them, as some lisps otherwise would.
compute-effective-slot-definition-initargs performs some error checking
to ensure a subclass does not try to make an inherited persistent slot transient.
finalize-inheritance is called before the first instance is created in order
to finalize the list of persistent slots, accounting for any
forward-referenced classes in the inheritance list. The
list of indexed slots is computed in the same way. This function is also called by the
class indexing code if any calls are made that depend on knowing which slots are indexed.
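The visible effects of these protocols can be checked with standard Common Lisp introspection. The sketch below assumes Elephant is loadable through ASDF, that a Berkeley DB store can be opened at the hypothetical path, and that the symbols persistent-metaclass and persistent-object are accessible from the elephant package; the package and variable names are invented for illustration.

```lisp
;; Assumption: Elephant is installed and the BDB backend is configured.
(asdf:load-system :elephant)

(defpackage :ele-mop-demo (:use :cl :elephant))   ; hypothetical package
(in-package :ele-mop-demo)

;; Hypothetical store location; the directory must already exist.
(open-store '(:BDB "/tmp/ele-mop-demo/"))

(defpclass my-pclass ()
  ((slot1 :accessor slot1 :initarg :slot1 :initform 1)))

;; The class object itself is an instance of the persistent metaclass.
(defvar *metaclass-name*
  (class-name (class-of (find-class 'my-pclass))))

;; Creating an instance finalizes inheritance; the instance should then
;; be of type persistent-object even though no superclass was listed.
(defvar *obj* (make-instance 'my-pclass))
(defvar *is-persistent-object* (typep *obj* 'persistent-object))
```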
Reinitialization is discussed in the section on class redefinition.
Persistent objects are created just like standard objects, with a call
to make-instance. Initforms and slot initargs behave as the
user expects. The call to make-instance of a persistent class
will fail unless there is a default store-controller instance
in the variable *store-controller* or the :sc keyword
argument is given a valid store controller object. The store
controller is required to provide a unique object id, to initialize the
specification pointer of the instance, and to store the values of any
initialized slots. The initialization process is as follows:
initialize-instance :before is called to initialize the
oid slot and the data store specification slot dbcn-spc-pst.
The oid is set by the argument :from-oid or by calling the store
controller for a new oid.
shared-initialize :around is called to ensure that the underlying
lisp does not bypass the metaobject protocol during slot
initialization by manually initializing the persistent slots
and passing the transient slots to the underlying lisp.
Finally, it adds the instance to the class index so that any inverted indices
are updated appropriately.
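The two ways of supplying a store controller to make-instance can be sketched as follows. The example assumes Elephant is loadable through ASDF and that open-store both sets *store-controller* and returns the controller it opened; the store path, package and note class are hypothetical.

```lisp
;; Assumption: Elephant is installed and the BDB backend is configured.
(asdf:load-system :elephant)

(defpackage :ele-sc-demo (:use :cl :elephant))    ; hypothetical package
(in-package :ele-sc-demo)

;; Hypothetical store location; the directory must already exist.
(defvar *my-store* (open-store '(:BDB "/tmp/ele-sc-demo/")))

(defpclass note ()                                ; hypothetical class
  ((body :accessor body :initarg :body :initform "")))

;; Relies on the default controller held in *store-controller*.
(defvar *n1* (make-instance 'note :body "default store"))

;; Names the controller explicitly through the :sc keyword argument.
(defvar *n2* (make-instance 'note :body "explicit store" :sc *my-store*))
```

Passing :sc explicitly is mainly useful when more than one store is open at a time, since the default variable can name only one of them.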
Persistent slots are initialized only under the following conditions:
make-instance
After initialization the persistent instance is added to its host store controller's object cache. This cache is a weak hash table that maps oids to object instances. So after initialization the following state has been created:
make-instance.
:from-oid argument.
If you manually create an object using an OID which already exists in
the database, initargs to make-instance take precedence
over existing values in the database, which in turn take precedence
over any initforms defined in the class.
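The precedence rule can be illustrated with a hedged sketch. It assumes Elephant is loadable through ASDF, that a Berkeley DB store can be opened at the hypothetical path, and that the oid of an instance can be read with an accessor named oid in the elephant package; that accessor name is an assumption, since the text above only states that an oid slot exists. The setting class and variable names are invented for illustration.

```lisp
;; Assumption: Elephant is installed and the BDB backend is configured.
(asdf:load-system :elephant)

(defpackage :ele-oid-demo (:use :cl :elephant))   ; hypothetical package
(in-package :ele-oid-demo)

;; Hypothetical store location; the directory must already exist.
(open-store '(:BDB "/tmp/ele-oid-demo/"))

(defpclass setting ()                             ; hypothetical class
  ((val :accessor val :initarg :val :initform "from-initform")))

;; Create an object normally so that a value exists in the database.
(defvar *original* (make-instance 'setting :val "from-database"))
(defvar *id* (elephant::oid *original*))          ; assumed accessor name

;; No initarg: the value already in the database should win over the
;; initform.
(defvar *no-initarg* (val (make-instance 'setting :from-oid *id*)))

;; With an initarg: the initarg should win over the stored value, which
;; also overwrites what is in the database.
(defvar *with-initarg*
  (val (make-instance 'setting :from-oid *id* :val "from-initarg")))
```

Because the second call writes through to the store, reading the slot of *original* afterwards should also show the new value.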
The distributed nat