

4.2 Serialization details

Moving values from lisp memory onto disk in order to persist them has consequences. The first is that pointers cannot be guaranteed to remain valid, so references to lisp objects cannot be maintained. This is very similar to the problem of passing references through foreign function interfaces. The second, and more frustrating, limitation is that lisp operations that perform side effects on aggregate objects (standard objects, arrays, and so on) cannot be trapped and replicated on the disk representation. This leads to a very important consequence: all lisp objects are stored by value. This policy has a number of implications, which are detailed below.
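The effect of storing by value can be imitated in plain Common Lisp, without opening a store. The sketch below is not Elephant's serializer: it uses the lisp printer and reader as a stand-in for writing a value to disk and reading it back, and the names round-trip and store-by-value-demo are hypothetical helpers introduced only for this illustration.

```lisp
;; Stand-in for "write to disk, then read back". This is NOT Elephant's
;; serializer; ROUND-TRIP is a hypothetical helper for illustration only.
(defun round-trip (object)
  "Return a fresh copy of OBJECT built by printing it and reading it back."
  (with-standard-io-syntax
    (values (read-from-string (prin1-to-string object)))))

(defun store-by-value-demo ()
  "Return three values showing what a by-value copy keeps and what it loses."
  (let* ((original (cons nil nil))
         (stored (round-trip original)))
    (setf (car original) t)          ; mutate after "storing"
    (values (eq original stored)     ; identity is lost
            (car original)           ; the in-memory object changed
            (car stored))))          ; the "stored" copy did not

;; (store-by-value-demo)
;; => NIL, T, NIL
```

The copy is equal to the original at the moment it is made, but it is a separate object: later side effects on the original never reach it.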

4.2.1 Restrictions of Store-by-Value

  1. Lisp identity can't be preserved. Since this is a store which persists across invocations of Lisp, this probably doesn't even make sense. However, if you get an object from the index, store it in a lisp variable, and then get it again, the two will not be eq. The same holds when one object is stored under two keys:
              (setq foo (cons nil nil))
              => (NIL)
              (add-to-root "my key" foo)
              => (NIL)
              (add-to-root "my other key" foo)
              => (NIL)
              (eq (get-from-root "my key")
                  (get-from-root "my other key"))
              => NIL
         
  2. Nested aggregates are serialized recursively into a single buffer. If you store a set of objects in a hash table and then store the hash table, all of those objects will be stored in one large binary buffer along with the hash keys. This is true for all aggregates that can hold elements of type T (cons, array, standard object, etc.).
  3. Circular References. One benefit provided by the serializer is that the recursive serialization process does not fall into an infinite loop when it encounters circular references among aggregate types. It accomplishes this by assigning an ID to every non-atomic object and keeping a mapping between previously serialized objects and these IDs. The same mapping is used to reconstruct references in lisp memory on deserialization, so that the original structure is properly reproduced.
  4. Storage limitations. The serializer writes sequentially into a contiguous foreign byte array before passing that array to a given data store's API. There are practical limits to the size of the foreign buffer that lisp can allocate (usually somewhere on the order of 10-100MB, due to address space fragmentation). Moreover, most data stores have a practical limit on the size of a transaction, or on the size of the keys and values they will store. Either consideration should encourage you to limit the size of the objects that you serialize to disk. A good rule of thumb is to stay under a handful of megabytes. We have successfully serialized arrays over 100MB in the past, but have not tested the robustness of such large values over time.
  5. Mutated substructure does not persist. Continuing with the foo stored in item 1:
              (setf (car foo) T)
              => T
              (get-from-root "my key")
              => (NIL)
         

    This affects all aggregate types: objects, conses, hash-tables, et cetera. (You can of course manually re-store the cons.) In this sense Elephant does not automatically provide persistent collections. If you want every access to persist, you have to use Persistent Sets (see Persistent Sets) or BTrees (see Persistent BTrees).

  6. Serialization and deserialization can be costly. Although serialization is fairly fast, it is still expensive to store large objects wholesale. Also, since object identity is impossible to maintain, deserialization must re-cons or re-allocate the entire object every time, which increases the number of GCs the system performs. This eager allocation is contrary to how most people want to use a database: one of the reasons to use a database is that your objects can't fit into main memory all at once.
  7. Merge-conflicts in heavily multi-process/threaded situations. This is the common read-modify-write problem in all databases. We will talk more about this in the Transaction Details section.
  8. Byte Ordering. Primitive elements such as integers are written to disk in the native byte ordering of the machine on which the lisp runs. This means that little-endian machines cannot read values written by big-endian machines, and vice versa.
  9. Unicode codes and Serialized Strings. Characters and strings stored to disk can store and recover lisp character codes that implement unicode. However, the codes written are the lisp's own character codes (as produced by char-code), not strict unicode code points, so two lisps may not be able to read each other's characters unless they have identical character code maps for the character sets you are reading and writing. All standard ASCII strings should be portable. Here is what we know about specific lisps, but this should not be taken as gospel.
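Separately from any per-implementation notes, the byte ordering and character code restrictions in items 8 and 9 can be seen in plain Common Lisp. The following is a minimal sketch and says nothing about Elephant's actual on-disk layout: integer-to-octets, octets-to-integer and string-codes are hypothetical helpers, and the four-octet width is an assumption chosen for the example.

```lisp
;; Illustration only: these helpers are hypothetical and are not part of
;; Elephant. They encode a non-negative integer as a list of octets.
(defun integer-to-octets (n size order)
  "Encode N in SIZE octets. ORDER is :little or :big."
  (let ((octets (loop for i from 0 below size
                      collect (ldb (byte 8 (* 8 i)) n))))
    (ecase order
      (:little octets)
      (:big (reverse octets)))))

(defun octets-to-integer (octets order)
  "Decode a list of OCTETS that was written in ORDER (:little or :big)."
  (let ((n 0))
    (dolist (octet (ecase order
                     (:big octets)
                     (:little (reverse octets)))
                   n)
      (setf n (logior (ash n 8) octet)))))

(defun string-codes (string)
  "Return this implementation's character codes for STRING."
  (map 'list #'char-code string))

;; (integer-to-octets #x01020304 4 :little)  => (4 3 2 1)
;; (octets-to-integer '(4 3 2 1) :little)    => the integer #x01020304
;; (octets-to-integer '(4 3 2 1) :big)       => the integer #x04030201
```

Reading the same four octets with the wrong ordering silently yields a different integer, which is why a database written on one kind of machine cannot simply be opened on the other. Likewise, string-codes returns whatever char-code produces on the running lisp, so two lisps agree on a string only where their code maps agree, as they do for ASCII on the common implementations.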

4.2.2 Atomic Types

Atomic types have no recursive substructure. That is, they cannot contain arbitrary objects and are of bounded size. (Bignums are an exception, but they have a predictable structure and cannot reference or otherwise encapsulate other objects.) The following is a list of atoms and a discussion of how they are serialized.

4.2.3 Aggregate Types

The next list covers aggregate types, meaning that elements of that type can contain references to elements of type T. That means, in theory, that storing an aggregate to disk that refers to other objects can copy every reachable object! This is a direct and dire consequence of the “store-by-value” restriction. (See Persistent Classes and Objects for how to design around the store-by-value restriction.)

This list describes how aggregates are handled by the serializer.
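As a rough illustration of why storing one aggregate pulls in everything reachable from it, and of the ID mapping described under Circular References in section 4.2.1, here is a toy serializer for cons cells only. It is a sketch under stated assumptions, not Elephant's implementation: the record format and the names toy-serialize and toy-deserialize are invented for this example, and it recurses on both car and cdr, so it is only suitable for small structures.

```lisp
;; Toy illustration only; NOT Elephant's serializer. Each cons gets an id.
;; A record is (id car-field cdr-field); a field is (:atom value) or (:ref id).
(defun toy-serialize (object)
  "Return a root field and a list of records for all conses reachable from OBJECT."
  (let ((ids (make-hash-table :test 'eq))  ; cons -> id, keyed by identity
        (next-id 0)
        (records '()))
    (labels ((walk (x)
               (cond ((not (consp x)) (list :atom x))
                     ;; Already seen: emit a back-reference, do not recurse.
                     ((gethash x ids) (list :ref (gethash x ids)))
                     (t (let ((id (incf next-id)))
                          ;; Register the id BEFORE walking the children,
                          ;; so a cycle back to X finds it in the table.
                          (setf (gethash x ids) id)
                          (let ((car-field (walk (car x)))
                                (cdr-field (walk (cdr x))))
                            (push (list id car-field cdr-field) records))
                          (list :ref id))))))
      (let ((root (walk object)))
        (values root (nreverse records))))))

(defun toy-deserialize (root records)
  "Rebuild a fresh structure from ROOT and RECORDS, restoring shared references."
  (let ((cells (make-hash-table)))         ; id -> freshly allocated cons
    ;; First pass: allocate one empty cons per id.
    (dolist (record records)
      (setf (gethash (first record) cells) (cons nil nil)))
    (flet ((resolve (field)
             (ecase (first field)
               (:atom (second field))
               (:ref (gethash (second field) cells)))))
      ;; Second pass: fill in car and cdr, resolving references by id.
      (dolist (record records)
        (let ((cell (gethash (first record) cells)))
          (setf (car cell) (resolve (second record))
                (cdr cell) (resolve (third record)))))
      (resolve root))))
```

For a two-element circular list, toy-serialize terminates and produces two records, and toy-deserialize returns a new cons whose cddr is eq to itself, so the cycle is reproduced in the copy even though the copy is not eq to the original. Bind *print-circle* to T before printing such a structure at the REPL.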

One final strategic consideration is whether you plan on sharing the binary database between machines or between different lisp platforms on the same machine. This is almost possible today, but there are some restrictions. In the section Repository Migration and Upgrade we will discuss possible ways of migrating an existing database across platforms and lisps.