This section briefly describes special facilities of the Berkeley DB data store and explains how persistent objects map onto it. Elephant was originally written targeting only Berkeley DB. As such, the design of Elephant was heavily influenced by the Berkeley DB architecture.
Berkeley DB is a C library that very efficiently implements a database by allowing the application to directly manipulate the memory pools and disk storage without requiring communication through a server as in many relational database applications. The library supports multi-threaded and multi-process transactions through a shared memory region that provides for shared buffer pools, shared locks, etc. Each process in a multi-process application is independently linked to the library, but shares the memory pool and disk storage.
The following subsections discuss places where Berkeley DB provides additional facilities to the Elephant interfaces described above.
The Berkeley DB data store (indicated by :BDB in the data store specification) supports the Elephant protocols using Berkeley DB as a backend. The primary BDB facilities it relies on are BTree databases, the transaction subsystem, a shared buffer pool, and unique ID sequences.
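As a brief orientation, opening and closing a BDB-backed store might look like the following sketch. The directory path is an illustrative assumption, as is the exact argument convention of close-store; see the Elephant manual for the authoritative interface.

```lisp
;; Sketch: open an Elephant store backed by Berkeley DB.
;; The directory path is an illustrative assumption.
(defvar *store*
  (elephant:open-store '(:BDB "/var/db/my-elephant-store/")))

;; ... use the store ...

(elephant:close-store *store*)
```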
All data written to the data store ends up in a BTree, written under a transaction. There are two databases: one for persistent slot values and one for btrees. The mapping of Elephant objects onto them is quite simple.
Persistent slots are written to a btree under a unique key, paired with the serialized slot value. The key is the oid of the persistent object concatenated with the serialized name of the slot being written; this ordering groups an object's slots together on disk.
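To make the mapping concrete, here is a hedged sketch; the class and slots are invented for illustration. Each persistent slot write below becomes one BTree entry whose key is built from the object's oid and the serialized slot name.

```lisp
;; Illustrative persistent class; defpclass slots persist by default.
(elephant:defpclass employee ()
  ((name   :accessor employee-name   :initarg :name)
   (salary :accessor employee-salary :initarg :salary)))

;; Each slot write serializes the value and stores it in the slot
;; database under a key built from the object's oid and slot name.
(defvar *emp* (make-instance 'employee :name "Ada" :salary 100000))
(setf (employee-salary *emp*) 110000)
```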
When opening a store there are several special options you can invoke:
:recover tells Berkeley DB to run recovery on the underlying database. Recovery is reasonably cheap if it is not actually needed, but can take a very long time if you have let your log files grow too large. This option must be run single-threaded, before any other thread or process accesses the database.
:recover-fatal runs Berkeley DB catastrophic recovery (see the BDB documentation).
:thread controls free-threading. Set it to nil to run single-threaded, which avoids locking overhead on the environment; the default is to run free-threaded.
:deadlock-detect launches a background process via the Lisp run-shell facilities. This process connects to the Berkeley DB database and periodically checks for deadlocks, freeing locks as appropriate when it finds them. This avoids a set of annoying crashes in Berkeley DB, the very crashes that, in part, motivated Franz to abandon AllegroStore and write the pure-Lisp AllegroCache.
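A sketch of passing these options to open-store; the path is an illustrative assumption, and remember that :recover must run before any other thread or process touches the environment.

```lisp
;; Run recovery at startup, single-threaded, then enable the
;; background deadlock detector.
(elephant:open-store '(:BDB "/var/db/my-elephant-store/")
                     :recover t
                     :deadlock-detect t)
```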
Berkeley DB transactions accept a number of additional keyword parameters that can help you tune performance or change transaction semantics. They are summarized briefly here; see the BDB docs for detailed information:
:degree-2 provides cursor stability: the object the cursor currently rests on will not change, but previously read values may. This can significantly improve performance if you frequently map over a btree, since it locks only the current element rather than the entire btree, and all transactions running concurrently over the btree can commit without restarting. The global parameter
*map-using-degree2* determines the default behavior of this option. It is true by default, so that mapping has semantics similar to mapping over lists. Note that this option violates both Atomicity and Consistency, depending on how it is used.
:read-uncommitted allows reading data written by other, uncommitted transactions. This keeps the current thread from blocking on read access (for example, when merely dumping a btree for inspection), so long as you do not care whether the data you read later changes. This violates Atomicity and Consistency, depending on how it is used.
:txn-nosync does not flush the log when the transaction completes. You lose the Durability of the transaction, but gain performance by avoiding the expensive sync operation.
:txn-nowait causes the underlying database to return a deadlock message immediately when a lock is unavailable, rather than blocking, so that the transaction restarts.
:txn-sync is the default behavior: the transaction log of the current transaction is flushed to disk before the transaction commit routine returns. This provides full ACID compliance.
:transaction is for advanced use. It tells the Berkeley DB transaction subsystem which transaction to use, rather than creating a new one. The
:parent argument provides a parent transaction, which results in a true nested transaction.
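These keywords are supplied to with-transaction. The following sketch assumes a btree bound to *my-btree* and a user-defined process function; both names are placeholders, as is the counter stored under the root.

```lisp
;; Trade Durability for speed: commit without flushing the log.
(elephant:with-transaction (:txn-nosync t)
  (elephant:add-to-root "counter"
                        (1+ (or (elephant:get-from-root "counter") 0))))

;; Use degree-2 isolation while mapping over a large btree, locking
;; only the element under the cursor.
(elephant:with-transaction (:degree-2 t)
  (elephant:map-btree (lambda (key value)
                        (declare (ignore key))
                        (process value))   ; process is a placeholder
                      *my-btree*))
```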
The Berkeley DB data store exports some special facilities that are not currently supported by other data stores.
optimize-layout. This function tells Berkeley DB to try to return freed storage to the file system. It is of limited utility, since it can only shrink the database by the number of empty pages at the end of the file; depending on what you have deleted, that can amount to a handful of pages, or none at all. It works well if you recently created a large amount of data and then deleted all of it (e.g., a runaway loop created endless objects) and now want to reclaim the space.
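A minimal usage sketch, assuming optimize-layout is exported from the db-bdb package and takes the store controller as its argument:

```lisp
;; Try to give empty trailing pages back to the file system.
;; Assumption: exported from db-bdb, takes the store controller.
(db-bdb:optimize-layout elephant:*store-controller*)
```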
db-bdb:checkpoint. This internal function forces the transaction log to be flushed and all active data to be written to the database so that the logs and the database are in sync. Run it when you want to delete old log files and back up your database files as a coherent, recoverable set: run checkpoint, close the database, then manually run “db_archive -d” on the database to remove old logs. Finally, copy the resulting files to stable storage. See the Berkeley DB docs for more details on checkpointing and backup.
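The backup sequence described above can be sketched as follows; the paths are illustrative, checkpoint is assumed to take the store controller, and the db_archive and copy steps are run from the shell (shown here as comments):

```lisp
;; 1. Flush logs and data so they are in sync.
(db-bdb:checkpoint elephant:*store-controller*)

;; 2. Close the store before archiving.
(elephant:close-store)

;; 3. From the shell (not Lisp), remove stale logs and copy the
;;    database files to stable storage:
;;      db_archive -d -h /var/db/my-elephant-store/
;;      cp -r /var/db/my-elephant-store/ /backup/
```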
Performance tuning for Berkeley DB is a complex topic and we will not cover it here; you need to understand the Berkeley DB data store architecture, the transaction architecture, and the serializer. The primary performance-related parameters are described in config.sexp. They are:
:berkeley-db-map-degree2 - Improves the efficiency of cursor traversals in the various mapping functions. Defaults to true, meaning a value you just read while mapping may change before the traversal is done; if you operate only on the current cursor location, its value is guaranteed to be stable.
:berkeley-db-cachesize - Changes the size of the Berkeley DB buffer cache to match your working set. The default is 10MB, or roughly twenty thousand indexed class objects or fifty thousand standard persistent objects. You can save memory by reducing this value.
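An illustrative config.sexp fragment setting these parameters; both the values and the assumption that the file holds an association list of option keywords are examples, not a definitive spec:

```lisp
;; Illustrative config.sexp entries: disable degree-2 mapping and
;; double the cache to 20MB (20971520 bytes).
((:berkeley-db-map-degree2 . nil)
 (:berkeley-db-cachesize   . 20971520))
```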