Using Transactions - Elephant User Manual

2.9 Using Transactions

One of the most important features of a database is that operations enforce the ACID properties: Atomic, Consistent, Isolated, and Durable. In plain terms, this means that a set of changes is made all at once, that the database is never left partially updated, that sets of changes behave as if they happened one after another, and that a change, once made, is not lost.

Elephant provides this protection for all primitive operations. For example, when you write a value to an indexed slot, the update to the persistent slot record as well as the update to the slot index are protected by a transaction that performs all the updates atomically and thus enforces consistency.

2.9.1 Why do we need Transactions?

Most real applications will need to use explicit transactions rather than relying on the primitives alone, because you will want multiple read-modify-update operations to act as an atomic unit. A good example of this is a banking system. If a thread is going to modify a balance, we don't want another thread modifying it in the middle of the operation, or one of the modifications may be lost.

     (defvar *accounts* (make-btree))
     
     (defun add-account (account)
       ;; New accounts start with a balance of 0.
       (setf (get-value account *accounts*) 0))
     
     (defun balance (account)
       (get-value account *accounts*))
     
     (defun (setf balance) (amount account)
       (setf (get-value account *accounts*) amount))
     
     (defun deposit (account amount)
       "This shows a read and a write function call to
        get then set the balance"
       (let ((balance (balance account)))
         (setf (balance account)
               (+ balance amount))))
     
     (defun withdraw (account amount)
       "A nice concise lisp version for withdraw"
       (decf (balance account) amount))
     
     (add-account 'me)
     => 0
     (deposit 'me 100)
     => 100
     (balance 'me)
     => 100
     (withdraw 'me 25)
     => 75
     (balance 'me)
     => 75

This simple bank example has a significant vulnerability. If two threads read the same balance and then each writes a new balance, the second write is computed without seeing the balance written by the first, and so the first update is lost.
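The lost update can be reproduced without threads by interleaving the two reads and the two writes by hand. The following is only a sketch: a plain hash table stands in for the persistent btree, and the names *demo-accounts*, demo-balance and simulate-lost-update are made up for this illustration and are not part of Elephant.

```lisp
;; Assumption: a hash table stands in for the persistent btree, so no
;; Elephant store needs to be open to see the problem.
(defvar *demo-accounts* (make-hash-table))

(defun demo-balance (account)
  (gethash account *demo-accounts* 0))

(defun (setf demo-balance) (amount account)
  (setf (gethash account *demo-accounts*) amount))

(defun simulate-lost-update ()
  "Interleave two deposits by hand: both read before either writes."
  (setf (demo-balance 'me) 100)
  (let ((seen-by-a (demo-balance 'me))          ; thread A reads 100
        (seen-by-b (demo-balance 'me)))         ; thread B reads 100
    (setf (demo-balance 'me) (+ seen-by-a 50))  ; A writes 150
    (setf (demo-balance 'me) (+ seen-by-b 25))  ; B writes 125
    (demo-balance 'me)))
```

Calling (simulate-lost-update) returns 125 rather than the expected 175: the deposit of 50 made by thread A has been overwritten.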

The way to avoid this is to group a set of operations together, such as the read and write in deposit and withdraw. We accomplish this by establishing a dynamic context called a transaction.

During a transaction, all changes are cached until the transaction is committed. The changes made by a committed transaction happen all at once. Transactions can also be aborted, either because of errors that occur while they are active or because of contention. Contention occurs when another thread writes to a value that the current transaction has read. As in the bank example above, if another transaction writes the balance after the current one has read it, the current one should start over so that it has an accurate balance to work with. A transaction aborted due to contention is usually restarted until it has failed too many times.

The simplest and best way to use transactions in Elephant is to wrap all the operations in the with-transaction macro. All forms in the body of the macro are executed within the same transaction. Thus we would modify our example above as follows:

     (defun deposit (account amount)
       (with-transaction ()
         (let ((balance (balance account)))
           (setf (balance account)
                 (+ balance amount)))))
     
     (defun withdraw (account amount)
       (with-transaction ()
         (decf (balance account) amount)))

And presto, we have an ACID compliant, thread-safe, persistent banking system!

2.9.2 Using with-transaction

What is with-transaction really doing for us? It first starts a new transaction, attempts to execute the body, and commits the transaction if successful. If at any time during the dynamic extent of this process there is a conflict with another thread's transaction, an error, or some other non-local transfer of control, the transaction is aborted. If it was aborted due to contention or deadlock, with-transaction retries the transaction a fixed number of times by re-executing the whole body.
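The retry behaviour described above can be modelled in a few lines of portable Common Lisp. This is a sketch of the control flow only, not Elephant's implementation: the condition demo-conflict, the function call-with-demo-retries and the macro with-demo-retries are hypothetical names, and no database is involved.

```lisp
(define-condition demo-conflict (error) ()
  (:documentation "Hypothetical stand-in for a contention or deadlock abort."))

(defun call-with-demo-retries (thunk max-retries)
  "Call THUNK; on DEMO-CONFLICT, run it again up to MAX-RETRIES more times."
  (loop for attempt from 0
        do (handler-case (return (funcall thunk))
             (demo-conflict (condition)
               ;; A real data store would discard its cached writes here.
               (when (>= attempt max-retries)
                 (error condition))))))

(defmacro with-demo-retries ((&key (retries 3)) &body body)
  "Run BODY, re-executing all of it after each DEMO-CONFLICT."
  `(call-with-demo-retries (lambda () ,@body) ,retries))

(defvar *demo-attempts* 0
  "Counts how many times the body below has been executed.")

(defun demo-flaky-deposit ()
  "Fail with a conflict on the first two attempts, then succeed."
  (setf *demo-attempts* 0)
  (with-demo-retries (:retries 3)
    (incf *demo-attempts*)
    (when (< *demo-attempts* 3)
      (error 'demo-conflict))
    :committed))
```

Calling (demo-flaky-deposit) returns :committed and leaves *demo-attempts* at 3. The counter makes the key point visible: the whole body ran three times, so any side effect inside it happened three times as well.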

And this brings us to two important constraints on transaction bodies: no dynamic nesting and idempotent side-effects.

2.9.3 Nesting Transactions

In general, you want to avoid nested uses of with-transaction across multiple functions. Nested transactions are valid for some data stores (namely Berkeley DB), but typically only a single transaction can be active at a time. The purpose of a nested transaction, in data stores that support them, is to break a long transaction into subsets. This way, if there is contention on a given subset of variables, only the inner transaction is restarted while the larger transaction can continue. When the inner transaction commits its results, those results become part of the outer transaction but are not written to disk until the outer transaction commits.

If you have transaction-protected primitive operations (such as deposit and withdraw) and you want to perform a group of such operations together, for example a transfer between accounts, you can use the macro ensure-transaction instead of with-transaction.

     (defun deposit (account amount)
       "Wrap the balance read and the setf with the new balance"
       (ensure-transaction ()
         (let ((balance (balance account)))
           (setf (balance account)
                 (+ balance amount)))))
     
     (defun deposit (account amount)
       "A more concise version with incf doing both read and write"
       (ensure-transaction ()
         (incf (balance account) amount)))
     
     (defun withdraw (account amount)
       (ensure-transaction ()
         (decf (balance account) amount)))
     
     (defun transfer (src dst amount)
       "There are four primitive read/write operations
        grouped together in this transaction"
       (with-transaction ()
         (withdraw src amount)
         (deposit dst amount)))

ensure-transaction is exactly like with-transaction except it will reuse an existing transaction, if there is one, or create a new one. There is no harm, in fact, in using this macro all the time.
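The difference between the two macros can be modelled with a special variable that records the transaction active in the current dynamic context. The following is a sketch under that assumption, not Elephant's implementation; every name in it (*demo-txn*, with-demo-txn, ensure-demo-txn and so on) is hypothetical.

```lisp
(defvar *demo-txn* nil
  "Hypothetical marker for the transaction active in this dynamic context.")

(defvar *demo-txn-count* 0
  "How many transactions the model has started so far.")

(defun call-in-new-demo-txn (thunk)
  "Bind a fresh transaction id around the call to THUNK."
  (let ((*demo-txn* (incf *demo-txn-count*)))
    (funcall thunk)))

(defmacro with-demo-txn (options &body body)
  "Always start a new transaction."
  (declare (ignore options))
  `(call-in-new-demo-txn (lambda () ,@body)))

(defmacro ensure-demo-txn (options &body body)
  "Reuse the active transaction if there is one, else start a new one."
  (declare (ignore options))
  `(flet ((body-fn () ,@body))
     (if *demo-txn*
         (body-fn)
         (call-in-new-demo-txn #'body-fn))))

;; Each primitive reports the id of the transaction it ran in.
(defun demo-withdraw () (ensure-demo-txn () *demo-txn*))
(defun demo-deposit () (ensure-demo-txn () *demo-txn*))

(defun demo-transfer ()
  "Both primitives join the one transaction opened here."
  (with-demo-txn ()
    (list (demo-withdraw) (demo-deposit))))
```

In a fresh image, (demo-transfer) returns (1 1): both primitives ran inside the transaction opened by the outer macro. A later call to (demo-withdraw) on its own returns 2, because no transaction was active and a new one had to be started.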

Notice the use of decf and incf above. A primary reason to use Lisp is that it is good at hiding complexity behind shorthand constructs like these. This means it is also good at hiding data dependencies that should be captured in a transaction!

2.9.4 Idempotent Side Effects

Within the body of a with-transaction, any non-database operations need to be idempotent. That is, the side effects of the body must be the same no matter how many times the body is executed. This is done automatically for side effects on the database, but not for side effects like pushing a value onto a Lisp list or creating a new standard object.

     (defparameter *transient-objects* nil)
     
     (defun load-transients (n)
        "This is the wrong way!"
        (with-transaction ()
           (loop for i from 0 below n do
              (push (get-from-root i) *transient-objects*))))

In this contrived example we are pulling a set of standard objects from the database using an integer key and pushing them onto a list for later use. However, if there is a conflict where some other process writes a key-value pair to a matching key, the whole transaction will abort and the loop will be run again. The objects pushed during the aborted attempt are still on the list, so they end up there twice. In a heavily contended system you might see results like the following.

     (defun test-list (n)
        (setf *transient-objects* nil)
        (load-transients n)
        (length *transient-objects*))
     
     (test-list 3)
     => 3
     
     (test-list 3)
     => 5
     
     (test-list 3)
     => 4

The solution is to make the update to the Lisp variable a single atomic step that happens only once the transaction has completed.

     (defun load-transients (n)
       "This is a better way"
       (setq *transient-objects*
             (with-transaction ()
                 (loop for i from 0 below n collect
                       (get-from-root i)))))

(Note that collect returns the instances in key order, whereas the push version built the list in reverse. We would need nreverse if we cared about reproducing the original order in *transient-objects*.)

The best rule-of-thumb is to ensure that transaction bodies are purely functional as above, except for side effects to persistent objects and btrees.

If you really do need to execute side-effects into lisp memory, such as writes to transient slots, make sure they are idempotent and that other processes cannot read the written values until the transaction completes.
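The duplicated entries seen earlier can be reproduced deterministically by forcing exactly one abort. The sketch below assumes nothing about Elephant: demo-fetch is a hypothetical stand-in for a database read that fails once, and call-with-one-retry models a transaction that is restarted after that failure.

```lisp
(defvar *demo-sink* nil
  "Transient list that the loaders below fill.")

(defvar *demo-first-attempt-p* t
  "True until the simulated conflict has fired once.")

(define-condition demo-abort (error) ()
  (:documentation "Hypothetical stand-in for an abort caused by contention."))

(defun demo-fetch (i)
  "Pretend database read; aborts once, on key 2 of the first attempt."
  (when (and *demo-first-attempt-p* (= i 2))
    (setf *demo-first-attempt-p* nil)
    (error 'demo-abort))
  (* i 10))

(defun call-with-one-retry (thunk)
  "Run THUNK and re-execute the whole of it once if it aborts."
  (handler-case (funcall thunk)
    (demo-abort () (funcall thunk))))

(defun demo-load-wrong (n)
  "Push inside the retried body: partial results survive the abort."
  (setf *demo-sink* nil
        *demo-first-attempt-p* t)
  (call-with-one-retry
   (lambda ()
     (loop for i from 0 below n
           do (push (demo-fetch i) *demo-sink*))))
  (length *demo-sink*))

(defun demo-load-better (n)
  "Build the list functionally; assign it only after the body succeeds."
  (setf *demo-first-attempt-p* t)
  (setf *demo-sink*
        (call-with-one-retry
         (lambda ()
           (loop for i from 0 below n
                 collect (demo-fetch i)))))
  (length *demo-sink*))
```

Here (demo-load-wrong 3) returns 5, because the two values pushed before the abort stay on the list, while (demo-load-better 3) returns 3.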

2.9.5 Transactions and Performance

By now transactions almost look like more work than they are worth! Fortunately, there are also performance benefits to the explicit use of transactions. A transaction gathers together all the writes that are supposed to be made to the database, stores them in memory until it commits, and only then writes them to disk.

The most time-intensive component of a transaction is waiting while flushing newly written data to disk. Using the default auto-committing behavior requires a disk flush for every primitive write operation. This is very, very expensive! Because all the values read or written are cached in memory until the transaction completes, the number of flushes can be dramatically reduced.

But don't take my word for it, run the following statements and see for yourself the visceral impact transactions can have on system performance.

     (defpclass test ()
       ((slot1 :accessor slot1 :initarg :slot1)))
     
     (time (loop for i from 0 upto 100 do
              (make-instance 'test :slot1 i)))

This can take a long time, well over a minute on the CLSQL data store. Here each new object that is created has to independently write its value to disk and pay the cost of a disk flush.

     (time (with-transaction ()
              (loop for i from 0 upto 100 do
                 (make-instance 'test :slot1 i))))

Wrapping this operation in a transaction dramatically reduces the time, from tens of seconds to a second or less.

     (time (with-transaction ()
              (loop for i from 0 upto 1000 do
                 (make-instance 'test :slot1 i))))

When we increase the number of objects within the transaction, the time cost does not go up linearly. This is because the total time to write even a thousand simple objects is still dominated by the disk flush at commit rather than by the number of objects written.

These are huge differences in performance! However we cannot have infinitely sized transactions due to the finite size of the data store's memory cache. Large operations (such as loading data into a database) need to be split into a sequential set of smaller transactions. When dealing with persistent objects a good rule of thumb is to keep the number of objects touched in a transaction well under 1000.
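One way to split a large load into a sequence of smaller transactions is to hand the data to a worker function one batch at a time. The helper below is a sketch in portable Common Lisp; call-in-batches and demo-batch-sizes are hypothetical names, and the batch size of 500 is only an example chosen to stay under the rule of thumb above.

```lisp
(defun call-in-batches (items batch-size fn)
  "Call FN on successive sublists of ITEMS, each at most BATCH-SIZE long."
  (loop while items
        do (let ((batch (loop repeat batch-size
                              while items
                              collect (pop items))))
             (funcall fn batch))))

;; With an open Elephant store, each batch would get its own transaction,
;; for example (assuming the test class defined earlier):
;;
;;   (call-in-batches values 500
;;                    (lambda (batch)
;;                      (with-transaction ()
;;                        (dolist (v batch)
;;                          (make-instance 'test :slot1 v)))))

(defun demo-batch-sizes (n batch-size)
  "Return the size of each batch produced for N items."
  (let ((sizes '()))
    (call-in-batches (loop for i below n collect i)
                     batch-size
                     (lambda (batch) (push (length batch) sizes)))
    (nreverse sizes)))
```

For example, (demo-batch-sizes 1200 500) returns (500 500 200), so loading 1200 objects this way would use three transactions. Note that a failure in one batch leaves the earlier batches committed, so this pattern only suits loads that can be resumed or repeated safely.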

2.9.6 Transactions and Applications

Designing and tuning a transactional architecture can become quite complex. Moreover, bugs in your system can be very difficult to find as they only show up when transactions are interleaved within a larger, multi-threaded application.

In many cases you can simply ignore transactions, for example when you don't have any other concurrent processes running. In this case all operations are sequential and there is no chance of conflicts. You would only want to use transactions to improve performance on repeated sets of operations.

You can also ignore transactions if your application can guarantee that concurrency won't generate any conflicts. For example, a web app that guarantees only one thread will write to objects in a particular session can avoid transactions altogether. However, be careful about making these assumptions. In the above example, a reporting function that iterates over sessions, users or other objects may still see partial updates (e.g. a user's id was written prior to the query, but not the name). If you don't care about these infrequent glitches, you can still do without transactions in this case.

If these cases don't apply to your application, or you aren't sure, you will fare best by programming defensively. Break your system into the smallest logical sets of primitive operations (e.g. withdraw and deposit) using ensure-transaction, and then wrap the highest level calls made to your system in with-transaction when the operations absolutely have to commit together or you need the extra performance. Try not to have more than two levels of transactional access, with the top level using with-transaction and the bottom level using ensure-transaction.

See Transaction Details for more details and Design Patterns for examples of how systems can be designed and tuned using transactions.