The DBM "hash table" system has long been used in Unix applications. It provides a way of managing data in the form of associative arrays in a reasonably efficient manner.
Commonly available variations include NDBM (the "New" DBM implementation), SDBM, ODBM, and GDBM (the "GNU" implementation). These implementations have provided a simple but effective API resembling the following:
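DBM *dbm_open(char *file, int open_flags, int file_mode);          /* open or create a database */
void dbm_close(DBM *db);                                           /* close an open database */
datum dbm_fetch(DBM *db, datum key);                               /* look up the value stored for key */
datum dbm_firstkey(DBM *db);                                       /* start iterating over all keys */
datum dbm_nextkey(DBM *db);                                        /* continue that iteration */
int dbm_delete(DBM *db, datum key);                                /* remove key and its value */
int dbm_store(DBM *db, datum key, datum content, int store_mode);  /* insert or replace a key/value pair */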
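To give a feel for how those calls fit together, here is a minimal sketch. It does not call the C API itself; it assumes Python's standard dbm.dumb module as a stand-in, since that one is always available, and the function name demo and the sample keys are made up for illustration. The comments note which C call each step roughly corresponds to.

```python
import dbm.dumb
import os
import tempfile


def demo(path):
    """Run one open/store/fetch/delete/iterate/close cycle on a small database."""
    # Roughly dbm_open(): flag "c" creates the database if it does not exist.
    db = dbm.dumb.open(path, "c")
    try:
        # Roughly dbm_store(): keys and values are byte strings.
        db[b"alice"] = b"alice@example.org"
        db[b"bob"] = b"bob@example.org"

        # Roughly dbm_fetch().
        fetched = db[b"alice"]

        # Roughly dbm_delete().
        del db[b"bob"]

        # Roughly dbm_firstkey()/dbm_nextkey(): visit every remaining key.
        remaining = sorted(db.keys())
    finally:
        # Roughly dbm_close().
        db.close()
    return fetched, remaining


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        print(demo(os.path.join(tmp, "example")))
```

The database object behaves much like an ordinary dictionary, which is the same idea as tying a hash table to an associative array in a scripting language.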
The various implementations differ somewhat in terms of licensing, ability to handle large quantities of data, and efficiency. Many scripting languages provide ways of tying these sorts of hash tables to associative arrays; the language that popularized this was Perl.
The Perl AnyDBM_File documentation page describes some of the similarities and differences between different implementations.
Most modern Unix-like systems include one or more DBM implementations.
Modern enhancements include the following:
The "classic" Berkeley DB system offers more robust, network-aware functionality than other DBM systems. It supports ACID transactions that may commit or roll back, and it can store data in a B-Tree, which makes it practical to search ranges within a table. It even has an interface to the XA distributed transaction API. It is definitely not "Grandpa's old DBM"; even this list of neat extensions is incomplete.
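To see why ordered storage matters for range searches, consider the following small sketch. It is not Berkeley DB code: it assumes a plain sorted Python list as a stand-in for the key ordering a B-Tree maintains, and the helper name range_scan is made up for illustration. A pure hash table can only answer "give me exactly this key", whereas sorted keys let one jump to the start of a range and read forward.

```python
import bisect


def range_scan(sorted_keys, low, high):
    """Return the keys k with low <= k < high from an already sorted list."""
    start = bisect.bisect_left(sorted_keys, low)   # first key >= low
    end = bisect.bisect_left(sorted_keys, high)    # first key >= high
    return sorted_keys[start:end]


if __name__ == "__main__":
    keys = sorted(["apple", "banana", "cherry", "date", "fig"])
    print(range_scan(keys, "b", "d"))
```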
Unfortunately, arguably, this comes with a licensing scheme that is, if anything, more aggressive than the GPL in demanding that applications be "free", when the LGPL might seem preferable. (This is quite surprising, given the nearly "public domain" treatment of so many other Berkeley developments.)
See also The Sleepycat Software Home Page, which is the "commercial" home of Berkeley DB.
Interfaces are available to many languages, including C, C++, Java, Perl, Python, Tcl, and even Common Lisp, with db.lisp.
CDB - a database in which to store "constant" values. Once values are set, they cannot be updated.
This provides fast access to static data and is well suited to serving configuration data for things like mail routing. It comes from the author of qmail.
I've done some testing and found that one can very quickly load/dump large quantities of data to/from CDB databases, certainly vastly faster than with other hashing schemes like GDBM, so it appears that the claims are supported by reality.
I'd like to use CDB in conjunction with the relatively static data used with ifile, one of these days...
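The build-once, read-many workflow of a "constant" database can be sketched as follows. This does not use the cdb file format or any cdb library; it only imitates the workflow with Python's standard dbm.dumb module, and it assumes Python 3.8 or later, where a database opened with flag "r" rejects writes. The function names and the mail-routing sample data are made up for illustration.

```python
import dbm.dumb
import os
import tempfile


def build_constant_db(path, records):
    """Write every record in one pass, then close the database."""
    db = dbm.dumb.open(path, "n")  # "n": always start from an empty database
    try:
        for key, value in records.items():
            db[key] = value
    finally:
        db.close()


def lookup(path, key):
    """Open the database read-only and fetch one key (None if absent)."""
    db = dbm.dumb.open(path, "r")
    try:
        return db.get(key)
    finally:
        db.close()


def try_update(path, key, value):
    """Return True if a write through a read-only handle was accepted."""
    db = dbm.dumb.open(path, "r")
    try:
        db[key] = value
        return True
    except dbm.dumb.error:
        # Assumption: Python 3.8+ raises here for read-only handles.
        return False
    finally:
        db.close()


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "routes")
        build_constant_db(path, {b"example.org": b"mx1.example.org"})
        print(lookup(path, b"example.org"))
        print(try_update(path, b"example.org", b"mx2.example.org"))
```

To change the data in this style, one rebuilds the whole database from scratch rather than updating records in place.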
bun - bundle many files together - based on cdb format.
bun represents something of an alternative to tar or cpio or zip. It uses the well-understood database format, cdb, which makes it a straightforward matter to extend it to provide additional functionality.
For those unhappy with Dan Bernstein's licenses on CDB, someone has reimplemented it as tinycdb, released in the public domain.
CMUbik is a library for maintaining a synchronized Berkeley DataBase over multiple hosts. It has good failure semantics and strong authentication provided by the Cyrus SASL library. Its API is very similar to that of Berkeley DB, so existing applications can be ported to it very easily.
A DBM-like database system that supports concurrent writes.
The Kumofs Project - distributed key-value store atop Tokyo Cabinet
leveldb - a fast and lightweight key/value database library
leveldb is apparently used in some of Google's products, such as Google Chrome. It is a single-process, single-user, nondistributed system.
PostgreSQL represents a curious and perhaps-unexpected option...
One can readily create a DBM-like schema via the table definition:
create table dbm_table ( key text primary key, value text );
Rather curiously, if one turns off some of the reliability guarantees by shutting off fsync()-based transaction synchronization, there have been benchmarks finding PostgreSQL to be faster than many of the DBM-ish implementations.
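The DBM-style operations map onto that table quite directly, as the following sketch suggests. To keep it runnable without a database server, it assumes Python's built-in sqlite3 module as a stand-in for PostgreSQL; the table definition is the one given above, while the helper names store, fetch, delete and keys are made up for illustration. The SQL statements themselves are plain enough that they should carry over, though the parameter placeholder syntax depends on the database driver.

```python
import sqlite3

# Same two-column table as in the text above.
SCHEMA = "create table dbm_table ( key text primary key, value text )"


def store(conn, key, value):
    """Insert a key/value pair, overwriting any existing value for the key."""
    cur = conn.execute(
        "update dbm_table set value = ? where key = ?", (value, key)
    )
    if cur.rowcount == 0:
        # No existing row was updated, so the key is new.
        conn.execute(
            "insert into dbm_table (key, value) values (?, ?)", (key, value)
        )
    conn.commit()


def fetch(conn, key):
    """Return the value stored for key, or None if the key is absent."""
    row = conn.execute(
        "select value from dbm_table where key = ?", (key,)
    ).fetchone()
    return row[0] if row else None


def delete(conn, key):
    """Remove key and its value, if present."""
    conn.execute("delete from dbm_table where key = ?", (key,))
    conn.commit()


def keys(conn):
    """List every key in sorted order."""
    return [r[0] for r in conn.execute("select key from dbm_table order by key")]


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.execute(SCHEMA)
    store(conn, "alice", "alice@example.org")
    store(conn, "bob", "bob@example.org")
    delete(conn, "bob")
    print(fetch(conn, "alice"), keys(conn))
```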