Christopher B. Browne's Home Page
cbbrowne@acm.org

Text/Document Databases

Christopher Browne


Table of Contents
1. Introduction
2. Document Management Projects for Linux
3. Search Tools
4. Text Analysis Tools
5. Web Search Engines
6. Structure Tools
7. EDMS - Electronic Document Management Systems
8. Integrated Text Database Systems - Lotus Notes

This document describes a variety of tools for managing documents as collections rather than as individual files. Many of these tools are available for Linux, but the integration work required to make them work together (as implemented, for instance, in Lotus Notes) has not been done. This document may thus be more a source of ideas and possible future developments than a list of things available now.

Caution

Note that I have not been maintaining this material much of late, and as a result, some of it is very likely out of date.

1. Introduction

Most thinking about databases centers on tables and transactional relationships, and generally results in transaction-oriented Relational Database Management Systems.

There is also a lot of data that is structured quite differently, namely in the form of linked documents.

These "pieces of data," typically called documents, certainly have structure, but not of a sort that can be defined rigidly enough to be conveniently represented as a set of relational tables.

In the case of "legacy" documents (i.e., documents not designed with later re-access in mind), there may indeed be little or no structure that can be recognized and used in an automated fashion.

Many organizations don't recognize that their overall set of documents in fact represents a database that is valuable and worth managing. They only find this out when something horrible happens, such as when a LAN "goes down" and destroys a large number of critical documents.

A wide variety of tools is available for structuring, managing, and searching these sorts of "document databases," in both the commercial and free realms.
