The software similarity tester SIM
This set of software was designed to look for "excessive similarities" between programs, and includes "tuning information" to allow comparisons of programs written in C, LISP, Scheme, and possibly Pascal.
Useful for educational institutions that wish to detect plagiarism in programming assignments.
ifile collects statistics on the occurrences of words in mail documents that have been filed/refiled, and uses them to make a "best guess" of where new mail should be filed. I have done quite a lot of work to tune it to provide decent performance.
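The idea of guessing a folder from word statistics can be illustrated with a small sketch. This is not ifile's code: the class FolderGuesser, its method names, the tokenizer, and the add-one smoothing are all assumptions chosen for illustration, and the scoring is a simple naive-Bayes-style calculation that may differ from what ifile actually does.

```python
import math
import re
from collections import Counter, defaultdict


class FolderGuesser:
    """Hypothetical word-count classifier for mail folders (not ifile's code)."""

    def __init__(self):
        self.word_counts = defaultdict(Counter)  # folder -> word -> count
        self.message_counts = Counter()          # folder -> messages filed there

    @staticmethod
    def tokenize(text):
        # Assumed tokenizer: lowercase runs of letters and apostrophes.
        return re.findall(r"[a-z']+", text.lower())

    def file(self, folder, text):
        """Record the words of a message that was filed into `folder`."""
        self.word_counts[folder].update(self.tokenize(text))
        self.message_counts[folder] += 1

    def refile(self, old_folder, new_folder, text):
        """Move a message's statistics from one folder to another."""
        self.word_counts[old_folder].subtract(self.tokenize(text))
        self.message_counts[old_folder] -= 1
        self.file(new_folder, text)

    def guess(self, text):
        """Return the folder with the highest log-score, or None if untrained."""
        words = self.tokenize(text)
        vocab = {w for counts in self.word_counts.values() for w in counts}
        total_messages = sum(self.message_counts.values())
        best_folder, best_score = None, -math.inf
        for folder, counts in self.word_counts.items():
            if self.message_counts[folder] <= 0:
                continue  # nothing is currently filed in this folder
            total_words = sum(counts.values())
            # Prior: share of all filed messages that live in this folder.
            score = math.log(self.message_counts[folder] / total_messages)
            for w in words:
                # Add-one smoothing so unseen words do not zero out the score.
                score += math.log((counts[w] + 1) / (total_words + len(vocab)))
            if score > best_score:
                best_folder, best_score = folder, score
        return best_folder


guesser = FolderGuesser()
guesser.file("work", "meeting agenda project deadline")
guesser.file("family", "dinner on sunday with grandma")
print(guesser.guess("project meeting tomorrow"))
```

Keeping per-folder counts that can be both added to and subtracted from is what lets refiling correct an earlier guess without retraining from scratch.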
Mifluz is a full-text indexing library that uses directory structures to represent its "trees." The plan is to have four levels of indexing, in increasing order of expected efficiency: