Montezuma

Montezuma aims to be a fast, useful text search engine library written in pure Common Lisp.

Montezuma is a Common Lisp port of Ferret. Ferret is a Ruby port of Lucene. Lucene is, loosely speaking, Doug Cutting's Java version of Text Database (TDB), which he and Jan Pedersen developed at Xerox PARC and which, to complete the circle, was written in Common Lisp (see "An Object-Oriented Architecture for Text Retrieval").

I hope Montezuma will eventually outperform both Ferret and Lucene, not because I'm doing anything fancy but because it relies on Common Lisp implementations that compile to native code. Currently its performance approaches Lucene's but is far from Ferret's relatively blazing speed. See PerformanceComparisons.

Like every other open source project, Montezuma has an official sound clip, the call of the Montezuma Oropendola.

Current Status

Montezuma 0.1.1 was released on July 13, 2006.

To install it:

(asdf-install:install '#:montezuma)

Or you can download the tarball directly: montezuma-0.1.1.tar.gz

Indexing is complete and supports both in-memory and on-disk indices.
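To illustrate what indexing means here, the following is a minimal, hypothetical sketch of an in-memory inverted index: a hash table mapping each term to the sorted list of document ids that contain it. The names `toy-tokenize`, `make-toy-index`, `toy-add-document` and `toy-postings` are invented for illustration and are not part of Montezuma's API. The sketch assumes naive whitespace tokenization and ignores on-disk storage, term positions and scoring.

```lisp
;;; Hypothetical toy inverted index for illustration only; NOT Montezuma's API.

(defun toy-tokenize (text)
  "Split TEXT on spaces and downcase each non-empty token (naive assumption)."
  (let ((tokens '())
        (start 0))
    (loop for i from 0 to (length text)
          do (when (or (= i (length text))
                       (char= (char text i) #\Space))
               (when (> i start)
                 (push (string-downcase (subseq text start i)) tokens))
               (setf start (1+ i))))
    (nreverse tokens)))

(defun make-toy-index ()
  "Return an empty index: term string -> list of document ids."
  (make-hash-table :test #'equal))

(defun toy-add-document (index doc-id text)
  "Record DOC-ID under every distinct term of TEXT.
Assumes documents are added in increasing DOC-ID order, so postings stay sorted."
  (dolist (term (remove-duplicates (toy-tokenize text) :test #'string=))
    (setf (gethash term index)
          (append (gethash term index) (list doc-id)))))

(defun toy-postings (index term)
  "Return the sorted document ids containing TERM (case-insensitive)."
  (values (gethash (string-downcase term) index)))

;; Usage:
;; (let ((idx (make-toy-index)))
;;   (toy-add-document idx 0 "Common Lisp text search")
;;   (toy-add-document idx 1 "Ruby text search")
;;   (toy-postings idx "text"))   ; => (0 1)
```

A real engine also stores term positions (needed for phrase queries) and frequencies alongside the document ids; this sketch keeps only the ids.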

Single-term queries, boolean queries, phrase queries and wildcard queries basically work (though there are bugs). Queries can be constructed via the API as well as by parsing a simple query language.
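Some of these query types can be sketched over a toy term dictionary. The code below is a hypothetical illustration, not Montezuma's implementation: it assumes an alist of `(term . sorted-doc-ids)`, evaluates boolean AND/OR by merging sorted postings lists, and treats a wildcard query as the OR of every term matching a Lucene-style pattern (`*` for any run of characters, `?` for exactly one). All function and variable names are invented for this example. Phrase queries, which need term positions, are omitted.

```lisp
;;; Hypothetical sketch of boolean and wildcard query evaluation; NOT Montezuma's API.

(defparameter *toy-index*
  '(("lisp" 0 2)
    ("ruby" 1)
    ("search" 0 1 2)
    ("searching" 3)
    ("text" 0 1))
  "Toy term dictionary: each entry is (term . sorted document ids).")

(defun term-postings (index term)
  "Sorted document ids for TERM in INDEX, or NIL."
  (cdr (assoc term index :test #'string=)))

(defun intersect-postings (a b)
  "AND: ids present in both sorted lists A and B."
  (let ((result '()))
    (loop while (and a b)
          do (cond ((= (first a) (first b))
                    (push (first a) result)
                    (pop a)
                    (pop b))
                   ((< (first a) (first b)) (pop a))
                   (t (pop b))))
    (nreverse result)))

(defun union-postings (a b)
  "OR: sorted union of sorted lists A and B, without duplicates."
  (let ((result '()))
    (loop while (or a b)
          do (cond ((null b) (push (pop a) result))
                   ((null a) (push (pop b) result))
                   ((= (first a) (first b)) (push (pop a) result) (pop b))
                   ((< (first a) (first b)) (push (pop a) result))
                   (t (push (pop b) result))))
    (nreverse result)))

(defun wildcard-match-p (pattern term)
  "True if TERM matches PATTERN; * matches any run of characters, ? exactly one."
  (labels ((match (p i)
             (cond ((= p (length pattern)) (= i (length term)))
                   ((char= (char pattern p) #\*)
                    (or (match (1+ p) i)
                        (and (< i (length term)) (match p (1+ i)))))
                   ((< i (length term))
                    (and (or (char= (char pattern p) #\?)
                             (char= (char pattern p) (char term i)))
                         (match (1+ p) (1+ i))))
                   (t nil))))
    (match 0 0)))

(defun wildcard-postings (index pattern)
  "OR together the postings of every term in INDEX that matches PATTERN."
  (let ((result '()))
    (dolist (entry index result)
      (when (wildcard-match-p pattern (car entry))
        (setf result (union-postings result (cdr entry)))))))

;; Usage:
;; lisp AND search  => (0 2)
;; (intersect-postings (term-postings *toy-index* "lisp")
;;                     (term-postings *toy-index* "search"))
;; lisp OR ruby     => (0 1 2)
;; (union-postings (term-postings *toy-index* "lisp")
;;                 (term-postings *toy-index* "ruby"))
;; search*          => (0 1 2 3)
;; (wildcard-postings *toy-index* "search*")
```

Merging sorted lists keeps AND/OR linear in the length of the postings; the naive recursive matcher can backtrack heavily on patterns with many `*`s, which is acceptable for a sketch but not for a large term dictionary.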

There is at least one known bug, but it seems to be a bug in Ferret, too!

Requirements

Montezuma uses CL-PPCRE and CL-FAD, both of which are ASDF-installable.

I've tested it with OpenMCL, SBCL (version 0.9.12 or later, due to a bug in earlier versions) and ACL.

Loading, Running, Testing

Load via ASDF:

CL-USER> (asdf:oos 'asdf:load-op '#:montezuma)
; loading system definition from
; /Users/wiseman/src/montezuma/montezuma.asd into #<PACKAGE "ASDF4465">
; registering #<SYSTEM #:MONTEZUMA {106891F1}> as MONTEZUMA
; registering #<SYSTEM #:MONTEZUMA-TESTS {10A489A1}> as MONTEZUMA-TESTS
; ...

Load and run unit tests via ASDF:

CL-USER> (asdf:oos 'asdf:test-op '#:montezuma)

; compiling file "/Users/wiseman/src/montezuma/tests/unit/tests.lisp" (written 23 FEB 2006 11:37:38 AM):
; ...
;; MONTEZUMA::TEST-PRIORITY-QUEUE ................
;; MONTEZUMA::TEST-PRIORITY-QUEUE-CLEAR ..
;; MONTEZUMA::TEST-PRIORITY-QUEUE-STRESS .
;; MONTEZUMA::TEST-RAM-STORE .............................................................................
;; MONTEZUMA::TEST-FS-STORE .............................................................................
;; MONTEZUMA::TEST-STANDARD-FIELD ...........
; ...

Run individual tests (or test fixtures):

CL-USER> (montezuma::run-test-named 'montezuma::test-segment-info)
;; MONTEZUMA::TEST-SEGMENT-INFO ......

Development Strategy

Here's what I'm thinking:

1. Make an initial hack-and-slash port of Ferret to Lisp. This is basically a transliteration of the Ruby code into Lisp. It might be ugly and unidiomatic, but the goal is to get something working quickly. Unit tests will be ported as well. Some features may be left unimplemented (e.g., thread- and process-level locking).

If we ever make it past step 1, then we can think about...

2. Clean it up. Turn it into code that Lispers will enjoy. Craft a nice external API and let it out. A comprehensive library of unit tests will be critical to avoid breakage.

3. Tune performance and finish out the feature list. Once the code is both adequately useful and adequately pretty, begin attacking the last 20%, a task that will presumably never be complete. Add benchmark tests, fix bugs, etc. Ideally I will find someone who thinks this is fun and dump the project on them.

Mailing list/Discussion Group

There is a Montezuma discussion group/mailing list for developers.