CouchDb update

I've now got the "computed tables" feature of CouchDb working. It still has a way to go, but the hard parts of the design are done and working.

Computed tables are like SQL tables in that they consist of rows and columns of data. Unlike SQL tables, you don't add, update or delete the rows explicitly; instead, the contents of the table are computed from the document objects in the database. When documents are added or modified in the database, the computed table updates its values to reflect the changes. Each table is specified with a Fabric formula (similar in concept to a SQL query). Each computed row corresponds to one document in the database that the formula "selects", and the column values come from the formula's computation and transformation of the document's contents. [If you don't understand any of that, don't worry, I suck at explaining it.]
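To make the idea a bit more concrete, here is a rough in-memory sketch in Python. It is not CouchDb code (that is Erlang, and the tables live on disk), and it does not use Fabric. The names `ComputedTable`, `document_saved`, `document_deleted` and `open_invoices` are all made up for illustration, and a plain Python function stands in for the formula: it returns a tuple of column values if it "selects" the document, or `None` if it doesn't.

```python
from typing import Callable, Dict, Optional, Tuple

Doc = dict
Row = Tuple


class ComputedTable:
    """Toy computed table: at most one row per selected document."""

    def __init__(self, formula: Callable[[Doc], Optional[Row]]):
        # Hypothetical stand-in for a Fabric formula.
        self.formula = formula
        self.rows: Dict[str, Row] = {}

    def document_saved(self, doc_id: str, doc: Doc) -> None:
        """Recompute the row for one added or modified document."""
        row = self.formula(doc)
        if row is None:
            # The formula no longer selects this document.
            self.rows.pop(doc_id, None)
        else:
            self.rows[doc_id] = row

    def document_deleted(self, doc_id: str) -> None:
        """Drop the row (if any) for a deleted document."""
        self.rows.pop(doc_id, None)


def open_invoices(doc: Doc) -> Optional[Row]:
    """Example formula: select unpaid invoices, emit (customer, amount)."""
    if doc.get("type") == "invoice" and not doc.get("paid", False):
        return (doc["customer"], doc["amount"])
    return None
```

Nobody ever inserts a row directly here. Rows only appear, change or vanish as a side effect of documents being saved or deleted.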

These computed tables are indexed and stored on disk for rapid incremental updates. The tables are implemented as "table groups", each of which is one or more tables that are computed simultaneously. Viewing one table's contents triggers an incremental refresh of all the tables in the group. This is a bit of an optimization that allows more efficient building of related tables and the back indexes.
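Here is a small Python sketch of the table group idea, under some loud assumptions: everything is in memory rather than indexed on disk, the database is a toy that just remembers the order in which documents were saved, and every name (`ToyDatabase`, `TableGroup`, `changes_since`, `invoice_row`, `note_row`) is hypothetical. The point it tries to show is that viewing any one table walks the changed documents once and updates every table in the group in that same pass.

```python
class ToyDatabase:
    """Toy document store that logs the order in which documents were saved."""

    def __init__(self):
        self.docs = {}
        self.changes = []  # doc ids, in save order

    def save(self, doc_id, doc):
        self.docs[doc_id] = doc
        self.changes.append(doc_id)

    def changes_since(self, seq):
        return self.changes[seq:]


class TableGroup:
    """One or more computed tables refreshed together, incrementally."""

    def __init__(self, db, formulas):
        self.db = db
        self.formulas = formulas  # table name -> formula function
        self.tables = {name: {} for name in formulas}
        self.last_seq = 0         # how far into the change log we have read
        self.docs_processed = 0   # counter kept only to show the work done

    def refresh(self):
        pending = self.db.changes_since(self.last_seq)
        # Visit each changed document once, even if it was saved repeatedly.
        for doc_id in dict.fromkeys(pending):
            doc = self.db.docs[doc_id]
            self.docs_processed += 1
            # A single pass over the document feeds every table in the group.
            for name, formula in self.formulas.items():
                row = formula(doc)
                if row is None:
                    self.tables[name].pop(doc_id, None)
                else:
                    self.tables[name][doc_id] = row
        self.last_seq += len(pending)

    def view(self, name):
        # Viewing one table refreshes the whole group first.
        self.refresh()
        return dict(self.tables[name])


def invoice_row(doc):
    if doc.get("type") == "invoice":
        return (doc["customer"], doc["amount"])
    return None


def note_row(doc):
    if doc.get("type") == "note":
        return (doc["text"],)
    return None
```

After the first `view`, a later `view` only touches documents saved since then, and the other tables in the group are already up to date by the time anyone asks for them.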

I still have much work to do. For instance, I don't have it actually hooked up to the Fabric formula engine yet; instead I'm using Erlang expressions to build the queries. But I've got all the hardest stuff actually working.

One cool thing that will be possible is to get back partial table build results. If the table refresh is big, instead of making the user wait until it's completely finished building, you can return just the rows that have been computed so far, and the refresh will continue to happen in the background. The user can refresh their view and they'll get progressively more up-to-date table results until it's completely done.
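A rough Python sketch of what partial results could look like, with the caveat that this is only an illustration and not how the feature is (or will be) built. The "background" work is simulated by calling `step()` explicitly so the example stays deterministic; a real version would run it in another process. `PartialRefresh`, `step`, `snapshot` and `batch_size` are all hypothetical names.

```python
class PartialRefresh:
    """Toy table refresh that can hand back rows before it has finished."""

    def __init__(self, docs, formula, batch_size=2):
        self.pending = list(docs.items())  # documents still to be processed
        self.formula = formula
        self.batch_size = batch_size
        self.rows = {}

    @property
    def done(self):
        return not self.pending

    def step(self):
        """Process one batch; stands in for a slice of background work."""
        batch = self.pending[:self.batch_size]
        self.pending = self.pending[self.batch_size:]
        for doc_id, doc in batch:
            row = self.formula(doc)
            if row is not None:
                self.rows[doc_id] = row

    def snapshot(self):
        """Return the rows computed so far, plus whether the build is complete."""
        return dict(self.rows), self.done
```

Each call to `snapshot()` is like the user hitting refresh: they get whatever has been computed so far, along with a flag saying whether there is more to come.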

My next really hard task is going to be the incremental hot compaction, but I'm not going to tackle that for some time. I'll probably release some alpha level code before then.

One direction I'm going to explore is to expose the CouchDb data store primarily through an HTTP/REST API. Erlang has a very nice lightweight and scalable HTTP server (YAWS) that will make it easy to do just that. The only thing I don't know is how easy or useful it will be to program against a REST API when building web apps, or what the implications are for an offline mode, but it's very Web 2.0 and so many tools are moving in that direction. We'll see.

Posted April 12, 2006 9:00 PM