CouchDB Roundup (UPDATED)

Update: Apparently I was wrong about the Erlang file size driver issue. Either it was fixed since I last checked, or it was/is only broken on Windows, or I was wrong all along. (Take yer pick.) Jan got a live database to over 5 gigs with no issues whatsoever.

Buzz around CouchDB continues to grow. It's kinda freaking me out. Today I talked to a guy from Fortune magazine who wanted to know more about it, but I think I just ended up confusing him.

Jan wrote a terrific tutorial for creating a simple Todo application with CouchDB.

Ajatus (http://www.ajatus.info/) is a pure-browser implementation of a distributed document management system for CouchDB. I soooo want to spend more time playing with this.

About performance, because a lot of people are asking about it, I'll just say I don't know. I've not done any performance work on CouchDB. No one has that I'm aware of. It's been fast enough for most testing purposes, so we haven't profiled or optimized anything.

The biggest performance problem, I'll bet, is that CouchDB caches only database headers in memory, so it actually goes all the way to the OS layer for all data on each query. Not just for document object reads, but also for each btree node read in every index lookup, etc. On most systems the OS will maintain a memory cache of the database file, so disk IO itself doesn't become a bottleneck. Instead CouchDB appears to be CPU bound, because of all those unnecessary calls and copying data across layers, encoding and decoding, etc. The easy answer is a high-level object cache for the most common lookups, which I think will increase performance dramatically. But right now it would also complicate things, and as the design evolves it's best that the code stay lean and nimble. So no caching for now.
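To make the "high-level object cache" idea concrete, here is a minimal sketch in Python. It is not CouchDB code (CouchDB is written in Erlang and has no such cache right now). The class name ObjectCache, the load callback and the capacity parameter are all made up for illustration. It keeps the most recently used decoded objects in memory and only calls the slow read path on a miss.

```python
from collections import OrderedDict


class ObjectCache:
    """Hypothetical LRU cache of decoded objects, keyed by document id or node offset."""

    def __init__(self, load, capacity=1024):
        self.load = load              # slow path: read from the file and decode
        self.capacity = capacity      # max number of cached objects
        self.entries = OrderedDict()  # ordered oldest -> most recently used
        self.hits = 0
        self.misses = 0

    def get(self, key):
        if key in self.entries:
            # Cache hit: mark as most recently used and skip the slow path.
            self.entries.move_to_end(key)
            self.hits += 1
            return self.entries[key]
        # Cache miss: go down to the file layer, then remember the result.
        self.misses += 1
        value = self.load(key)
        self.entries[key] = value
        if len(self.entries) > self.capacity:
            # Evict the least recently used entry.
            self.entries.popitem(last=False)
        return value

    def invalidate(self, key):
        # Must be called when the underlying object changes.
        self.entries.pop(key, None)
```

The invalidate step is where the complication comes in: every write path has to know about the cache, which is a good part of why it isn't worth adding while the design is still moving.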

Coming Soon

For the next release I hope to have the incremental reduce functionality implemented. I've got the design worked out (I think). It should work quite well and be functionally identical to Google's MapReduce, but with incremental semantics. CouchDB already has an incremental map facility (the views), where the max cost of incrementally updating the views is logarithmic. Now I've figured out a fairly simple way of incrementally computing the reductions with logarithmic cost too, by extending the same view btree structures and storing intermediate reductions in the modified tree structures. Plus I get the built-in concurrency and crash-only semantics of the existing view model. Incremental reductions will open up a whole bunch of new reporting capabilities for CouchDB.
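Here is a rough sketch of the "store intermediate reductions in the tree" idea. It is only an illustration under big simplifying assumptions: a fixed-capacity binary tree in a Python list stands in for the real on-disk btree, and the names ReduceTree, reduce_fn and identity are invented for the example. Each inner node caches the reduction of its two children, so changing one mapped value only recomputes the nodes on the path up to the root.

```python
from typing import Callable, List


class ReduceTree:
    """Binary tree whose inner nodes cache the reduction of their children."""

    def __init__(self, capacity: int, reduce_fn: Callable[[List[int]], int], identity: int):
        size = 1
        while size < capacity:
            size *= 2                  # round capacity up to a power of two
        self.size = size
        self.reduce_fn = reduce_fn
        # nodes[1] is the root; leaves live at nodes[size : 2 * size].
        self.nodes = [identity] * (2 * size)
        self.recomputed = 0            # counts inner-node reductions performed

    def update(self, index: int, value: int) -> None:
        """Set one mapped value and refresh only its ancestors."""
        pos = self.size + index
        self.nodes[pos] = value
        pos //= 2
        while pos >= 1:
            left, right = self.nodes[2 * pos], self.nodes[2 * pos + 1]
            self.nodes[pos] = self.reduce_fn([left, right])
            self.recomputed += 1
            pos //= 2

    def total(self) -> int:
        """The reduction over all values, read straight from the root."""
        return self.nodes[1]


tree = ReduceTree(8, sum, 0)
for i in range(8):
    tree.update(i, i + 1)              # mapped values 1..8

tree.recomputed = 0
tree.update(3, 10)                     # change a single value
print(tree.total(), tree.recomputed)   # prints "42 3"
```

With 8 leaves, one update touches 3 inner nodes, which is the logarithmic cost. A real btree has many keys per node and rebalances, so treat this as a picture of the principle and nothing more.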

We also hope to have a new HTTP server layer. We are trying to move away from the Erlang inets httpd code base and use Mochiweb. It appears to be a better match for CouchDB's purposes.

Want to Help?

- Security Model. CouchDB still needs a built-in security model. I have one in mind but I'm open to suggestions.

- Large File Driver. Total database file size is limited to 2 or 4 gigs (depending on the OS, I think) because Erlang doesn't support large files in the std library. Large file support will require a special port driver to be created for Erlang. This driver might be a good start.

- Benchmark and Stress Testing. Curious about CouchDB performance, scalability and reliability? We need someone to create some test scripts and benchmarks to hammer CouchDB.

Got questions or suggestions? The mailing list is the best place to start.

Posted December 5, 2007 8:57 PM