CouchDB Roundup (UPDATED)
Update: Apparently I'm wrong about the Erlang file size driver issue: either it was fixed since I last checked, or it was/is only broken on Windows, or I was wrong all along. (Take yer pick.) Jan got a live database to over 5 gigs with no issues whatsoever.
Buzz around CouchDB continues to grow. It's kinda freaking me out. Today I talked to a guy from Fortune magazine who wanted to know more about it, but I think I just ended up confusing him.
Jan wrote a terrific tutorial for creating a simple Todo application with CouchDB.
Ajatus (http://www.ajatus.info/) is a pure browser implementation of a distributed document management system for CouchDB. I soooo want to spend more time playing with this.
About performance, because a lot of people are asking about it, I'll just say I don't know. I've not done any performance work on CouchDB. No one has that I'm aware of. It's been fast enough for most testing purposes, so we haven't profiled or optimized anything.
The biggest performance problem, I'll bet, is that CouchDB caches only database headers in memory, so it actually goes all the way to the OS layers for all data on each query. Not just for document object reads, but also for each btree node read in every index lookup, etc. On most systems the OS will maintain a memory cache of the database file, so disk IO itself doesn't become a bottleneck. Instead CouchDB appears to be CPU bound, because of all those unnecessary calls and copying data across layers, encoding and decoding, etc. The easy answer is a high-level object cache for the most common lookups, which I think will increase performance dramatically. But right now it would also complicate things, and as the design evolves it's best the code stay lean and nimble. So no caching for now.
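To make the "high-level object cache" idea concrete, here is a minimal sketch in Python rather than Erlang. It is not CouchDB code: the `NodeCache` class, the `read_node` callback and the idea of keying on a file offset are all assumptions made for illustration. It only shows the general shape of a least-recently-used cache sitting in front of an expensive read-and-decode step.

```python
from collections import OrderedDict


class NodeCache:
    """Small LRU cache in front of an expensive read function (hypothetical)."""

    def __init__(self, read_node, capacity=1024):
        self._read_node = read_node    # assumed callback: reads and decodes one node
        self._capacity = capacity
        self._entries = OrderedDict()  # offset -> decoded node, oldest first
        self.hits = 0
        self.misses = 0

    def get(self, offset):
        if offset in self._entries:
            # Cache hit: no trip to the OS layer, no decoding.
            self._entries.move_to_end(offset)  # mark as most recently used
            self.hits += 1
            return self._entries[offset]
        # Cache miss: pay the full read + decode cost once.
        self.misses += 1
        node = self._read_node(offset)
        self._entries[offset] = node
        if len(self._entries) > self._capacity:
            self._entries.popitem(last=False)  # evict the least recently used
        return node
```

The nodes near the root of an index are touched by nearly every lookup, so even a small cache like this would likely absorb a large share of the repeated reads. The cost is the extra bookkeeping, which is the complication mentioned above.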
Coming Soon
For the next release, I hope to have the incremental reduce functionality implemented. I've got the design worked out (I think); it should work quite well and is functionally identical to Google's MapReduce, with incremental semantics. CouchDB already has an incremental map facility (the views), where the max cost of incrementally updating the views is logarithmic. Now I've figured out a fairly simple way of incrementally computing the reductions with logarithmic cost too, by extending the same view btree structures and storing intermediate reductions in the modified tree structures. Plus I get the built-in concurrency and crash-only semantics of the existing view model. Incremental reductions will open up a whole bunch of new reporting capabilities for CouchDB.
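The idea of storing intermediate reductions in the tree can be sketched roughly as follows. This is not the CouchDB design: it is a Python toy that uses a fixed-size binary tree as a stand-in for the view btree, and the `ReduceTree` name and its methods are made up for illustration. It assumes the reduce function is associative and has an identity value. Each inner node caches the reduction of everything beneath it, so changing one value only recomputes the nodes on the path to the root.

```python
class ReduceTree:
    """Binary tree over a fixed number of slots; inner nodes cache reductions."""

    def __init__(self, size, reduce_fn, identity):
        self._n = 1
        while self._n < size:            # round leaf count up to a power of two
            self._n *= 2
        self._reduce = reduce_fn         # must be associative
        self._identity = identity
        self._nodes = [identity] * (2 * self._n)  # node i has children 2i, 2i+1

    def update(self, index, value):
        """Set one slot, then recompute only its ancestors (logarithmic cost)."""
        pos = self._n + index
        self._nodes[pos] = value
        pos //= 2
        while pos >= 1:
            self._nodes[pos] = self._reduce(self._nodes[2 * pos],
                                            self._nodes[2 * pos + 1])
            pos //= 2

    def total(self):
        """Reduction over every slot, read straight from the root."""
        return self._nodes[1]

    def range_reduce(self, lo, hi):
        """Reduce slots lo..hi-1 by combining cached intermediate reductions."""
        left, right = self._identity, self._identity
        lo += self._n
        hi += self._n
        while lo < hi:
            if lo & 1:                   # lo is a right child: take it, step right
                left = self._reduce(left, self._nodes[lo])
                lo += 1
            if hi & 1:                   # hi is exclusive: step left, take that node
                hi -= 1
                right = self._reduce(self._nodes[hi], right)
            lo //= 2
            hi //= 2
        return self._reduce(left, right)
```

With `tree = ReduceTree(5, lambda a, b: a + b, 0)`, each `tree.update(i, v)` touches only the ancestors of slot `i`, and `tree.range_reduce(lo, hi)` answers a range query from the cached partial sums instead of rescanning the values.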
We also hope to have a new HTTP server layer. We are trying to move away from the Erlang inets httpd code base and use Mochiweb. It appears to be a better match for CouchDB's purposes.
Want to Help?
- Security Model. CouchDB still needs a built-in security model. I have one in mind but I'm open to suggestions.
- Large File Driver. Total database file size is limited to 2 or 4 gigs (depending on the OS, I think) because Erlang doesn't support large files in the standard library. Large file support will require a special port driver to be created for Erlang. This driver might be a good start.
- Benchmark and Stress Testing. Curious about CouchDB performance, scalability and reliability? We need someone to create some test scripts and benchmarks to hammer CouchDB.
Got questions or suggestions? The mailing list is the best place to start.
Posted December 5, 2007 8:57 PM
Comments
Reduction support - yay!
Have you by any chance looked at the Map / Reduce / Merge algorithm ? I'm guessing after reduction goes in you'll probably spend most of your time on general performance, stability, scaling etc but this merging thing is pretty cool stuff from what I can figure out. It appears to fill in the gaps for a lot of queries people are used to from SQL-based DBs without really compromising the philosophy of a (relatively speaking) 'dumb', flat, highly scalable, highly reliable data store...
rubyruy, December 6, 2007 3:14 PM
erlang does not have a 2-4 gig limit on file access. I have a project running that has no problems doing random seeks and reads on 15-30 gb files.
you may be thinking of the limit on dets files, but why would you be using these if you needed to store large amounts of data?
Jason, December 6, 2007 5:23 PM
If you are thinking about security make sure it's granular at least to the "record" level.
Malcontent, December 7, 2007 4:55 PM
I don't care if CouchDB is slow. But give me a DB which I can install on 5 servers, without modifying the web app's code, and serve 5X the traffic which I'm able to serve with just 1 DB server, and I'll be happy like a pig in the mud. :-)
Dan, June 13, 2009 9:01 PM