December 18, 2007

CouchDB Performance

Seth Falcon gives CouchDB performance a nice going-over in his article A quick look at CouchDB Performance. I actually tried to post a reply on the blog a couple of days ago, but I think something went wrong when I posted it. Anyway, here is my response again, in blog form.

Performance-wise, things look slightly better than I thought they would. A few things to note:

For Test 1, I created new documents in an empty database one after another in a loop.

=== Add single doc in a loop ===
|      N |  sec | Docs/sec |
|--------+------+----------|
|   1000 |    9 |      111 |
|  10000 |  102 |       98 |
| 100000 | 1075 |       93 |
The times scale linearly, but I was surprised to see a 1.3GB file size for the database in the 100K case.

I think he means the times scale constantly, not linearly, as a function of the number of documents in the database. Constant scaling is ideal but in practice never achievable, and linear scaling is typical for unindexed, flat files. The actual update cost as a function of the number of documents is logarithmic, which is hard to see here. I think the fixed costs of the doc write, parsing and network overhead in these samples outweigh the logarithmic degradation of the core index updates.
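To make that concrete, here's a toy cost model in Python. The constants are invented for illustration, not measured from CouchDB: when the fixed per-document cost dwarfs the logarithmic index cost, throughput degrades only slightly as the database grows, much like the numbers in the table above.

```python
import math

# Toy cost model (my own assumed numbers, not measurements): each insert
# pays a fixed price for the HTTP round trip, JSON parsing and doc write,
# plus a logarithmic price for updating the btree indexes.
FIXED_MS = 9.0   # assumed fixed per-document overhead, in milliseconds
LOG_MS = 0.15    # assumed cost per btree level touched, in milliseconds

def insert_cost_ms(n_docs_already):
    """Cost of inserting one more doc into a db already holding n docs."""
    return FIXED_MS + LOG_MS * math.log2(max(n_docs_already, 2))

for n in (1_000, 10_000, 100_000):
    total_s = sum(insert_cost_ms(i) for i in range(n)) / 1000.0
    print(f"{n:>7} docs: {total_s:7.1f}s, {n / total_s:4.0f} docs/sec")
```

With these made-up constants the docs/sec figure drifts down by roughly ten percent across two orders of magnitude of database size, even though the index cost itself is growing logarithmically the whole time.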

The large size of the file is due to the overhead of write-once internal indexes that must be partially copied with every document update. The compaction process will recover this space (the compaction code isn't done yet). In general, the amount of wasted space for a freshly compacted db should be less than 2x the raw document size (though the overhead per document is a fixed plus logarithmic cost, so large documents have less waste per document).
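A rough sketch of why the file grows so much (all constants here are made up for illustration; CouchDB's real node and document sizes differ): every update appends the document plus fresh copies of the btree nodes on the path from root to leaf, and none of the superseded copies are reclaimed until compaction.

```python
import math

DOC_BYTES = 1_000    # assumed average raw document size
NODE_BYTES = 4_000   # assumed size of one copied btree node
FANOUT = 50          # assumed btree fanout

def file_size_after(n_docs):
    """Bytes appended after n inserts into a write-once (no-overwrite) file."""
    total = 0
    for i in range(1, n_docs + 1):
        depth = max(1, math.ceil(math.log(i + 1, FANOUT)))  # path length
        total += DOC_BYTES + depth * NODE_BYTES             # doc + copied nodes
    return total

raw = 100_000 * DOC_BYTES
print(f"raw docs: {raw / 1e6:.0f} MB")
print(f"append-only file: {file_size_after(100_000) / 1e6:.0f} MB")
```

With these invented constants the pre-compaction file lands in the gigabyte range for 100K small documents. The exact figure is meaningless; the shape of the growth is the point.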

Here I was surprised both that 100K documents was enough to exhaust the system’s memory and that the result was a complete crash of the server. I wonder how hard it would be to take advantage of Erlang’s touted fault tolerant features to make the server more robust to such situations.

In this case, the high memory usage is due to the lack of proper HTTP and JSON stream processing in the CouchDB HTTP server module. Everything is buffered before processing, causing the memory spikes you see. Switching to the Mochiweb Erlang HTTP server library will help with this.
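A stand-in illustration of the difference (plain Python, nothing to do with the actual Erlang code): the buffered handler must hold the entire body in memory before doing any work, while the streaming handler touches one record at a time.

```python
import io
import json

# Fake request body: 10,000 newline-delimited JSON docs.
body = b"\n".join(json.dumps({"i": i}).encode() for i in range(10_000))

def handle_buffered(stream):
    data = stream.read()          # whole body resident in memory at once
    return sum(1 for line in data.splitlines() if json.loads(line))

def handle_streaming(stream):
    count = 0
    for line in stream:           # only one line resident at a time
        json.loads(line)
        count += 1
    return count

assert handle_buffered(io.BytesIO(body)) == handle_streaming(io.BytesIO(body)) == 10_000
```

Both handlers produce the same answer; only the peak memory profile differs, which is exactly the spike Seth's test tripped over.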

And the fact that low memory causes the whole VM to crash is something that surprised and saddened me, and we are addressing it with a parent process to monitor and restart the VM process. I'm not happy at all that the Erlang VM is so fragile with memory allocations; I'd assumed it would terminate the virtual Erlang process requesting the memory, not the whole VM process. But I digress.
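A minimal sketch of such a nanny process (a hypothetical Python stand-in, not the actual script we'll use): run the command, and if it exits abnormally, start it again.

```python
import os
import subprocess
import sys
import tempfile

def supervise(cmd, max_restarts=5):
    """Run cmd; restart it whenever it exits non-zero."""
    restarts = 0
    while True:
        if subprocess.call(cmd) == 0:
            return restarts
        restarts += 1
        if restarts > max_restarts:
            raise RuntimeError("child keeps crashing, giving up")

# Demo child: crashes twice (tracked in a scratch file), then exits cleanly.
counter = os.path.join(tempfile.mkdtemp(), "attempts")
child = [sys.executable, "-c",
         "import os, sys; p = %r; "
         "n = int(open(p).read()) if os.path.exists(p) else 0; "
         "open(p, 'w').write(str(n + 1)); "
         "sys.exit(0 if n >= 2 else 1)" % counter]
print("restarts needed:", supervise(child))
```

A real version would add backoff and logging, but the core loop really is this small.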

One thing I'd like to see is some concurrent client load testing, to see how many clients it supports and how performance looks under different read/write loads. I think CouchDB will do very well here.

Anyway, great stuff Seth, thanks for the testing and article!

December 13, 2007

Thoughts on Optimizing CouchDB

As I've stated before, no profiling work has been done on CouchDB whatsoever. It has just been quick enough that it hasn't mattered at these still-early stages, so nothing has been optimized. It also means that CouchDB has lots of room for performance improvement. If it's fast enough now, it should scream once some serious work has been put into optimizations. I'd guess a couple of weeks of profiling and optimization, just finding and eliminating the hot spots, will result in a 10x performance increase. I'm just guesstimating, because easy-to-fix hotspots in unprofiled code usually account for ~90% of the low-hanging performance fruit.

CouchDB has a custom storage and view indexing engine with a simple design that is optimized first for reliability and availability, while having good update, retrieval and indexing performance. It's also designed for high concurrency, using optimistic updates that never block readers. It trades disk space to get much of this (fortunately disk space is cheap and getting cheaper all the time).

Within the design of the core server and storage engine there is opportunity for many optimizations. Things off the top of my head: caching (Erlang has a really good native Judy array implementation ideal for an object cache. EDIT: I'm apparently wrong; the Judy arrays never made it into core Erlang. Anyway, Erlang does have ETS with an unordered set option, which is ideal for an object cache.), unordered storage, data structure compaction and string atomization, binary data compression, more efficient string collation, plus all the things I don't know are performance problems because I haven't profiled yet.
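As a taste of what string atomization buys, here's a toy Python version (illustrative only; CouchDB would do this with Erlang terms, not Python dicts): field names repeat across documents, so one shared copy per distinct name replaces thousands of duplicates.

```python
_atoms = {}

def atomize(s):
    """Return one shared instance per distinct string value."""
    return _atoms.setdefault(s, s)

docs = [{"type": "post", "author": "seth"} for _ in range(1000)]
keys = [atomize(k) for d in docs for k in d]
# Every "type" key now refers to literally the same object,
# rather than 1000 separate copies of the same bytes.
assert all(k is keys[0] for k in keys[::2])
```

The same idea applies on disk: store each repeated key once and reference it, shrinking both the file and the in-memory working set.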

Within the front-end HTTP server layer there are more big areas for optimization: replacing the inets httpd server with something more lightweight, like the Mochiweb server, which will allow for HTTP and JSON stream processing; also, adding more bulk query HTTP calls, for bulk reads and multi-key view lookups (the core already supports it, it's just not exposed over HTTP). Replication will speed up tremendously when its HTTP calls are performed in bulk.
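The win from bulk calls is mostly about round trips. A trivial client-side sketch (a hypothetical helper, not a CouchDB API) of how batching collapses the request count:

```python
def batched(keys, batch_size=100):
    """Yield keys in fixed-size groups, one group per bulk request."""
    for i in range(0, len(keys), batch_size):
        yield keys[i:i + batch_size]

keys = list(range(1000))
print("naive:", len(keys), "requests; bulk:", len(list(batched(keys))), "requests")
```

Each eliminated request saves a full round of connection, parsing and response overhead, which is why bulk replication should be such a big win.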

View indexing has tons of optimizations that can be done and can easily become a project itself. Compression, IO optimizations, static analysis of map functions, etc, etc.

So if you ask how fast CouchDB will be, I say measure what it does now and expect it to do it at least 10x faster by 1.0. Being a big project with a lot of layers, I'd say a 10x improvement from the un-optimized baseline is an easy target to hit. In a few years, after more really smart engineers have worked on it, 100x may be possible.

December 9, 2007

Erlang VM crash

Jan writes a call for help for some Erlang and Spidermonkey problems we've been having.

It's an interesting reason that Erlang crashes. I spent some time looking at the crash dump and the server didn't look to be in a stressed state. It only had tens of processes running, most of them part of the core server and always running. It was only using 12 megs of memory total and asking for 2 megs more. How could it fail to allocate 2 megs and then crash because of it?

Turns out it's running in a shared, hosted, memory-limited environment, and the Spidermonkey engine was hogging all the allowed OS memory. I imagine when Spidermonkey fails an OS allocation, it runs the GC. When Erlang fails an OS allocation, the whole VM bites the dust. The fact that it bites the dust isn't as disturbing as I originally thought, but it did surprise me; I'd expected it to reset all its internal state rather than just exit the OS process.

So the Erlang VM died suddenly with a failed memory allocation. This is actually quite okay as CouchDB is designed with crash-only design principles and will instantly restart with no fix-up cycle necessary. So in the interest of crash-only design, this really is the way to go. Crash and restart.
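The trick that makes the no-fix-up restart possible is append-only writes with checksummed records: a reader simply ignores a torn tail. Here's my toy reconstruction of the idea in Python (not CouchDB's actual file format):

```python
import io
import struct
import zlib

def append(f, payload):
    """Append one record: length + crc32 header, then the payload."""
    f.write(struct.pack(">II", len(payload), zlib.crc32(payload)) + payload)

def last_valid(f):
    """Scan forward; stop at the first short or corrupt record."""
    f.seek(0)
    good = None
    while True:
        hdr = f.read(8)
        if len(hdr) < 8:
            return good
        size, crc = struct.unpack(">II", hdr)
        payload = f.read(size)
        if len(payload) < size or zlib.crc32(payload) != crc:
            return good          # torn write from a crash: ignore the tail
        good = payload

f = io.BytesIO()
append(f, b"doc rev 1")
append(f, b"doc rev 2")
f.write(b"\x00\x09garb")         # simulate a crash mid-write
print(last_valid(f))             # the torn tail is simply skipped
```

Because nothing is ever overwritten in place, the last record that checks out is always a consistent state, so "recovery" is just opening the file.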

However, the Erlang VM didn't restart automatically. Apparently we need to configure something for that to happen, only we can't figure out how to make it work. Either the documentation is wrong or confusing, or the feature is buggy. This is the kind of stuff we keep hitting in Erlang. It's fantastically productive for many tasks, but using some of the built-in libraries and features can be a huge time sink (inets and xmerl immediately spring to mind). I love Erlang, I really do, but I have a nasty Erlang rant building inside of me.

December 5, 2007

CouchDB Roundup (UPDATED)

Update: Apparently I'm wrong about the Erlang file size driver issue; either it was fixed since I last checked, or it was/is only broken on Windows, or I was wrong all along. (Take yer pick.) Jan got a live database to over 5 gigs with no issues whatsoever.

Buzz around CouchDB continues to grow. It's kinda freaking me out. Today I talked to a guy from Fortune magazine who wanted to know more about it, but I think I just ended up confusing him.

Jan wrote a terrific tutorial for creating a simple Todo application with CouchDB.

Ajatus (http://www.ajatus.info/) is a pure-browser implementation of a distributed document management system for CouchDB. I soooo want to spend more time playing with this.

About performance, because a lot of people are asking about it, I'll just say I don't know. I've not done any performance work on CouchDB. No one has that I'm aware of. It's been fast enough for most testing purposes, so we haven't profiled or optimized anything.

The biggest performance problem, I'll bet, is that CouchDB caches only database headers in memory, so it actually goes all the way to the OS layers for all data on each query. Not just for document object reads, but also for each btree node read in every index lookup, etc. On most systems the OS will maintain a memory cache of the database file, so disk IO itself doesn't become a bottleneck. Instead CouchDB appears to be CPU bound, because of all those unnecessary calls, copying data across layers, encoding and decoding, etc. The easy answer is a high-level object cache for the most common lookups, which I think will increase performance dramatically. But right now it would also complicate things; as the design evolves it's best the code stays lean and nimble. So no caching for now.
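For the curious, the kind of cache I mean is just a bounded most-recently-used map in front of the disk path. A sketch (illustrative Python, not anything in CouchDB):

```python
from collections import OrderedDict

class LRUCache:
    """Keep the most recently used parsed docs; evict the oldest on overflow."""
    def __init__(self, capacity):
        self.capacity = capacity
        self._items = OrderedDict()

    def get(self, key):
        if key not in self._items:
            return None                      # miss: caller hits disk + parse
        self._items.move_to_end(key)         # mark as recently used
        return self._items[key]

    def put(self, key, value):
        self._items[key] = value
        self._items.move_to_end(key)
        if len(self._items) > self.capacity:
            self._items.popitem(last=False)  # evict least recently used

cache = LRUCache(2)
cache.put("doc1", {"a": 1})
cache.put("doc2", {"b": 2})
cache.get("doc1")                            # doc1 is now most recent
cache.put("doc3", {"c": 3})                  # evicts doc2
print(cache.get("doc2"), cache.get("doc1"))
```

Hot reads served from such a cache skip the read, copy, and decode work across all those layers, which is where the CPU time is going.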

Coming Soon

For the next release I hope to have the incremental reduce functionality implemented. I've got the design worked out (I think), and it should work quite well; it is functionally identical to Google's MapReduce, with incremental semantics. CouchDB already has an incremental map facility (the views), where the max cost of incrementally updating the views is logarithmic. Now I've figured out a fairly simple way of incrementally computing the reductions at logarithmic cost too, by extending the same view btree structures and storing intermediate reductions in the modified tree structures. Plus I get the built-in concurrency and crash-only semantics of the existing view model. Incremental reductions will open up a whole bunch of new reporting capabilities for CouchDB.
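A toy version of the idea (my Python sketch of the principle, not CouchDB's actual btree code): each inner node caches the reduction of its subtree, so changing one leaf only recomputes the reductions along that leaf's path.

```python
def reduce_fn(values):
    return sum(values)           # stand-in reduce function

class Node:
    def __init__(self, left=None, right=None, value=0):
        self.left, self.right, self.value = left, right, value
        self.reduction = value if left is None else reduce_fn(
            [left.reduction, right.reduction])

def build(values):
    """Build a balanced binary tree over the leaves (count a power of two)."""
    nodes = [Node(value=v) for v in values]
    while len(nodes) > 1:
        nodes = [Node(nodes[i], nodes[i + 1]) for i in range(0, len(nodes), 2)]
    return nodes[0]

def update(node, index, size, new_value, touched):
    """Replace leaf `index`, re-reducing only the nodes on its path."""
    touched.append(node)
    if size == 1:
        node.value = node.reduction = new_value
        return
    half = size // 2
    if index < half:
        update(node.left, index, half, new_value, touched)
    else:
        update(node.right, index - half, size - half, new_value, touched)
    node.reduction = reduce_fn([node.left.reduction, node.right.reduction])

root = build(list(range(1024)))
touched = []
update(root, 5, 1024, 100, touched)
print("reduction:", root.reduction, "| nodes re-reduced:", len(touched))
```

Updating one of 1024 leaves re-reduces only the 11 nodes on its root-to-leaf path, while a naive re-reduce would revisit every leaf; that's the logarithmic cost.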

We also hope to have a new HTTP server layer. We are trying to move away from the Erlang inets httpd code base and use Mochiweb. It appears to be a better match for CouchDB's purposes.

Want to Help?

- Security Model. CouchDB still needs a built-in security model. I have one in mind but I'm open to suggestions.

- Large File Driver. Total database file size is limited to 2 or 4 gigs (depending on the OS, I think) because Erlang doesn't support large files in the standard library. Large file support will require a special port driver to be created for Erlang. This driver might be a good start.

- Benchmark and Stress Testing. Curious about CouchDB performance, scalability and reliability? We need someone to create some test scripts and benchmarks to hammer CouchDB.

Got questions or suggestions? The mailing list is the best place to start.
