Blogging Couch

Ok, so I've been very remiss in that I haven't blogged anything Couch-related. There is a good reason for that: I haven't been working on it very much. Why? I just wasn't in the right mindset for it, which is a wuss way of saying I didn't feel like it.

I've been doing some contract work, but mostly I've worked on lots of little personal projects. Some, like Clicky and Idiot-B-Gone, took only a day or two. Others, like my random code generator, I just lost interest in. And of course I spent a lot of time on Trigger Happy.

In case you didn't know, Couch is going to be the storage system for a large-scale object database. My long-term goal is a database that can serve as the back end to a very large-scale RSS aggregator.

The lowest layer of the project is CouchDB. CouchDB will be the code layer that actually shuffles the bits around on disk. It will be responsible for object storage, retrieval and indexing. Think Notes NSF and NIF.

So for the past couple of weeks I've been doing more and more work on CouchDB. When I was trying to figure out how I was going to architect all the on-disk structures, I reached a big breakthrough... I know nothing about building a database. And I need to accept that, and just move forward anyway. Even though I've worked a lot on NSF, I think that is just confusing me, because NSF has all this complexity that sort of grew organically. I don't mean that as a bad thing at all. It's a big part of its success. But it's also not how you'd design a database system if you're building from scratch. To put in that complexity ahead of time is asking for disaster.

So I'm reminding myself that writing code is like molding clay, not carving a sculpture. Code is plastic, it can change oh so very easily. So I need to just write the damn code! When I rewrote Compute, that's exactly what I did. I was so precocious.

So I need to push. And I'm pretty sure I know what will happen. My current design will start to form, but I will be unhappy with it. Suddenly it will dawn on me what is wrong, then I will throw most of it away and embark on the new architecture. Yeah, that sounds about right. So the faster I get going down the wrong path, the sooner I can get to the right one.

I now have the main on-disk structures largely worked out, and I'm implementing the bucket bitmap code, which is basically the outermost memory manager for disk space.
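For anyone who hasn't seen one, a bitmap allocator of this general kind might look roughly like the sketch below. This is not the actual CouchDB code, just a minimal illustration under some loud assumptions: disk space is divided into fixed-size buckets, each bucket gets one bit (set means in use), and the names `BucketBitmap`, `allocate`, `release`, and `kNoBucket` are made up for the example. It also keeps the bitmap in memory only; a real version would have to persist it.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical sentinel returned when no bucket is free.
constexpr std::size_t kNoBucket = static_cast<std::size_t>(-1);

// Hypothetical sketch: one bit per fixed-size bucket of disk space.
// A set bit means the bucket is in use; a clear bit means it is free.
class BucketBitmap {
public:
    explicit BucketBitmap(std::size_t bucket_count)
        : bucket_count_(bucket_count),
          words_((bucket_count + 63) / 64, 0) {}

    // Find the lowest-numbered free bucket, mark it used, and return
    // its index. Returns kNoBucket when every bucket is taken.
    std::size_t allocate() {
        for (std::size_t i = 0; i < bucket_count_; ++i) {
            if (!is_used(i)) {
                set(i, true);
                return i;
            }
        }
        return kNoBucket;
    }

    // Mark a previously allocated bucket as free again.
    void release(std::size_t index) {
        assert(index < bucket_count_ && is_used(index));
        set(index, false);
    }

    bool is_used(std::size_t index) const {
        assert(index < bucket_count_);
        return ((words_[index / 64] >> (index % 64)) & 1u) != 0;
    }

private:
    void set(std::size_t index, bool used) {
        const std::uint64_t mask = std::uint64_t{1} << (index % 64);
        if (used) {
            words_[index / 64] |= mask;
        } else {
            words_[index / 64] &= ~mask;
        }
    }

    std::size_t bucket_count_;
    std::vector<std::uint64_t> words_;  // 64 buckets tracked per word
};
```

A caller would multiply the returned index by the bucket size to get a file offset. The linear scan is the simplest thing that works; skipping whole words that are all ones would be an obvious speedup.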

Oh, BTW, I'm writing the CouchDB layer in C++. I know, I'm a fucking hypocrite. But I have my reasons: it's mature and portable, and the language is stable. I've thought of using Java, but I need something that's more lightweight and something that can be implemented as Crash Only. I've also considered Python and Ruby. One thing I don't like about Python is how often I'm surprised by undiscovered behaviors. In my mostly uninformed opinion, it's starting to turn into C++, with little special cases and obscure language features creeping in. And the APIs for it are very poorly specified. Ruby, for its part, seems too immature. This is code that needs to be highly reliable: I need to know exactly what the code is going to do at all times. I want no surprises.

The more I code this stuff, the more excited I get. I'm having fun just coding all the details. I'm sure I'm making design mistakes left and right and I can't even see them. And I don't care. Because later, as more of the big picture is in place, I'll be more able to see which parts I screwed up on, and then I can fix them or throw them away.

Once I get the bucket management code done I'll talk about how it's implemented and I'll also talk about more of the storage design. Stay tuned.

Posted April 12, 2005 1:06 PM