Blogging Couch

Ok, so I've been very remiss in that I haven't blogged anything Couch-related. There is a good reason for that: I haven't been working on it very much. Why? I just wasn't in the right mindset for it, which is a wuss way of saying I didn't feel like it.

I've been doing some contract work, but mostly I've worked on lots of little personal projects. Some, like Clicky and Idiot-B-Gone, took only a day or two. Others, like my random code generator, I just lost interest in. And of course I spent a lot of time on Trigger Happy.

In case you didn't know, Couch is going to be the storage system for a large scale object database. My long term goal is a database that can serve as the back end to a very large scale RSS aggregator.

The lowest layer of the project is CouchDB. CouchDB will be the code layer that actually shuffles the bits around on disk. It will be responsible for object storage, retrieval and indexing. Think Notes NSF and NIF.

So for the past couple of weeks I've been doing more and more work on CouchDB. When I was trying to figure out how I was going to architect all the on-disk structures I reached a big breakthrough... I know nothing about building a database. And I need to accept that, and just move forward anyway. Even though I've worked a lot on NSF, I think that is just confusing me, because NSF has all this complexity that sort of grew organically. I don't mean that as a bad thing at all. It's a big part of its success. But it's also not how you'd design a database system if you're building from scratch. To put in that complexity ahead of time is asking for disaster.

So I'm reminding myself that writing code is like molding clay, not carving a sculpture. Code is plastic, it can change oh so very easily. So I need to just write the damn code! When I rewrote Compute, that's exactly what I did. I was so precocious.

So I need to push. And I'm pretty sure I know what will happen. My current design will start to form, but I will be unhappy with it. Suddenly it will dawn on me what is wrong, then I will throw most of it away and embark on the new architecture. Yeah, that sounds about right. So the faster I get going down the wrong path, the sooner I can get to the right one.

I largely now have the main on disk structures all worked out, and I'm implementing the bucket bit map code, which is basically the outermost memory manager for disk space.
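For readers who haven't seen a bitmap allocator before, here is a minimal sketch of the general idea. It is not the CouchDB code, just an illustration under assumptions: the file is divided into fixed-size buckets, one bit tracks whether each bucket is in use, and allocation is a simple first-fit scan. The class name BucketBitmap, its methods, and the in-memory vector standing in for the on-disk bitmap are all hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical sketch: one bit per fixed-size disk bucket (1 = in use).
// The bitmap lives in memory here; persisting it to disk is left out.
class BucketBitmap {
public:
    // Returned by allocate() when no large enough free run exists.
    static constexpr std::size_t npos = static_cast<std::size_t>(-1);

    explicit BucketBitmap(std::size_t bucket_count)
        : count_(bucket_count), words_((bucket_count + 63) / 64, 0) {}

    bool in_use(std::size_t i) const {
        assert(i < count_);
        return ((words_[i / 64] >> (i % 64)) & 1u) != 0;
    }

    // First-fit: find n contiguous free buckets, mark them used, and
    // return the index of the first one, or npos if none is found.
    std::size_t allocate(std::size_t n) {
        if (n == 0 || n > count_) return npos;
        std::size_t run = 0;  // length of the current run of free buckets
        for (std::size_t i = 0; i < count_; ++i) {
            run = in_use(i) ? 0 : run + 1;
            if (run == n) {
                const std::size_t start = i + 1 - n;
                for (std::size_t j = start; j <= i; ++j) set(j, true);
                return start;
            }
        }
        return npos;
    }

    // Mark n buckets starting at start as free again.
    void release(std::size_t start, std::size_t n) {
        assert(start <= count_ && n <= count_ - start);
        for (std::size_t j = start; j < start + n; ++j) {
            assert(in_use(j));  // releasing a free bucket is a caller bug
            set(j, false);
        }
    }

private:
    void set(std::size_t i, bool used) {
        const std::uint64_t mask = std::uint64_t{1} << (i % 64);
        if (used) words_[i / 64] |= mask;
        else      words_[i / 64] &= ~mask;
    }

    std::size_t count_;                 // total number of buckets
    std::vector<std::uint64_t> words_;  // packed bits, 64 buckets per word
};
```

With an 8-bucket bitmap, allocating 3 and then 4 buckets returns indexes 0 and 3; a further request for 2 fails because only one bucket is left, and succeeds again at index 0 once the first run is released. A real implementation would likely scan a word at a time rather than a bit at a time, but the bookkeeping is the same.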

Oh, BTW, I'm writing the CouchDB layer in C++. I know, I'm a fucking hypocrite. But I have my reasons: It's mature and portable and the language is stable. I've thought of using Java, but I need something that's more lightweight and something that can be implemented as Crash Only. I've also considered Python and Ruby. One thing I don't like about Python is how often I'm surprised by undiscovered behaviors. In my mostly uninformed opinion, it's starting to turn into C++, with little special cases and obscure language features creeping in. And the APIs for it are very poorly specified. Ruby seems too immature as well. This is code that needs to be highly reliable; I need to know exactly what the code is going to do at all times. I want no surprises.

The more I code this stuff, the more excited I get. I'm having fun just coding all the details. I'm sure I'm making design mistakes left and right and I can't even see them. And I don't care. Because later, as more of the big picture is in place, I'll be more able to see which parts I screwed up on and then I can fix them or throw them away.

Once I get the bucket management code done I'll talk about how it's implemented and I'll also talk about more of the storage design. Stay tuned.

Posted April 12, 2005 1:06 PM

Comments

two things ...


1. Sounds like you are prototyping in a way where you are likely to throw a large chunk of the code out as the architecture unfolds / reveals itself. How about using a language that will let you get to the point of throwing stuff away faster? C++ is good if you are a control freak (no, I am just stirring) but it may not be the fastest way to get the architecture to reveal itself. The assumption I have made here is that the details which are implementation and technology specific do not have a significant impact on the architecture. I think it's a fair assumption but I may be wrong. As for the choice of a language which will be faster to develop in, I would have thought that the ones you mentioned (Python, Ruby) would do just that. And there is always Lisp.


2. Since it's an aggregator that you are building, is it going to aggregate just text, or graphics and movies (multimedia) also? Will it allow for interweaving of channels or, if that is against some intellectual property law, multiplexing them so that I can create a collage of the things that interest me? I guess that may be done by tagging content and then constructing streams based on the tags. And as an aggregator will it preprocess the content also, as in resize overly big images if I want to connect from a mobile phone, or filter out certain content which my device is not capable of showing altogether? Or convert text to speech?


I see that an aggregator is a foundation of sorts and these additional services could be provided as plugins. Or if you think of streams of content they could be filters and processors that could be applied independent of the aggregation.


Slawek

Slawek, April 13, 2005 9:09 AM

1. I probably overstated how much the current code is a prototype. It is, in that I feel all code is a prototype of some ideal that you never reach. I'm not writing the code with the intention of throwing it away, only that I know some significant part is likely to get tossed.

That being said, your idea of throwaway coding in a more productive language is compelling. If I knew Python or Ruby better, I might have gone down that path.

And yes, when it comes to this type of stuff, I am a control freak. I need this code to be extremely reliable. I need to know that every API and language feature I use is well behaved and reliable, because when I'm faced with a bug I need to know it's a problem in my code. In my experience, 99.99% of application crashes are coding flaws. I'd like to be able to narrow it down to my code.

2. It will be a store for all kinds of rich content. The plan is to be able to index all that content any way you like. And yes, a big part of what is motivating this is the sorry state of RSS aggregation. The Bloglines aggregator works quite well, but there is so much more it could do.

Your thoughts on streams and filters also jibe with a lot of my thinking. But at this point I'm not thinking too much about every possible feature that an aggregator will need. Instead I'm trying to build an alternative way of looking at the data, far removed from the relational world. My hope is that I'll end up with a flexible platform that better meets the needs of unstructured data.

Damien, April 13, 2005 12:28 PM
