A Comfortable Couch

Tuesday, February 15, 2005

Why a distributed object store?

So why is the new Couch now going to be a distributed object database? Like we need another one of those, right? There are tons of failed OODBMSs that never took off or are slowly dying in the marketplace, and only a handful of mild successes. Well, unlike most "object" databases, Couch is not meant to be a transparent way to get persistence into OO languages. Those systems rarely succeed because they restrict themselves in an unfortunate way: the semantics of the database are bound to the semantics of the programming language. But I digress.

No, Couch is going to be an object store very much in the Notes NSF store sense. So in that sense it's really a document store, but like NSF it's also more than a document store: it's an unstructured object store. That means it's suitable for holding things that don't conform to a predefined schema. What sorts of things fit this category? Well, almost everything on the internet meets that requirement. Web pages, emails, RSS blog posts, Word documents, files, etc.
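To make "unstructured" a bit more concrete, here is a minimal in-memory sketch of a store whose documents share no schema. It is only an illustration under my own assumptions: the `DocumentStore` class, its method names, and the sample fields are all hypothetical and say nothing about how Couch or NSF actually store data.

```python
import itertools


class DocumentStore:
    """Toy in-memory store (hypothetical): each document is a free-form dict."""

    def __init__(self):
        self._docs = {}
        self._ids = itertools.count(1)

    def save(self, doc):
        """Store a copy of the document and return its newly assigned ID."""
        doc_id = next(self._ids)
        self._docs[doc_id] = dict(doc)  # shallow copy of the caller's dict
        return doc_id

    def get(self, doc_id):
        return self._docs[doc_id]

    def find(self, predicate):
        """Return the IDs of all documents for which predicate(doc) is true."""
        return [doc_id for doc_id, doc in self._docs.items() if predicate(doc)]


store = DocumentStore()

# Two documents with completely different fields live side by side.
post_id = store.save({"type": "rss_post", "title": "Why a distributed object store?"})
mail_id = store.save({"type": "email", "subject": "hello", "to": ["someone@example.com"]})

# Queries have to tolerate missing fields, since nothing enforces a schema.
posts = store.find(lambda d: d.get("type") == "rss_post")
```

The point of the sketch is only that nothing forces the email and the blog post to agree on fields, so queries use `d.get(...)` rather than assuming a column exists.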

So initially I'm going to focus on making Couch a suitable database for an RSS/Atom aggregator. That's right, ultimately I'm building another feed aggregator, à la Bloglines. So I'll be building Couch with the needs of an RSS aggregator in mind. Couch will not be built to make it easy to add persistence to a programming language; instead it will be built to be the large-scale database for an RSS aggregator. And that problem domain will include:
  • Ability to query data in a manner that's relevant to users. If you had a database with all the blog posts ever in it, how would you want to see the data?
  • The ability to provide each user with a customized experience.
  • Tracking unread marks
  • Providing methods for users to organize or flag articles of interest.
  • Finding other articles and blogs of interest to the user.
  • Lots more crap. Way more than I can think of right now.
So, it will be interesting to build a database system that is the back end to such a system. The cool thing about this project is that it's not like financial transactions or something like that, where all the ACID properties must apply. Instead I can focus on the problems of a large database where moment-to-moment consistency isn't necessary, and hopefully I can be more innovative in the storage, distribution and transaction models because of that.
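To make one item from the list above concrete, here is a rough sketch of per-user unread marks kept apart from the shared posts. Everything in it is an assumption made for illustration: the `UnreadTracker` name, the integer post IDs, and the in-memory sets are hypothetical, not a design Couch commits to.

```python
class UnreadTracker:
    """Hypothetical per-user read state, stored separately from the posts."""

    def __init__(self):
        self._read = {}  # user name -> set of post IDs that user has read

    def mark_read(self, user, post_id):
        self._read.setdefault(user, set()).add(post_id)

    def unread(self, user, post_ids):
        """Return the given post IDs the user has not read, in the same order."""
        seen = self._read.get(user, set())
        return [p for p in post_ids if p not in seen]


tracker = UnreadTracker()
posts = [101, 102, 103]  # IDs of posts shared by all users

tracker.mark_read("alice", 102)

alice_unread = tracker.unread("alice", posts)
bob_unread = tracker.unread("bob", posts)
```

Because the read marks live outside the posts, the posts themselves never change when somebody reads them. A mark that shows up a little late just means an article looks unread for a moment longer, which is the kind of relaxed consistency this problem can live with.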

I really like this project better than the old Couch. I have a good knowledge of the inner workings of a similar store (the NSF store in Notes and Domino). And an NSF-style store is very unusual; not many databases share the model, which is a good thing IMO. I think I can push the storage and application development model in new directions. While everyone else automatically reaches for their relational stores, maybe this store will enable features and flexibility that aren't possible otherwise.

BTW, a belated thanks to Greg Menounos for lending me his NT Filesystem book. I'll be sending it back soon.


Gaston said...

Will you use any of the existing open source XML databases as a starting point?
Bloglines is using Sleepycat; what do you think of it?

Good luck with your project.

5:36 PM
