September 13, 2007

System overload

Erlang is highly concurrent.

Damien is not.

CouchDb has been getting tons of interest. I can barely keep up with it. I wish I weren't so busy so I could actually respond to some of the stuff people are writing.

Between my newborn daughter, my full-time job at MySQL, finishing the CouchDb JSON work and the week-long MySQL engineering meeting in Germany next week, I'm stretched about as far as I can go.

Anyway, I'm going to do a brain dump here:

DB or not DB? That is the question.

First, the most important thing of all. Everyone keeps referring to CouchDb with an uppercase B. So maybe it should be CouchDB. Why did I choose a lowercase b to begin with? Because I... uh, um... I don't really remember. I think it was because it fit the camel-case convention we used at Kubi, and I tend to use the same conventions as whatever I was coding last. Anyway, for the rest of this posting I'm going to try the uppercase way.

Also, anyone want to create a CouchDB logo? I'm thinking someone RESTing comfortably on a couch, but since you'll be doing it for free, you can make it be anything you want.

Jan announces "The Couch Book". He's already got a good chunk written. Can I write the foreword?

Sam has been spending a lot of time with CouchDB, and has provided a lot of excellent feedback and suggestions. He gets the simplicity and access angles.

Sam Ruby - Ascetic Database Architectures:

What API’s/drivers do I need to integrate it with Ruby?

HTTP and JSON.

What API’s/drivers would I need to integrate it with Java?

HTTP and JSON.

What API’s/drivers would I need to integrate it with AJAX?

HTTP and JSON.

What API’s/drivers would I need to integrate it with...

oh, you get my point.

Do HTTP and JSON work with existing J2EE servers? You betcha.

What tooling do I need? Um, a browser, perhaps?
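To make Sam's point concrete, here is a minimal sketch in Python that uses nothing beyond the standard library. The host `db.example`, the database name `blog`, the document ID, and the PUT-to-a-URL layout are all assumptions made for illustration, not a statement of CouchDB's actual API. The request is built but not sent, so the snippet runs without a server.

```python
import json
import urllib.request


def build_put_request(base_url, db, doc_id, doc):
    """Build (but do not send) an HTTP PUT that carries `doc` as JSON."""
    url = f"{base_url}/{db}/{doc_id}"
    body = json.dumps(doc).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="PUT",
    )


# Hypothetical server address, database name, and document ID.
req = build_put_request(
    "http://db.example",
    "blog",
    "system-overload",
    {"title": "System overload", "tags": ["couchdb", "erlang"]},
)
print(req.get_method(), req.full_url)
# Actually sending it would be: urllib.request.urlopen(req)
```

The same two ingredients, an HTTP client and a JSON encoder, exist in practically every language, which is the whole argument of the quote.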

Dare Obasanjo has a much more measured response, Some Thoughts on CouchDB and Relational Databases:

Document oriented database work well for semi-structured data where each item is mostly independent and is often processed or retrieved in isolation. This describes a large category of Web applications which are primarily about documents which may link to each other but aren’t processed or requested often based on those links (e.g. blog posts, email inboxes, RSS feeds, etc). However there are also lots of Web applications that are about managing heavily structured, highly interrelated data (e.g. sites that heavily utilize tagging or social networking) where the document-centric model doesn’t quite fit.

I mostly agree with that, but then I picked out one of the most agreeable paragraphs in his post. You need to read it all to get his full take.

I disagree that there is anything inherently relational about tagging or social networks (except, of course, for the actual human relationships). I have a feeling the document model with views and map/reduce is actually a better fit for these sorts of applications. Think about how Google is figuring out what is interesting and interrelated in our world wide web of documents. They're not using SQL, my friends.

SQL is great when you have highly structured data. The problem is that much of the data we generate day to day isn't easily extractable into carefully planned schemas and is challenging to represent and query in SQL databases. That means lots of useful data that could be stored and queried ends up unused or lost because we don't have the time and resources to build schemas to store it.

A big goal of CouchDB is to free you from worrying about carefully pre-structuring and normalizing your data, and instead just let you start storing your data in a natural and self-describing way and evolve your queries over time. The idea is that there is a lot of data that could be accessible, indexable and shareable with very little work.
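As a rough illustration of that idea, here is a toy in-memory sketch in Python. It is not CouchDB code (CouchDB views are written in JavaScript), and the documents, field names, and helper functions are all made up for the example. It shows free-form documents being queried with a map step and a summing reduce step, and a second query being added later without touching the stored data.

```python
from collections import Counter

# Free-form documents: no shared schema, and fields vary per document.
docs = [
    {"id": "post-1", "type": "post", "title": "System overload",
     "tags": ["couchdb", "erlang"]},
    {"id": "post-2", "type": "post", "title": "CouchDb strikes a chord",
     "tags": ["couchdb"]},
    {"id": "note-1", "type": "note", "text": "buy diapers"},  # no tags at all
]


def map_tags(doc):
    """Map step: emit one (tag, 1) pair per tag; untagged documents emit nothing."""
    for tag in doc.get("tags", []):
        yield tag, 1


def map_types(doc):
    """A query added later; the stored documents need no migration."""
    yield doc.get("type", "unknown"), 1


def run_view(documents, map_fn):
    """Run the map step over every document, then reduce by summing per key."""
    totals = Counter()
    for doc in documents:
        for key, value in map_fn(doc):
            totals[key] += value
    return dict(totals)


tag_counts = run_view(docs, map_tags)
type_counts = run_view(docs, map_types)
print(tag_counts)   # {'couchdb': 2, 'erlang': 1}
print(type_counts)  # {'post': 2, 'note': 1}
```

Note that the note document has no `tags` field at all, and nothing breaks: the map function simply emits nothing for it.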

But there are lots of other database problems with very different data and consistency needs that won't fit well into the CouchDB model. When data is highly structured and relational in nature, or when you need complex transaction processing, SQL is going to take CouchDB out to the woodshed and beat it with its own GPL license. The right tools for the right job and all that.

Also, Dare seems to imply that CouchDB is trendy because JSON is the current format du jour. Possibly, but that's not why I chose it. I'll tell you why I finally gave up on XML and chose JSON:

XML FUCKING SUCKS

And yes, the swear words are necessary, because you need to understand how much fucking frustration it has caused.

I'll tell you the one thing XML is good for (and I could be wrong because I really don't know many alternatives), it's good for marking-up textual documents. For anything else, ESPECIALLY PROGRAMMATIC INTERFACES, it's a goddamn nightmare. I finally saw the light. JSON also has warts, but it has been an absolute dream in comparison.

Sam Ruby also responds in Dare Takes a Look at CouchDB. I highly recommend you read it:

At a certain point, referential integrity has to be given up. Scale a bit further, and even the notion of a relation in the relational database sense of the word starts to break down. To cope, you denormalize a bit, not so much for performance reasons (though that’s important too), but as a self defense mechanism so that the pieces of data that you do have have enough context to be meaningful.

And Assaf Arkin responds as well, in Conflicting Reads and Writes:

Relational databases have failed the software industry in much the same way XML, Java and client-server failed the software industry. In other words, no failure to see here, move along. Those are all excellent technologies for solving a wide range of problems. Just that there are some problems they’re particularly poor at solving.

Exactly! We need a wide range of tools. My screwdriver doesn't negate the value of your hammer.

Lastly, there is no large file support in Erlang: the current Erlang file drivers are limited to 32-bit file sizes. Obviously that is a bad thing for a database engine. CouchDB needs a custom file system driver with large file support. I think a good approach is to copy the existing file driver code and go from there. This is the approach I was going to take, and I figured it would take me less than a week. Anyway, if anyone knows of a large file driver, please let me know. If anyone wants to write a large file driver, please do and then give it to me. Then the rest of Erlang land can benefit too.
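For a sense of scale, here is the arithmetic behind a 32-bit limit, as a short Python sketch. Whether the Erlang driver's offsets are signed or unsigned is not stated above, so both cases are shown; the function name is just for illustration.

```python
def addressable_bytes(offset_bits, signed):
    """Number of distinct non-negative byte offsets for a given offset width."""
    usable_bits = offset_bits - 1 if signed else offset_bits
    return 2 ** usable_bits


GIB = 2 ** 30  # bytes per gibibyte

signed_limit_gib = addressable_bytes(32, signed=True) / GIB
unsigned_limit_gib = addressable_bytes(32, signed=False) / GIB
print(signed_limit_gib)    # 2.0
print(unsigned_limit_gib)  # 4.0
```

Either way, a single database file tops out at a few gigabytes, which is why large file support matters for a storage engine.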


September 3, 2007

CouchDb strikes a chord

Apparently the switch to JSON and Javascript is a big hit. Jan writes a bit about the upcoming changes and suddenly CouchDb is attracting lots of attention.

Andrew Tetlaw - Watch out for CouchDb

I also dig the free-form data aspect. In a rdbms it’s often the case you have to be attentive to your schema design so that you can support future query requirements. In CouchDB you just make views whenever you feel like to give you any number of different views on the same data.

Tobias Lütke - Futuretalk: CouchDB

CouchDB uses the concept of views which are essentially javascript methods. It uses map/reduce to find matching records in its global namespace so that at query time the results are available instantaneously. This is a huge performance boost for web applications which generally have many more queries than update/inserts.
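The "available instantaneously at query time" part can be sketched as follows. This is a toy Python model, not how CouchDB is implemented: the `ViewIndex` class and its method names are hypothetical. The map function runs when a document is written, the results are kept sorted by key, and a query is just a binary search over rows that already exist.

```python
import bisect


class ViewIndex:
    """Toy view: maps each document on write and keeps the rows sorted by key."""

    def __init__(self, map_fn):
        self.map_fn = map_fn
        self.rows = []  # sorted list of (key, doc_id) tuples

    def on_write(self, doc_id, doc):
        # Drop rows from any previous version of this document, then re-map it.
        self.rows = [row for row in self.rows if row[1] != doc_id]
        for key in self.map_fn(doc):
            bisect.insort(self.rows, (key, doc_id))

    def query(self, key):
        # Binary search to the first row with this key; no map function runs here.
        start = bisect.bisect_left(self.rows, (key,))
        ids = []
        for row_key, doc_id in self.rows[start:]:
            if row_key != key:
                break
            ids.append(doc_id)
        return ids


# Index documents by author; documents without an author emit no rows.
by_author = ViewIndex(lambda doc: [doc["author"]] if "author" in doc else [])
by_author.on_write("post-1", {"author": "damien", "title": "System overload"})
by_author.on_write("post-2", {"author": "jan", "title": "The Couch Book"})
by_author.on_write("post-3", {"author": "damien", "title": "Strikes a chord"})
by_author.on_write("post-2", {"title": "The Couch Book"})  # update drops the author

print(by_author.query("damien"))  # ['post-1', 'post-3']
print(by_author.query("jan"))     # []
```

The work moves to write time, which is the trade the quote describes: it pays off when reads heavily outnumber updates and inserts.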

Labnotes - CouchDB: Thinking beyond the RDBMS

Here’s the kicker. This simple architecture you can partition and replicate any way you want, map/reduce these computed tables on any scale, and deal with the rest on the client.

Also there are a bunch of comments on Reddit. Here is one of my favorites:

By the by, the flurry of CouchDB articles smell remarkably like freshly mowed astroturf. Color me suspicious.

Me too. No way this many people get it.

If you want to know more about the design and implementation of CouchDb, read the Technical Overview. Also, Jan has been doing a ton of work updating the documentation wiki to reflect the new changes.

Keep in mind that although you can get the source and build it yourself, the CouchDb JSON conversion isn't done yet. The client replication logic hasn't been converted to use the new JSON formats and file attachments aren't tested. Once that's done we'll release the next alpha version.

In the meantime, here is the nascent Javascript based test suite and client library. The cool thing about them is they are run directly from a web browser, making a pimped-out test suite with GUI bling and debugging hotness not only possible, but mandatory.
