February 23, 2008

What will we be made of?

We'll be made out of the rays of the Sun.
And the clouds will be made of mushrooms.
And the shadows of trees will be chocolate.

Link

February 20, 2008

I'll Be Speaking At ETech

I'm happy to announce that Jan Lehnardt and I will be giving the CouchDB from 10,000 ft talk at ETech 2008 (March 3-6).

I think ETech will be a blast: I'll get to meet lots of really brilliant people, and the topics are actually interesting. And of course it will be great to get feedback on CouchDB.

Want to go? Here's a special CouchDB 35%-off discount code, just for my readers: et08rdr

There, I give you CouchDB AND a valuable coupon, don't say I never did nothing for ya.

Link

February 18, 2008

Recursion unsafe in C/C++?

The question of "why disallow recursion in C++?" came up in a comment to the C++ coding standards for the Joint Strike Fighter. I'm reposting a version of the comment I left there, because I wish someone had explained this to me a long time ago.

Recursion isn't safe in C/C++ because there is no reasonable way to deal with running out of call stack space. Even if you dynamically allocate stack frames from the heap, you still can run out of heap and how do you handle the error then?

You might convert the failed call stack allocation into an exception that unwinds the stack until an out-of-stack exception handler is found, but the problem is that any function in any library called from a recursive routine might be the call that blows the stack.

To use exception handling you must now implement catch-like error handling to deal with any failed function calls, and the error handling cannot require any additional stack. And while unwinding the exception, stack allocated object destructors also may not use any additional stack space. So long RAII. Exception handling isn't a workable solution for stack failures.

So for code that needs to be proven incapable of blowing the stack, the simple solution is to disallow recursion altogether. With recursion disallowed, it becomes a tractable problem to statically analyze the code and determine the absolute maximum amount of stack space necessary. Preallocate that amount of stack and you don't need to worry about error handling for stack failures; they aren't possible.

Fortunately, any algorithm that can be written recursively can be rewritten iteratively by keeping a heap-based stack object (and if it can be written completely tail-recursively, you don't even need a stack). The code might be uglier when written iteratively, but it's always possible, and you can do error handling for failed stack allocations.
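
For example, here's a minimal sketch of the technique (in Javascript only for illustration; in C/C++ the explicit stack would live on the heap, where a failed push is an allocation error you can actually handle):

// Recursive version: depth is bounded only by the call stack.
function sumTreeRecursive(node) {
  if (!node) return 0;
  return node.value + sumTreeRecursive(node.left) + sumTreeRecursive(node.right);
}

// Iterative version: the pending work lives in an explicit, heap-backed stack,
// so running out of space becomes a recoverable allocation failure instead of
// blowing the call stack.
function sumTreeIterative(root) {
  var sum = 0;
  var stack = [root];
  while (stack.length > 0) {
    var node = stack.pop();
    if (!node) continue;
    sum += node.value;
    stack.push(node.left);
    stack.push(node.right);
  }
  return sum;
}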

Link

February 14, 2008

Jan's CouchDB Presentation Marathon

Jan Lehnardt will be giving 3 CouchDB talks next month, and then spending a few weeks in Charlotte hacking CouchDB with me.

I think I will also be speaking at O’Reilly ETech next month, but I haven't gotten formal confirmation yet.

Link

February 13, 2008

CouchDB Accepted to Apache

I’m pleased to have been the one to announce that CouchDB has been accepted for incubation by the Apache Software Foundation.
Sam Ruby: CouchDB @ ASF

Woot!

Link

February 12, 2008

The Blue Islanders Puzzle - Real answer?

So I recently read about the Blue Islanders Puzzle:

There is an island upon which a tribe resides. The tribe consists of 1000 people, with various eye colours. Yet, their religion forbids them to know their own eye color, or even to discuss the topic; thus, each resident can (and does) see the eye colors of all other residents, but has no way of discovering his or her own (there are no reflective surfaces). If a tribesperson does discover his or her own eye color, then their religion compels them to commit ritual suicide at noon the following day in the village square for all to witness. All the tribespeople are highly logical and devout, and they all know that each other is also highly logical and devout (and they all know that they all know that each other is highly logical and devout, and so forth).

Of the 1000 islanders, it turns out that 100 of them have blue eyes and 900 of them have brown eyes, although the islanders are not initially aware of these statistics (each of them can of course only see 999 of the 1000 tribespeople).

One day, a blue-eyed foreigner visits the island and wins the complete trust of the tribe.

One evening, he addresses the entire tribe to thank them for their hospitality.

However, not knowing the customs, the foreigner makes the mistake of mentioning eye color in his address, remarking “how unusual it is to see another blue-eyed person like myself in this region of the world”.

What effect, if anything, does this faux pas have on the tribe?

This is a very interesting puzzle, and if you read the comments on the linked page, you'll see all sorts of fascinating arguments about what will happen. I pondered this puzzle on and off for a few days; it's a real noodle scratcher. I like how you must reason about people who will also reason about other people. I see clear relationships to distributed computing and emergent behaviors, things I find really interesting.

But after a while something about this question really bugged me, and I think I finally figured out why. It's a flawed question.

The problem is that it assumes the existence of someone or something that exhibits perfect logic. The question phrases it as "highly logical", which is different from "perfectly logical", but the implication, I believe, is the same.

However, if the implication is not the same as "perfectly logical", the question is meaningless until we quantifiably define what "highly" means in the phrase "highly logical". So as stated, the question is incomplete: we can't reason about how the islanders would act if all we know is that they will apply an indeterminate amount of logic to decide how to act.

But if the implication is that "highly logical" means "perfectly logical", then that too is nonsensical. Why? Because there is no such thing as a perfect system of logic. When we try to apply our various systems of logic and methods of proof to reason about something that has been proven impossible, of course we come up with all sorts of conflicting answers.

I take the position that the question has no correct answer; it's a flawed question.

Link

February 10, 2008

CouchDB and Apache

The Incubator Proposal for CouchDB has been submitted and is now being voted on.

Link

February 8, 2008

Incremental Map/Reduce/Combiner?

Because people keep asking me about a Combiner function (which is described in the Google Map/Reduce paper and also used in Hadoop), here's a follow-up to yesterday's post. The reason I'm not talking about a combiner function is that I'm pretty sure the design I propose is isomorphic to a Map/Reduce/Combiner combo design. That is to say, it's the same damn thing, just re-jiggered a little.

To illustrate, here is an example that does its own combiner operations inside the reduce.

Let's say we have a database full of receipts and we want to calculate the total spent in a month and the average amount of each purchase.

Here is an example receipt:

{
"_id":"2DA92AAA628CB4156134F36927CF4876",
"type":"receipt",
"amount":19.95,
"note":"Enzyte, i sure hope this works",
"year":2008,
"month":1,
"day":12,
....
}

Calculating the total spent in a reduction is easy: it's simple addition, which is commutative and associative. But calculating an average is not intrinsically associative, so we'll have to make the reduce function a little more complex, tracking a running total and purchase count rather than the average itself.

This map function outputs each receipt's year and month as the key, and the amount as the value:


function (doc) {
  if (doc.type == "receipt") {
    var key = [doc.year, doc.month];
    emit(key, doc.amount);
  }
}

And a Reduce function to compute the totals of the receipts:


function (date, values) {
  var result = {totalSpent:0, numPurchases:0};
  for(var i=0; i<values.length; i++) {
    switch (typeof values[i]) {
    case "number":
      result.totalSpent += values[i];
      result.numPurchases++;
      break;
    case "object":
      // this is an output object from a previous reduction
      result.totalSpent += values[i].totalSpent;
      result.numPurchases += values[i].numPurchases;
      break;
    }
  }
  return result;
}

Once saved into a design document, you can get the values for January of '08:

GET /db/_view/receipts/amount_spent_month?key=[2008,1]

Result:

{"total_rows":12,
"rows":[
{ "key":[2008,1],
"value": { "totalSpent": 1252.1
"numPurchases": 13}}
]}

Now calculating the final average is left to the client.
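
For completeness, that client-side step is trivial; given the row value above, it's just:

// Hypothetical client-side code, using the value from the result above.
var value = {"totalSpent": 1252.1, "numPurchases": 13};
var average = value.totalSpent / value.numPurchases;  // ≈ 96.32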

However, we can avoid the extra combiner-style logic in the reduce by changing the map function to output values that already look like the reduce output:


function (doc) {
  if (doc.type == "receipt") {
    var key = [doc.year, doc.month];
    emit(key, {totalSpent: doc.amount, numPurchases:1});
  }
}

And the Reduce function:


function (date, values) {
  var result = {totalSpent:0, numPurchases:0};
  for(var i=0; i<values.length; i++) {
    result.totalSpent += values[i].totalSpent;
    result.numPurchases += values[i].numPurchases;
  }
  return result;
}

This code is a lot simpler than the first example, but it has the disadvantage that the map output values, which can be very numerous, now carry extra redundant information for the sake of algorithmic simplicity, possibly slowing things down or using extra resources.

Anyway, I think the addition of a separate Combiner function makes things more complicated for the general case, and a separate combiner can be expressed in this design anyway using wrapper functions (a rough sketch below). So I don't see the point of an explicit combiner function, though I might be convinced otherwise.
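
For the record, here's a rough sketch of what I mean by a wrapper. None of this is CouchDB API; makeReduce, combineFn, and reduceFn are hypothetical names. It just shows that a Google-style combine/reduce pair can be folded into a single reduce that tells raw map values apart from its own previous output:

// Hypothetical wrapper: folds a separate combine/reduce pair into one reduce
// function that can also re-reduce its own output. Previously reduced values
// are tagged so they can be told apart from raw map output.
function makeReduce(combineFn, reduceFn) {
  return function (key, values) {
    var raw = [], partials = [];
    for (var i = 0; i < values.length; i++) {
      if (values[i] && values[i].__reduced) {
        partials.push(values[i].result);   // output of an earlier reduction
      } else {
        raw.push(values[i]);               // raw map output
      }
    }
    if (raw.length > 0) {
      partials.push(combineFn(key, raw));  // combine the raw values first
    }
    return {__reduced: true, result: reduceFn(key, partials)};
  };
}

With a hypothetical sum helper, the tag-count reduce from the previous post would just be makeReduce(sum, sum).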

Link

February 7, 2008

Incremental Map/Reduce

Ok, here is a quick rundown of how the full Map/Reduce functionality in CouchDB will work.

So right now, to create views in CouchDB, you create a function that takes a document and outputs key value pairs.

This is a function that defines a view of documents keyed by their tags:

function (doc) {
  for(var i=0; i<doc.tags.length; i++)
    map(doc.tags[i], null);
}

To be used, this function must be stored in a design document:

{
"_id":"_design/tags",
"views": {"all": "function (doc) {...snipped...}"}
}

Then we can get the results:

GET /db/_view/tags/all

Result:

{"total_rows":6,
"rows":[
{ "id":"2DA92AAA628CB4156134F36927CF4876",
"key":"bar",
"value":null},
{ "id":"4156134F36927CF48762DA92AAA628CB",
"key":"bar",
"value":null},
{ "id":"4F36927CF48762DA92AAA628CB415613",
"key":"bar”,
"value":null}

{ "id":"2DA92AAA628CB4156134F36927CF4876",
"key":"foo",
"value":null},
{ "id":"4156134F36927CF48762DA92AAA628CB",
"key":"foo",
"value":null}
]}

Then get just the docs tagged "foo":

GET /db/_view/tags/all?key="foo"

Result:


{"total_rows":6,
"rows":[
{ "id":"2DA92AAA628CB4156134F36927CF4876",
"key":"foo",
"value":null},
{ "id":"4156134F36927CF48762DA92AAA628CB",
"key":"foo",
"value":null}
]}

The restriction on map functions is that they must be referentially transparent. That is, given the same input document, they will always emit the same key/value pairs. This allows CouchDB views to be updated incrementally, only reindexing the documents that have changed since the last index update.

Okay cool: simple, efficient, and it's already working. (The only thing I want to change is to rename the internal Javascript map(Key,Value) call to emit(Key,Value). It's less confusing that way.)

Now what about Reduce?

This is a similar example to the previous view map function, but this time we want a view of all the tag counts in the database. (The user-programmable reduce isn't done yet, but this is how it will work.)

We need a Map function to output doc tags:


function (doc) {
  for(var i=0; i<doc.tags.length; i++)
    emit(doc.tags[i], 1);
}

And a Reduce function to count the tags:


function (tag, counts) {
  var sum = 0;
  for(var i=0; i<counts.length; i++)
    sum += counts[i];
  return sum;
}

And then stored in a design document:


{
"_id":"_design/tags",
"views": {
"all": "function (doc) {...snip...}",
"count": {
"map": "function (doc){...snip...}"
"reduce": "function (tag, counts){...snip...}"
}
}
}

Now to access the results:

GET /db/_view/tags/count

Result:

{"total_rows":2,
"rows":[
{ "key":"bar",
"value":3},
{ "key":"foo",
"value":2}
]}

Then get the count of just the docs tagged "foo":

GET /db/_view/tags/count?key="foo"

Result:

{"total_rows":2,
"rows":[
{ "key":"foo",
"value":2}
]}

To make incremental Map/Reduce possible, the Reduce function has the requirement that not only must it be referentially transparent, it must also be commutative and associative for the array value input, so that it can reduce its own output and get the same answer, like this:

f(Key, Values) == f(Key, [ f(Key, Values) ] )
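
For the tag counting example above, that just means summing a list of partial sums gives the same answer as summing all the original 1s. A quick sanity check using that reduce function:

// The tag-count reduce from above.
function countReduce(tag, counts) {
  var sum = 0;
  for (var i = 0; i < counts.length; i++)
    sum += counts[i];
  return sum;
}

// Reducing all the values at once...
var direct = countReduce("foo", [1, 1, 1, 1]);  // 4
// ...equals reducing partial reductions of the same values.
var rereduced = countReduce("foo", [countReduce("foo", [1, 1]),
                                    countReduce("foo", [1, 1])]);  // 4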

This requirement allows CouchDB to store intermediate reductions directly in the inner nodes of its btree indexes, so view index updates and retrievals have logarithmic cost. It also allows the indexes to be spread across machines and reduced at query time with logarithmic cost.

The incremental design makes it possible to use map/reduce to query huge partitioned clusters in realtime, instead of having to wait for a whole map/reduce job to complete or having stale, occasionally updated indexes. The downside is it may be harder to write the Reduce function in an associative and commutative manner.

Update: Here is a followup posting about Combiner functions.

Link

February 3, 2008

ErlyJS - Javascript on the Erlang VM

I got an email from Roberto Saccon:

I am working on a pure Erlang implementation of a Javascript compiler, targeting the Erlang VM.

http://erlyjs.googlecode.com

[...] It is pre-pre-alpha now, but later on, might that be interesting for CouchDB, what do you think ?

"Hell yes", that's what I think.

Link

February 1, 2008

Faster CouchDB Document Commits

Right now CouchDB maintains a database "header", a 4k region at the front of the file that contains the root of the valid database structures. Writing the header is the only thing that finally commits changes permanently to disk; until that happens, any changes to the database will simply be invisible on restart. The header is written twice, in 2 identical 2k regions, and each is signed. Upon restart, if the first header is bad (because of a power failure during the write, for example), the second header will still be intact and valid. If the second is also bad, then the database simply cannot be opened.

This is a simple and very reliable design. The problem with this arrangement is that committing a document requires at least one disk head seek (depending on the underlying filesystem; some journaling filesystems could do it with 0 seeks). Disk head seeks are slow, but once the head is in position a read or write is fast. CouchDB, with its write-only storage, generally clusters updates very close to the end of the file, so there is less seeking to commit the document and associated indexes. But then it always has to seek all the way back to the beginning of the file to write out the header for the final commit.

So unless you batch the document saves, each document save is going to involve multiple seeks, limiting absolute database throughput significantly. (That just made me realize that the storage engine could also batch document commits, committing on a small fixed interval, speeding up concurrent throughput at the cost of a small additional fixed latency.)

Anyway, what if, instead of writing the header to the front of the file, we write it to the end of the file? That eliminates the seek needed to commit the header. The only problem now is: how do we find the header the next time we open the database?

When the shutdown is "clean", it's easy: the header is always the very last thing in the file. But what if we wrote a bunch of stuff past the last valid header, and then crashed before we could finish?

That's the thing I'd been kicking around in my head for months; I think I finally have a design I like.

I'm thinking that every, let's say, 4k in the file, there is space reserved for an offset pointer that points to the last known good header in the database file. And the database header it points to will have an md5 signature to validate its integrity.

Committing to disk is still a 2-flush process. Flush 1 flushes the data, meta-data, associated indexes, etc. to disk. Then flush 2 appends the database header, the structure that points to all the correct/updated structures. But since you are writing adjacent to the last flushed data, the head seeks are greatly reduced.
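
In pseudocode (Javascript here just for consistency with the other examples on this blog; appendToFile, flush, and md5 are made-up helper names), the commit path would look roughly like this:

// Rough sketch of the 2-flush, append-at-the-end commit described above.
function commit(file, docData, indexUpdates, newHeader) {
  // Flush 1: append the document data, meta-data and index updates to the
  // end of the file, then force them to disk.
  appendToFile(file, docData);
  appendToFile(file, indexUpdates);
  flush(file);

  // Flush 2: append the header, signed with an md5 of its contents, pointing
  // at the newly written structures, then force it to disk.
  appendToFile(file, {header: newHeader, md5: md5(newHeader)});
  flush(file);
}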

But what happens upon opening the database? The first thing is to look at the end of the database file for the database header. If there is no valid md5-signed header in the last 4k, then we start doing a divide-and-conquer search, checking the header offset pointers (which occur every 4k) to find the most recent valid one.

We check the middle-most offset pointer, and if it points to a valid header, we pick an offset pointer halfway between it and the end of the file and repeat. If instead it points to an invalid header, we pick an offset pointer halfway between the current location and the beginning of the file. Recurse, tracking upper and lower bounds, until you've found the most recent database header.

This algorithm will have O(log n) cost, where n is the number of 4k segments in the file.
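
To make the bounds-tracking concrete, here's a rough sketch of that search. It's not real CouchDB code; segments and validHeaderAt(i) are hypothetical stand-ins for the number of 4k segments and for "read offset pointer i, follow it, verify the md5":

// Binary search for the last offset pointer that references a valid header.
// Assumes pointers written before the last good commit are valid and pointers
// in the partially written tail are not.
function findLastValidHeader(segments, validHeaderAt) {
  var lo = 0, hi = segments - 1, found = -1;
  while (lo <= hi) {
    var mid = Math.floor((lo + hi) / 2);
    if (validHeaderAt(mid)) {
      found = mid;     // valid here, so a newer valid header may be later
      lo = mid + 1;
    } else {
      hi = mid - 1;    // invalid here, so look earlier in the file
    }
  }
  return found;  // index of the pointer to the most recent valid header, or -1
}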

Then, once you've found the most recent valid header, truncate the file past the header, and the database is back to normal, ready to be read and written. Cool.

Anyway, I'm not sure when I'll implement it. For one thing, it's possible solid-state memory with constant random access times will become common, and then it won't matter so much. The current design will be fine, and it has a constant worst-case startup cost; this new design has a worst-case startup cost of O(log n).

Also, while I think this new design is reliable, I'm not as sure about it as I am about the current design. If anyone sees holes in it, or failure cases I'm missing, let me know.

Link