January 31, 2008

CouchDB Catchup

The big trip

So last week I flew to San Jose for some meetings and to give 2 talks about CouchDB. I unfortunately missed my connection in Las Vegas and had to be booked on a later flight, which meant it wasn't until 2:30 am PT that I got to my room. Which, for me, was 5:30 am. Then when I finally got to bed, I couldn't fall asleep. My sleep cycle had reset and I was wide awake for at least another hour and slept like crap. But when I woke up, I was so pumped with adrenaline it hardly mattered.

I started the day at IBM's Silicon Valley Labs (SVL), and though it's technically in Silicon Valley, it's out in the middle of what looks to be orchards and farms and undeveloped land. It rained nonstop the whole trip, but the scenery there was still beautiful. It must look incredible in the spring.

I met David Fallside in person for the first time. He's my manager and a very interesting guy. I learned he'd been involved in a zillion different standards, like XML and SOAP. And despite my XML-hatin', he still digs CouchDB.

My talk was later at Almaden Research Center (ARC), which is set in the middle of a nature preserve in the hills that border San Jose. And I thought the SVL views were amazing. ARC is one of the most beautiful office facilities I've ever been to, inside and out. I'm told it was built in the '80s when IBM was crazy rich.

The talk was well attended. There was a handful of external guests and everyone else was from IBM. There were lots of DB2 developers and researchers there, including PhDs, Distinguished Engineers and Fellows. Cool, except for... well, let's just say my first ever CouchDB talk didn't go as well as I would have liked. It makes for a funny story though. Ask me sometime.

[Image: 2008 01 Shipment]

Then I had a meeting with the JAQL and JSON team that went much better (though I think I was losing coherency towards the end). JAQL is a new JSON query language they are developing and open sourcing, and it is currently being integrated with HBase and Hadoop. The nice thing about JAQL is the ability to do straightforward "joins", which CouchDB doesn't have a simple answer for yet. They were also working on some really cool inverted indexes for semi-structured data that would allow for some really interesting dynamic queries. Of everything I saw there, the stuff they showed me got me most excited. This is a huge benefit of working with the people I do at IBM: access to some really cool technology and brilliant people.

I'll be writing some more about this in the future.
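For anyone wondering what a "join" over JSON documents means in practice, here is a minimal sketch in Python. It is not JAQL syntax and not a CouchDB feature. The function name `join_docs`, the document shapes and the field names are all made up for illustration.

```python
from collections import defaultdict


def join_docs(left, right, left_key, right_key):
    """Pair up dicts from two lists wherever left[left_key] == right[right_key]."""
    # Index the right-hand documents by their join key for quick lookup.
    index = defaultdict(list)
    for doc in right:
        if right_key in doc:
            index[doc[right_key]].append(doc)

    pairs = []
    for doc in left:
        if left_key not in doc:
            continue  # semi-structured data: a document may simply lack the field
        for match in index.get(doc[left_key], []):
            pairs.append((doc, match))
    return pairs


# Hypothetical documents, purely for illustration.
authors = [
    {"_id": "a1", "name": "Alice"},
    {"_id": "a2", "name": "Bob"},
]
posts = [
    {"_id": "p1", "author_id": "a1", "title": "Hello"},
    {"_id": "p2", "author_id": "a1", "title": "Again"},
    {"_id": "p3", "title": "Orphan post"},
]

for post, author in join_docs(posts, authors, "author_id", "_id"):
    print(post["title"], "by", author["name"])
```

The point of the sketch is only that documents without the join field are skipped rather than treated as errors, which is the kind of thing a query language for semi-structured data has to handle.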

The next day I had more meetings, and gave a talk to the DB2 XML guys. Despite my being 30 minutes late (I was sent to the wrong IBM facility, doh!), having to rush my still unpolished talk, and dissin' XML, they seemed pretty receptive to CouchDB. And again, in talking with them I saw they had some deep insights into semi-structured data that were very applicable to CouchDB. These people get it.

One thing many IBMers were very interested in is Erlang, which I discovered I completely suck at explaining. But people seemed genuinely interested in learning about it. My advice is just start playing with it. That's how I finally "got it".

IBM Training

I just spent 2 days at the IBM Training for all new permanent hires. In Armonk. In January. So you know I was totally looking forward to it.

Fortunately the hotel and facilities were first rate, and we had a great instructor who's been with the company for 20 years and whose enthusiasm was infectious. So the whole thing was better than I thought it would be, even if a little propagandistic. But one thing that has always impressed me is that IBM really does have a long history as a progressive company and a commitment to diversity and integrity. They make a big deal about that: it's taken seriously and you can be fired for being a shithead. My previous experiences have shown they do take it seriously, and I think it's a big part of why it's still a thriving company after nearly 100 years. I think that's pretty cool.

Big Company Blues

Something in my HR records got seriously messed up when I joined, and somehow all our HR stuff, including live paychecks, was getting sent to our previous address from a year ago. In the IBM system, everything I had access to had the correct address, but when we'd call for support they'd have our old address. And we'd ask them to change it, but they said they didn't know how, that we'd have to change it on the web. The one that had the correct address already.

And it's not an isolated case. A woman at the IBM Training was experiencing the EXACT same thing. The address records she had access to on the web were correct, but mail was getting sent to a previous address, and support reps couldn't help her. Another guy said his pay, and that of everyone in his new division, was wrong by hundreds of dollars.

Anyway my wife spent a ton of time on the phone about this issue and she just now got it fixed today, after 2 weeks of calling up various support people. We think it's fixed anyway. Yay big companies!

To the start-up in the Boston area who needs an Erlang Expert

I have been contacted by exactly 6 different headhunters for this job. Stop calling. I know it's the same job because they all say top salary, working from home, dual 30" Apple Displays and an Aeron chair, or variation thereof. It sounds sweet and I'm flattered, really. But you really can't pay me enough to stop working on CouchDB. Sorry.

BTW, if any Erlang expert wants to be referred for a pretty sweet sounding job, just let me know. Maybe I can get a big fat commission.

Scale Baby!

Here is someone who actually did some scalability testing on CouchDB. I'm not sure what to make of the data, but he hit the limit of the testing tool at 20k concurrent requests. That's pretty cool.

On a related topic, Brian Aker wrote recently about the connection thread pooling work I implemented for MySQL 6.0. The improvements in client scalability look really impressive. I hope it scales like that on Solaris :)

Eating Dogfood

Christopher Lenz is now running his live blog off of CouchDB. Awesome. Please back up frequently, Chris!

The CouchDB Components

Two drawings of the same thing.

[Image: Img-080128190327-0001]

[Image: What Is Couchdb.020]

I like the hand-drawn version, but it actually takes more effort than using Apple Keynote.

Apache

Sam Ruby and David Fallside have made a ton of progress with the Apache work. Sam has created an Apache Incubator Proposal and we are presenting to the Apache Review Board next week. Things are moving ahead quite nicely.

Talk Slides

Here is a PDF of the slides for the talk I gave, with some minor changes and additions.

What Is Couchdb

The Amazon Elastic Compute Cloud

While it sounds like a bad 70's Disney movie, it's actually a fast and cheap way to create a big computing cluster using Amazon.com's resources. And someone has created a CouchDB Amazon Machine Image, calling it ElasticDB. Nice.


January 21, 2008

San Jose trip, CouchDB talk - Updated: Scheduled for 2pm PT, Thursday, Jan. 24

I'll be in San Jose this Thursday and Friday to meet with various folks about CouchDB.

On Thursday afternoon at 2pm PT I'll be giving this talk at IBM's Almaden Research Center:

Title:

An Introduction to CouchDB


Speaker:

Damien Katz - Creator and Project Leader of CouchDB.


Abstract:

CouchDB is a JSON document database built for the web. I'll be talking about CouchDB's core concepts and features: RESTful HTTP API, ACID properties, Javascript based views, distributed update and replication model, and a highly concurrent, fault tolerant code base built on Erlang OTP. I'll also discuss the document model and how it compares to SQL, as well as the current state of the project and future directions.

If you are interested in attending or otherwise meeting up, mail me at damikatz@us.ibm.com. See you there!
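To make the "Javascript based views" item in the abstract a little more concrete, here is a rough sketch of the idea in Python: a map function is run over every document and emits key/value pairs, and the view is the collected rows sorted by key. Real CouchDB views are JavaScript functions run by the server, so this is only an illustration of the concept. The names `build_view` and `by_tag` and the sample documents are hypothetical.

```python
def build_view(docs, map_fn):
    """Run map_fn over every document and return the emitted rows sorted by key."""
    rows = []
    for doc in docs:
        # map_fn yields zero or more (key, value) pairs per document.
        for key, value in map_fn(doc):
            rows.append({"id": doc["_id"], "key": key, "value": value})
    # Sort by key, then by document id, so lookups by key are predictable.
    rows.sort(key=lambda row: (row["key"], row["id"]))
    return rows


def by_tag(doc):
    """Hypothetical map function: emit one row per tag on the document."""
    for tag in doc.get("tags", []):
        yield tag, doc["title"]


# Made-up documents; only "_id" follows CouchDB's naming.
docs = [
    {"_id": "doc2", "title": "Views", "tags": ["couchdb"]},
    {"_id": "doc1", "title": "Intro", "tags": ["couchdb", "erlang"]},
    {"_id": "doc3", "title": "No tags"},
]

for row in build_view(docs, by_tag):
    print(row["key"], row["id"], row["value"])
```

A document with no `tags` field simply emits nothing, so it never shows up in the view. That is the document model's answer to optional columns.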


January 18, 2008

Lasagna Cat

Lasagna Cat made me laugh so hard that I literally had muscles in my skull go into spasm from laughing.

They act out Garfield cartoon strips and add in a laugh track, then immediately follow it with a video mashup. Most are less than 2 minutes total.

That last one inspired my wife to draw this:
[Image: Img-080118173039-0001]


January 15, 2008

IBM email glitch - Please resend

Because I was a former Iris employee, there were some problems creating my IBM email account that only got resolved today. So if you sent anything to my IBM email account before now, I probably didn't get it. Please resend. My new IBM email address is damikatz@us.ibm.com (yes, it is a hideous email address, thanks for noticing).


January 6, 2008

The internet has traumatized my child

My 3 1/2 year old daughter can't read books yet, but she can use a computer and navigate the Nick Junior website. It's wild stuff man.

Recently she found some pretty disturbing videos on it. She was crying for half an hour saying "It's a person. Click the camera. It's a person".

She had found some clips of the actors who do the voices on the show Go! Diego Go!. "Click The Camera" is a cartoon camera on the show and Rosie Perez does the voice.

Up until that moment, she thought the show was real. A cartoon show. With talking animals and cameras. I dunno either. But she's 3 1/2 so I'll cut her some slack.

Anyway, she was very upset. When I explained that actors do the voices and it's make believe, they just pretend so that it's fun, she just cried more. When I asked why she was crying, she said she didn't know.

Poor kid, she was just overwhelmed.

Eventually she calmed down and became curious and rewatched a video of the boy who does the voice of Diego. Then she wanted to go visit the voice actors. I explained that California is a long long way away, and she just started bawling. I then said someday we'll go to California to meet them. But not now.

I regret making that promise, but it's not like I can keep it anyway. We can't just drive up to their houses and say "Howdy! Beautiful Day huh? Listen, my kid wanted to meet you, can we come in?" I think I have to keep making excuses and putting it off until my daughter is old enough to understand how awkward that would be.


FAQ about CouchDB and its new IBM overlords

Q. What does IBM's involvement mean for CouchDB and the community?
A. The main consequences of IBM's involvement are:
- The code is now being Apache licensed, instead of GPL.
- Damien is going to be contributing much more time!

Q. What about all the people who worked on the project? Are they still going to contribute, or has this become an IBM only club?
A. No one is being replaced or pushed aside, and I'm hoping the people who are currently contributing keep contributing, because they are passionate and their work is fantastic.

Q. IBM sucks! They are going to ruin everything because they're a bunch of ruiners! Blah Blah Blah!
A. Ok, that wasn't really a question, but I'll respond anyway.

Yes IBM does suck. But they also rule. And they are just "Ok".

They say you can't be all things to all people, but when you have 350,000 employees and a presence in nearly every niche of the computer industry, you can come close. IBM is gigantic and diverse.

So say what you want about IBM, but CouchDB will be a part that does not suck.

When it comes to CouchDB and control, I've made sure not to get myself into a situation where I could lose my rights to keep working on CouchDB. Praise the power of open source!

Now don't get me wrong, I will tend to see things IBM's way a little more often, seeing things through big blue tinted glasses. It's just unrealistic to think any other way about it.

But they don't own me, and they sure don't own CouchDB. I've made sure of it.

If there is a bad quarter and a corporate reshuffle and suddenly I'm working with vapid bureaucrats, then we'll have to part ways. Same as any other job. Except CouchDB has been kept free and open and I, like anyone else, can continue the work on my own, forking the code base if I wish.

It's the beauty and freedom of open source, and IBM has shown time and time again they are visionary enough to be a part of it and share in the rewards.

Q. So is CouchDB now going to be written in Java?
A. Erlang is a great fit for CouchDB and I have absolutely no plans to move the project off its Erlang base. IBM/Apache's only concern is that we remove license incompatible 3rd party source code bundled with the project, a fundamental requirement for any Apache project. So some things may have to be replaced in the source code (possibly Mozilla SpiderMonkey), but the core Erlang code stays.

An important goal is to keep interfaces in CouchDB simple enough that creating compatible implementations on other platforms is feasible. CouchDB has already inspired the database projects RDDB and Basura. Like SQL databases, I think CouchDB needs competition and an ecosystem to be viable long term. So Java or C++ versions might be created and I would be delighted to see them, but it likely won't be me who does it.

Q. What is CouchDB's relationship to Lotus Notes/Domino?
A. There is no relationship at all. I've not talked about CouchDB with anyone in the Lotus group. CouchDB is in IBM's Information Management group. As far as I know, there are currently no plans for CouchDB to be in any way integrated with anything from Lotus.

But that may change at any time.


January 1, 2008

New Gig

Great news!

I've accepted a permanent, full-time job with IBM. My primary duties are (drumroll....) CouchDB! So all the stuff I've been doing up until now for free I'll be doing full time and be getting paid for it! Yee Haw!

[Images: IBM logo and Apache logo]

All the code will be Apache licensed and donated to the Apache Software Foundation, with the plan that CouchDB will eventually become an official Apache project. A big plus here is the Apache license allows anyone to do pretty much anything with the code, so everything remains truly open source. I wouldn't have done this without IBM's commitment to keeping CouchDB open.

Huge thanks to Anant Jhingran, David Fallside, and Sam Ruby at IBM for helping to make this happen. And of course huge thanks to all the CouchDB project members and contributors who've put in their time and effort to push the project forward. This is an important validation of the project and everyone's effort.
