July 29, 2010

Getting Your Open Source Project to 1.0

The project I founded, Apache CouchDB, recently hit 1.0. I'm very proud :)

saucelabscupcakes.jpg
Awesome 1.0 cupcakes from Sauce Labs.

It's been a long time, but we finally produced a release that's complete, performs well and is rock solid.

Already CouchDB is on over 10 million machines. It's used by big respected websites (like the BBC) and groundbreaking organizations (Mozilla and Canonical). We run on most *nix, OS X, Windows, and even Android phones. Have dozens of frameworks and client libraries available. There are 2 books available for sale right now. There is a venture capital backed startup, Cloudant, that offers CouchDB hosting and scales to huge datasets. And I'm CEO of another venture backed ($2 million invested) 12 person start-up, Couchio.

So how did I get here? It took a lot of time and effort (almost 5 years!), and the help of a lot of people. Here are some tips of what it took to get CouchDB to 1.0.

Why?

Successful open source projects need a reason for being. You need to decide why you are creating a project and what problems it solves. Whether it's one or many reasons, you need to figure out what they are and explain them.

Perhaps you are making something new, that hasn't existed. Why hasn't it existed it before? No had the idea? No one had the will to carry it through? Or maybe you are making something that's already in existence, like an HTTP server. What are your reasons? Simpler, faster, more features, different license?

If you are just doing it as a learning exercise, that's fine. But don't expect to attract a community until you can explain why it's useful beyond you own personal goals.

With CouchDB, my reasons were:
1. A schemaless document database with views, bi-directional replication and conflict detection to enable disconnected operation would be really useful.
2. I wanted to understand more about creating distributed systems and database internals.

No one cares about reason #2 except for me. But the first reason is compelling.

Make sure you can tell people why your project exists and what it's good for. And put the reasons on your project site where people can find them.

Code Comes First

Don't start a project unless you have a deep commitment to being a strong coder.

Now I'm not saying you must be a strong coder to participate in a project. Not at all. I'm saying that you must be strong coder to lead one. Maybe you'll get lucky and somehow attract a some really good coders to your project. But most really good coders go to projects with already solid codebases, or start their own.

Also, you don't have to be a strong coder when you start out, but you should know the basics and have a strong desire to learn and get better. Don't expect to attract anyone to your project until you have a substantial amount of working code that isn't a big ball of spaghetti.

With CouchDB, I always emphasized the quality of the high-level design and code implementation. We cannot under any circumstances lose or corrupt your committed data, or get things into an inconsistent state. Reliability and durability are absolutely imperative. Any design or implementation that doesn't meet these goals doesn't make it into the project.

Some projects might not have an emphasis on the reliability, but on absolute performance. That's a fine choice to make, but make sure your users know what they are trading off. And then actually deliver on the performance.

As the project moves along, you will need to ensure the code quality (reliability, performance, resource, usage, etc) is improving over time. If you aren't a good coder, you won't be able to do this.

Know What You Aren't

Almost as important as knowing what your project is trying to accomplish, is know what it isn't trying to accomplish.

When your project starts to get traction, but before it's done, you'll get a lot of people who want the project to work more like things they've used in the past. New users might think your goals and abilities are cool, but they'd trade it all for just a little more. They'll want everything your project does, plus a pony.

The problem is feature and scope creep. Even if you are successfully keeping the project on track, the community may get slowed down dealing with people trying to make it something it's not. Stating clearly what your project isn't trying to do or be helps make it much easier to explain what you can't implement or change.

Now, you can't define everything your project isn't. (It's not a video game. It's not accounting software. It's not a banana. It's not a rainbow. etc.). But you can find the things it's related to, overlaps with, or might be confused with, and explicitly say it's not those things.

With CouchDB, because we are a database, people often asked us to add features that were in traditional RDBMS's, but didn't fit well with the CouchDB data model. Not being intimately familiar with CouchDB's model and how it all fits together, they don't realize that what they're asking for simply doesn't work. But because we explicitly stated on the project site we aren't a relational database and aren't trying to replace relational databases, it made it much easier to explain why those features weren't a good fit for what CouchDB is trying to accomplish.

So if you don't clearly define what your project isn't, often people will try to make it into those things. This can damage the community, as moving forward is slower and people feel like they aren't being listened to. Be explicit what you aren't, and it makes it much easier to focus on what you actually are.

Don't Do Everything (Well)

So you are superstar coder, your code is clear and concise and high quality, you write clear complete documentation, your create all the tests, and you fix every bug. You are awesome!

Thing is, you might be awesome, but until you actually get a community behind the project, the project will be limited in an absolute sense by what a single person can produce. And if you are doing everything, that's not a whole lot. Trying to do everything well means you'll probably never actually release anything.

Unfortunately, at first, you _will_ need to do everything. But just don't do everything really well. Instead, you'll have do some things crappily, and then move on. In addition to writing all the code, you'll need to: Create a project site. Explain your project. Write documentation. Do the releases. Start a bug tracker. Create a mailing list and answer questions. And you'll have to do most of these things poorly if you want to keep moving the ball forward.

You'll have to do some things poorly. But you'll need to pick a few things that you do really well and execute on those things. (The code should be one of the things you do well).

And everything you do, you'll need to make it easy for others to participate. To add patches, to update and create documentation, make bug reports and send patches. And make it clear that help is desired.

Don't get hung up on trying to make everything perfect. That just paralyzes you. But by picking a few things to do well, you will attract people to help you with the things you aren't doing well.

Community Wants to Help

Open Source is awesome in the way it attracts people who just want to help make something cool. Many people want to contribute their time, but only if they think their help will amount to something in the long run. They don't want to spend time and effort on something that doesn't yet show potential or might be abandoned if the creators lose interest.

If you have a solid codebase, then it becomes much easier to attract people to your community. If people can recognize there is at least something high quality about your project, but it's lacking in some areas, people will want to help you in those areas. But you have to have the high quality pieces in place. People don't want to be the one excellent contributor to dreck. They'd rather not have their efforts associated at all.

They do want to be a part of something great. They want to add their work and make it even better. They want to contribute to projects where the total excellence of the project reflects well on them and their efforts. They want to make the world a better place, and don't want their efforts wasted.

And people who like making the world a better place are exactly the kind of people you want to attract. You want people to have pride in their contributions, and to feel like they are really positively affecting the things they care about. Those people have lots of projects to choose where they can add their time and talents. If they feel their efforts on your project are wasted, they are gone. Make sure the people who show a strong desire to contribute aren't ignored, and feel like their efforts will eventually amount to something.

Being a part of Apache has helped CouchDB tremendously. Partially it's because Apache has helped our visibility and credibility. But it's also because we've adopted the "Apache Way", which is more focused on the community aspects of a project than on any specific contributor. Without our amazingly active community, CouchDB would be far behind where it is now.

Community Is Often Incompetent

Unfortunately, many people who will want to help you will produce contributions of poor quality. You will have deal with this "help", and do so diplomatically. The best way to point out the shortcomings with their contributions is to identify what needs improvement without denigrating their overall effort. This can be hard, and many don't want to hear why their efforts aren't up to the project's standards.

Sometimes you have to hurt peoples feelings. But it's better to be honest then to have the quality of your project brought down. If they can't handle the feedback, so be it. The good news is the people who do listen to constructive criticism and actually improve the quality of their contributions are incredibly valuable. Look for these people and nurture their involvement.

With CouchDB, we try to listen to all members of our community, but we only grant commit access to the ones who have shown high quality contributions. Our committers are our first line of defense against poor code and design.

Paul Graham Was Right

It seems to make sense to choose a mainstream language for your project. The more mainstream it is, the larger the potential community you can attract. While that's true to an extent, the quality of the community is more important than its absolute size. Much more important.

Using a mainstream language means you are also competing for contributor's time from other projects in the same language. So the pool is large, but in the end, you still have to attract quality developers from other things competing for their attention. And the competition might actually be stronger in that larger pool.

The more mainstream a language, the more likely it is that a random developer knows it because it's what they use at work. They aren't necessarily interested in being more productive, being more reliable, or whatever. They are interested in getting paid, and they choose their language not for elegance, power or performance, but for the number of job openings available.

If you pick a non-mainstream, more esoteric language, you tend to get a higher quality of developer. You tend to find people who absolutely love programming and building, and choose their languages not based on the scale of pay, but because they make the developers and projects more powerful. So while the total pool of contributors is smaller, they tend to give a higher quality of contribution. You get a much better signal to noise ratio.

As Paul Graham explained in Beating the Averages, the exotic languages tend to attract devs who love to learn and expand their toolbox. You'll attract more of the types of devs who don't mind creating new code to fill in the gaps, or diving into source to find a bug. They aren't afraid of what they don't know, they actually get excited by the chance to learn and do something new.

But if you pick enterprisey language X, you might find you are spending more of your time fixing problems and dealing with developers who just don't "get it". If you aren't careful, this can drown your project and bring the total code quality down to the point where you can't find good devs to help you anymore. With the less popular, esoteric languages, that tends to be less of a problem and you get a higher quality of contribution in general.

Use Your Brain

I can keep listing all the stuff we did, but you aren't creating the same project under the same circumstances. Pretty much everything I've said here, we've not followed at some point during the project. Often it was to the detriment of the project, but sometimes it just didn't make sense to blindly follow a rule or guideline.

You have a brain, and using it is the most important thing to remember at anytime. Projects can't follow cookie cutter rules. Even the "Apache Way", as I've discovered, means different things to different people, often at different times.

So take my advice here with a grain of salt, and use your brain to figure out what's actually important to you, your project and it's community. Good luck!

Link

July 25, 2010

CouchCamp is Coming Soon!

713448945.png

CouchCamp, September 8-10

This is the place to be to learn and hack on Apache CouchDB. In honor of the recent 1.0 release, for a limited time it's only $500, with accommodations.

In addition to unconference style discussions, we've got some great speakers: Selena Deckelman, Stuart Langridge, Ted Leung, Josh Berkus, Dion Almaer and me :)

One thing I'm really excited to talk about is our work porting CouchDB to mobile platforms. Android, iOS, RIM, etc. We've got some very cool stuff coming :)

Link