One reason I chose Erlang

Meanwhile, back at Ericsson, some Erlang-based products that were already in progress when the "ban" went into effect came to market, including the AXD 301, an ATM switch with 99.9999999 percent reliability (9 nines, or 31 ms. downtime a year!), which has captured 11% of the world market. The AXD 301 system includes 1.7 million lines of Erlang: This isn't just some academic language.

Erlang in BYTE.com

I really should write more about Erlang (as I promised), but I've been busy actually writing CouchDb in it. Anyway, the extreme reliability of a switch built from 1.7 million lines of Erlang was one big reason I chose Erlang for CouchDb. I chose it partly on faith. I wasn't sure why I should use this language, only that it claimed to address problems I couldn't figure out how to solve with conventional software tools. And it seemed to walk the walk with real, shipping products.

So I had high hopes for Erlang, and I must say I haven't been disappointed. Honestly, I'm not sure what I was expecting, but what I found surprised me big time: the Erlang/OTP platform has completely changed the way I view reliable software construction. One surprise was that it's just as easy to create bugs in Erlang code as in other languages. Erlang is no silver bullet for eliminating software defects. Its reliability comes not from helping you write bug-free code, but from how well it tolerates bugs and faults.

In properly designed Erlang software, when a component has a problem (it hits an unexpected condition, runs out of memory, times out, etc.), the component is restarted. If the problem continues, then its parent can be restarted. If necessary, the system can shut down the faulty component permanently, or keep restarting components further up the tree until the entire application is restarted, which is essentially a fresh start.
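To make that escalation concrete, here is a toy model of it. It's written in Python rather than Erlang so it stays short and self-contained. It is only a sketch and not OTP's actual supervisor API. The Supervisor and Escalate classes, the max_restarts budget and the flaky worker are all hypothetical names, loosely modeled on the restart intensity of an OTP supervisor. (A real OTP supervisor gives up after too many restarts within a time period. This toy ignores the time window.) A supervisor re-runs a failing child until its restart budget is used up. It then gives up and raises, and its own parent treats that as a child failure and restarts the whole subtree.

```python
from typing import Callable, List, Union


class Escalate(Exception):
    """Raised by a (hypothetical) supervisor that has used up its restart budget."""


class Supervisor:
    """Toy supervisor: restart failing children, escalate to the parent if failures persist."""

    def __init__(self, name: str, children: List[Union[Callable[[], None], "Supervisor"]],
                 max_restarts: int, log: List[str]) -> None:
        self.name = name
        self.children = children
        self.max_restarts = max_restarts  # unlike OTP, no time window is tracked
        self.log = log

    def run(self) -> None:
        restarts = 0  # every (re)start of a supervisor begins with a fresh budget
        for child in self.children:
            while True:
                try:
                    if isinstance(child, Supervisor):
                        child.run()  # restarting a sub-supervisor restarts its whole subtree
                    else:
                        child()  # a plain worker
                    break  # the child finished normally
                except Exception as err:  # includes Escalate raised by a sub-supervisor
                    restarts += 1
                    if restarts > self.max_restarts:
                        self.log.append(f"{self.name}: giving up")
                        raise Escalate(self.name) from err
                    self.log.append(f"{self.name}: restart #{restarts} after {err!r}")


def flaky(failures: int) -> Callable[[], None]:
    """Hypothetical worker that hits an 'unexpected condition' `failures` times, then succeeds."""
    remaining = [failures]

    def work() -> None:
        if remaining[0] > 0:
            remaining[0] -= 1
            raise RuntimeError("unexpected condition")

    return work


# A worker under a "db" supervisor, which itself sits under the top-level "app" supervisor.
log: List[str] = []
db = Supervisor("db", [flaky(3)], max_restarts=1, log=log)
app = Supervisor("app", [db], max_restarts=2, log=log)
app.run()
for line in log:
    print(line)
# prints:
# db: restart #1 after RuntimeError('unexpected condition')
# db: giving up
# app: restart #1 after Escalate('db')
# db: restart #1 after RuntimeError('unexpected condition')
```

In this run, the worker fails twice under "db", which exceeds db's budget of one restart. The failure escalates, so "app" restarts the whole "db" subtree, and the fault then clears. In real OTP a restarted child is a fresh process with fresh state. Here the worker's failure counter survives restarts only to simulate a transient fault. If the fault never cleared, "app" would eventually raise Escalate too, which corresponds to the whole application giving up.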

Basically, it has built into it the notion that shit will happen, so let's not allow small problems to wreck our whole application. That goes a long way toward addressing otherwise intractable problems like memory leaks, memory fragmentation, deadlocks and infinite recursion. "Kill first, ask questions later" might as well be the Erlang motto. And why not? You just respawn what you kill. And while it may sound hard to write software this way, it's actually far easier than you think, because there is very little error-handling code cluttering up everything.

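This incremental restart strategy is not possible in languages with mutable data and shared-state threading models (such as Java, C# and just about every other mainstream programming language). In those languages, a component that fails partway through an update can leave shared data in an inconsistent state. Restarting just that component then doesn't bring the rest of the system back to a known-good state.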

Posted June 13, 2006 12:50 AM