Erlang VM crash

Jan writes a call for help with some Erlang and Spidermonkey problems we've been having.

The reason Erlang crashes is an interesting one. I spent some time looking at the crash dump, and the server didn't look to be in a stressed state. It only had tens of processes running, most of them part of the core server and always running. It was only using 12 megs of memory total and asking for 2 megs more. How could it fail to allocate 2 megs and then crash because of it?

Turns out it's running in a shared, hosted, memory-limited environment, and the Spidermonkey engine was hogging all the allowed OS memory. I imagine that when Spidermonkey fails an OS allocation, it runs the GC. When Erlang fails an OS allocation, the whole VM bites the dust. The fact that it bites the dust isn't as disturbing as I originally thought, but it did surprise me. I'd expected it to reset all its internal state rather than just exit the OS process.

So the Erlang VM died suddenly with a failed memory allocation. This is actually quite okay as CouchDB is designed with crash-only design principles and will instantly restart with no fix-up cycle necessary. So in the interest of crash-only design, this really is the way to go. Crash and restart.
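To make the crash-and-restart idea concrete, here is a minimal sketch of an external watchdog loop in Python. It is not how CouchDB or the Erlang VM actually handles restarts; the function name `run_with_restart` and its parameters are hypothetical, chosen only for illustration. It assumes the supervised program signals a crash with a non-zero exit status and needs no fix-up before it is started again.

```python
import subprocess
import sys
import time


def run_with_restart(cmd, max_restarts=3, delay=0.0):
    """Run cmd, restarting it each time it exits with a non-zero status.

    Hypothetical helper for illustration only.
    Returns the list of exit codes seen, one per run.
    """
    exit_codes = []
    for _ in range(max_restarts + 1):
        code = subprocess.call(cmd)  # blocks until the child process exits
        exit_codes.append(code)
        if code == 0:
            break  # clean shutdown, nothing to restart
        time.sleep(delay)  # optional pause so a crash loop doesn't spin
    return exit_codes


if __name__ == "__main__":
    # A stand-in child that always "crashes" with exit status 1.
    crashing = [sys.executable, "-c", "import sys; sys.exit(1)"]
    print(run_with_restart(crashing, max_restarts=2))
```

The point of the sketch is that the watchdog does no recovery work of its own. It just starts the process again, which only works if the process can come back up from whatever state a crash left behind.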

However, the Erlang VM didn't restart automatically. Apparently we need to configure something for that to happen, only we can't figure out how to make it work. Either the documentation is wrong or confusing, or the feature is buggy. This is the kind of stuff we keep hitting in Erlang. It's fantastically productive for many tasks, but using some of the built-in libraries and features can be a huge time sink (inets and xmerl immediately spring to mind). I love Erlang, I really do, but I have a nasty Erlang rant building inside of me.

Posted December 9, 2007 4:47 PM