Erlang VM crash

Jan writes a call for help for some Erlang and Spidermonkey problems we've been having.

It's an interesting reason for Erlang to crash. I spent some time looking at the crash dump, and the server didn't look to be in a stressed state. It had only tens of processes running, most of them part of the core server that is always running. It was using only 12 megs of memory in total and asking for 2 megs more. How could it fail to allocate 2 megs and then crash because of it?

Turns out it's running in a shared, hosted, memory-limited environment, and the Spidermonkey engine was hogging all the allowed OS memory. I imagine that when Spidermonkey fails an OS allocation, it runs the GC. When Erlang fails an OS allocation, the whole VM bites the dust. The fact that it bites the dust isn't as disturbing as I originally thought, but it did surprise me; I'd expected it to reset all its internal state rather than just exit the OS process.
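
To make the contrast concrete, here is a toy Python sketch of the two failure policies. It is not how either runtime is actually implemented: `Budget`, `alloc_with_gc`, and `alloc_or_die` are hypothetical names, the "OS" is just a counter with a hard cap, and the 64 MB cap and 51 MB neighbour are made-up numbers (only the 12 and 2 megs come from the crash dump).

```python
class Budget:
    """Toy stand-in for an OS memory cap shared by every allocator."""

    def __init__(self, limit_mb):
        self.limit_mb = limit_mb
        self.used_mb = 0

    def alloc(self, mb):
        # Refuse any request that would push usage past the cap.
        if self.used_mb + mb > self.limit_mb:
            raise MemoryError(f"cannot allocate {mb} MB")
        self.used_mb += mb

    def free(self, mb):
        self.used_mb -= mb


def alloc_with_gc(budget, mb, garbage_mb):
    """On failure, release garbage and retry once (my guess at Spidermonkey's policy)."""
    try:
        budget.alloc(mb)
    except MemoryError:
        budget.free(garbage_mb)  # pretend a GC pass reclaimed this much
        budget.alloc(mb)         # may still raise if the GC freed too little
    return "ok"


def alloc_or_die(budget, mb):
    """On failure, give up entirely (what the Erlang VM did here)."""
    try:
        budget.alloc(mb)
    except MemoryError:
        return "exit"            # a real VM would terminate the OS process
    return "ok"


# The situation from the crash dump: a small VM next to a memory hog.
budget = Budget(limit_mb=64)     # assumed cap, for illustration only
budget.alloc(12)                 # Erlang's own footprint
budget.alloc(51)                 # the neighbour hogging the rest (assumed figure)
erlang_result = alloc_or_die(budget, 2)              # only 1 MB is left
js_result = alloc_with_gc(budget, 2, garbage_mb=40)  # frees 40 MB, then fits
```

With these numbers the first request gives up and the second succeeds after the pretend collection, which is the asymmetry that surprised me.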

So the Erlang VM died suddenly with a failed memory allocation. This is actually quite okay as CouchDB is designed with crash-only design principles and will instantly restart with no fix-up cycle necessary. So in the interest of crash-only design, this really is the way to go. Crash and restart.
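
The crash-and-restart idea is simple enough to sketch as an external watchdog. This is a minimal illustration in Python, not what CouchDB or Erlang actually ships: `run_until_clean_exit` and `max_restarts` are hypothetical names, and the child process is a throwaway script that crashes once and then exits cleanly.

```python
import os
import subprocess
import sys
import tempfile


def run_until_clean_exit(cmd, max_restarts=5):
    """Rerun cmd every time it exits non-zero; return how many restarts it took."""
    restarts = 0
    while True:
        result = subprocess.run(cmd)
        if result.returncode == 0:
            return restarts
        if restarts >= max_restarts:
            raise RuntimeError(f"gave up after {restarts} restarts")
        restarts += 1  # no fix-up step: just start it again


# Demo child: "crashes" on its first run, exits cleanly on the second.
marker = os.path.join(tempfile.mkdtemp(), "crashed-once")
child = (
    "import os, sys\n"
    f"p = {marker!r}\n"
    "if os.path.exists(p): sys.exit(0)\n"
    "open(p, 'w').close(); sys.exit(1)\n"
)
restarts = run_until_clean_exit([sys.executable, "-c", child])
```

A real watchdog would keep the server running indefinitely instead of stopping at the first clean exit; the loop ends here only so the example terminates. The important part is what is missing: there is no recovery or repair step between the crash and the restart.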

However, the Erlang VM didn't restart automatically. Apparently we need to configure something for that to happen, only we can't figure out how to make it work. Either the documentation is wrong or confusing, or the feature is buggy. This is the kind of stuff we keep hitting in Erlang. It's fantastically productive for many tasks, but using some of the built-in libraries and features can be a huge time sink (inets and xmerl immediately spring to mind). I love Erlang, I really do, but I have a nasty Erlang rant building inside of me.

Posted December 9, 2007 4:47 PM

Comments

Go Scala! :)

Alexandre, December 9, 2007 6:06 PM

I have a nasty Erlang rant building inside of me.

I look forward to seeing that; I suspect hearing it come from you will carry more weight than hearing it come from some of us.

I've recently gotten a job working with Erlang a significant portion of my time (working on the ejabberd codebase), and I've come to the conclusion that Erlang is one frustrating language. One minute you're expressing some horrible multithreaded problem so fantastically cleanly you can't imagine what other language you'd ever want to use, the next minute you're wondering why there's no library for X or trying to figure out how to do your string manipulation in some way that is both somewhat efficient and somewhat readable.

The fact that some very hard things are made so easy makes the hard things that should be easy so much more frustrating.

Strange, strange language, and I'm not talking about the process structure.

Jeremy Bowers, December 9, 2007 7:40 PM

I think you want the heartbeat flag: http://erlang.org/doc/man/heart.html I don't know if this actually works.

Arona Myous, December 9, 2007 9:50 PM

Arona, heartbeat is what we are trying to use. Jan's posting has the details.

Damien Katz, December 9, 2007 10:14 PM

I confirm that the heart command-line option is working well.
It allows you to restart your Erlang system immediately after a crash.

Mickael Remond, December 10, 2007 3:03 AM

Damien, you can contact me if you need help with heart. We use it often on production systems.

Mickael Remond, December 10, 2007 3:04 AM

@Mickael
Can you join #couchdb on irc.freenode.org to help out?

Jan, December 10, 2007 7:18 AM

In case you need something external to Erlang to monitor/restart the process, Monit (http://tildeslash.com/monit/) would probably be the simplest tool for the job.

Arto Bendiken, December 10, 2007 10:50 AM

The latest SpiderMonkey seems to have a number of gc related functions available: gc, gcparam, countHeap, gczeal, stackQuota, dumpHeap.

Also, there was a memory leak when JS_C_STRINGS_ARE_UTF8 is defined. It is believed that this is addressed in the latest CVS HEAD, but that hasn't been verified.

Sam Ruby, December 10, 2007 3:22 PM

I got the following from a Mozilla developer:

Invoke js shell with -b option with a value of, say 1000000. This will
activate the branch callback and trigger periodic GC.

Sam Ruby, December 10, 2007 10:16 PM

@Sam
That did the trick, cheers.

Jan, December 12, 2007 3:58 AM

Sorry Jan, I do not have an IRC client (for quite obvious reasons: I do not use IRC but the XMPP network).

Mickael Remond, December 18, 2007 5:15 AM