A Comfortable Couch

Tuesday, September 28, 2004

Crash-only software revisited

I've been thinking about crash-only software and how it might have helped Technorati with their problem. But after thinking on it for a while, I don't think crash-only software is going to be generally useful for a long time. It's only going to be good for systems that don't depend on non-crash-only components.

Firstly, I think crash-only is the ideal way to develop reliable software. I like that it assumes software will crash, so it turns the problem on its head and makes crashing integral to the normal functioning of the application. Ironically, you are assured that you have crash-tolerant systems because they are always crashed!
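To make "crashing is the normal path" concrete, here is a minimal sketch in Python. It is not Couch's code; the class name `CrashOnlyStore` and its append-only log format are assumptions made up for illustration. The point is that there is no close or shutdown method at all: every start is a recovery, and every write is durable before it returns.

```python
import json
import os


class CrashOnlyStore:
    """Toy key-value store (hypothetical) with no shutdown routine.

    Opening the store always runs recovery, so a start after a crash and a
    start after a "normal" stop take exactly the same code path.
    """

    def __init__(self, path):
        self.path = path
        self.data = {}
        self._recover()
        self._log = open(self.path, "a", encoding="utf-8")

    def _recover(self):
        if not os.path.exists(self.path):
            return
        good = 0  # byte offset just past the last complete record
        with open(self.path, "rb") as f:
            for raw in f:
                if not raw.endswith(b"\n"):
                    break  # torn tail left by a crash in the middle of a write
                try:
                    key, value = json.loads(raw.decode("utf-8"))
                except ValueError:
                    break  # unreadable record: stop replaying here
                self.data[key] = value
                good += len(raw)
        # Cut off anything after the last good record so new appends stay clean.
        os.truncate(self.path, good)

    def put(self, key, value):
        # Flush and fsync before returning, so a crash right after put() loses nothing.
        self._log.write(json.dumps([key, value]) + "\n")
        self._log.flush()
        os.fsync(self._log.fileno())
        self.data[key] = value
```

Stopping this store means killing the process. The cost of the design is the fsync on every write, which a real system would probably batch.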

However, I see one big problem with it: it doesn't work when interfacing with most non-crash-only software. Many APIs have startup and shutdown semantics and routines that the applications using them must follow. For example, when using the Notes C API, you can crash and hang other Notes processes if your application doesn't call the shutdown procedures cleanly.

One way to shield your non-crash-only components from your crashes is to connect to them via network sockets or pipes. But even if your application uses a piped interface to non-crash-only software, it can still negatively affect the remote software if it doesn't disconnect cleanly. Most server software will deal correctly with broken connections, but many times it will waste resources waiting for a timeout or thinking the connection is just idle. If the component system isn't designed for it, the costs can add up under heavy loads, hurting response times or even crashing the component when resources run low.
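From the server's side, the usual defense is an idle timeout, so that a peer that vanished without disconnecting can only hold a connection slot for a bounded time. A rough sketch in Python using the standard `socket` module follows; the function name `serve_connection` and the returned reason strings are assumptions for illustration, not part of any real server.

```python
import socket


def serve_connection(conn, idle_seconds):
    """Read from one client until it closes, resets, or stays silent too long.

    Returns (reason, bytes_received) so the caller can log why the slot was freed.
    """
    conn.settimeout(idle_seconds)
    received = 0
    try:
        while True:
            try:
                chunk = conn.recv(4096)
            except socket.timeout:
                # No data and no close: the peer may have crashed or may just be idle.
                return ("idle-timeout", received)
            except ConnectionError:
                return ("reset", received)
            if not chunk:
                # An empty read means the peer disconnected cleanly.
                return ("peer-closed", received)
            received += len(chunk)
    finally:
        conn.close()
```

The server cannot tell a crashed client from a quiet one until the timeout fires, which is exactly the wasted waiting described above. A shorter timeout frees resources sooner but also drops clients that were legitimately idle.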

With Couch, this legacy interface problem doesn't look like it will be an issue. Since I'm building it from the ground up, I shouldn't have problems writing it crash-only. However, I can only pick prebuilt components that support crash-only, and I think that's going to be slim pickings. That's alright, it just means I'll get to build those mostly low-level components myself. That's the fun stuff to me anyway.

But I don't see how most new systems can be written crash-only if they interface with legacy components. Those crash-only systems will still need to go through a shutdown procedure for the benefit of those components. The best most applications can do is to be crash-only in design but shut down cleanly in practice. Unfortunately that's not really crash-only and isn't going to be as reliable as a true crash-only system.
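A small sketch of what "crash-only in design, clean shutdown in practice" tends to look like, again in Python. `FakeLegacyApi` is a made-up stand-in for any library with mandatory startup and shutdown calls (it is not the Notes C API), and `legacy_session` and `run` are hypothetical helper names.

```python
import contextlib


class FakeLegacyApi:
    """Stand-in for a library that requires paired startup/shutdown calls."""

    def __init__(self):
        self.calls = []

    def startup(self):
        self.calls.append("startup")

    def shutdown(self):
        self.calls.append("shutdown")


@contextlib.contextmanager
def legacy_session(api):
    api.startup()
    try:
        yield api
    finally:
        # Runs on normal exit and on Python exceptions, but NOT on kill -9,
        # a power cut, or a hard crash inside native code.
        api.shutdown()


def run(api, work):
    # All use of the legacy component is funneled through the session wrapper.
    with legacy_session(api):
        return work(api)
```

The comment in the `finally` block is the whole problem: the wrapper is best effort, so the legacy component still sees an unclean exit whenever the process really does crash.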

