January 18, 2013

Development Methodologies?

Hi Damien,


If I were to list projects as small, medium, and large or small to enterprise, what methodologies work across them? My thoughts are Agile works well, but eventually you'll hit a wall of complexity, which will make you wonder why you didn't see it many, many iterations ago. I don't know anyone at NASA or Space-X or DoD so I don't know what software methodology they use? Given your experience can you shed some light on it?



I don't really use a specific methodology, however I find it very useful to understand the most popular methodologies and when they are useful. Then it's helpful when you are at various stages of projects and know what kinds of approaches are helpful, and how you can apply them to your situation.

For example, I find Test Driven Design (TDD) very much overkill, but for a mature codebase I find lots of testing invaluable. Early in a codebase I find lots of tests very restrictive, I value the ability to quickly change a lot of code without also having to change a larger amount of tests. Early on, when I'm creating the overall architecture that everything else will hang on, and the code is small and design is plastic and I can keep it all in my head, I value being able to move very quickly. However, other developers may find TDD very valuable to think through the design and problems. I don't work like that. To each his own.

Blindly applying methodologies or even "best practices" is bad. For the inexperienced it's better than nothing, but it's not as good as knowledge of self and team, experience with a variety of projects and their stages, and good old-fashioned pragmatism.


January 17, 2013

Follow up to "The Unreasonable Effectiveness of C"

My post The Unreasonable Effectiveness of C generated a ton discussion on Reddit and Hacker News, nearly 1200 comments combined as people got in to all sorts of heated arguments. I also got a bunch of private correspondence about it.

So I'm going to answer some of the most common questions, feedback and misunderstandings it's gotten.

Is C the best language for everything?

Hell no! Higher level languages, like Python and Ruby, are extremely useful and should definitely be used where appropriate. Java has a lot of advantages, C++ does too. Erlang is amazing. Most every popular language has uses where it's a better choice.

But when both raw performance and reliability are critical, C is very very hard to beat. At Couchbase we need industrial grade reliability without compromising performance.

I love me some Erlang. It's very reliable and predictable, and the whole design of the language is about robustness, even in the face of hardware failures. Just because we experienced a crash problem in the core of Erlang shouldn't tarnish its otherwise excellent track record.

However it's not fast enough for our and our customers needs. This is key, the hard work to make our code as efficient and fast as possible in C now benefits our many thousands of Couchbase server deployments all over the world, saving a ton of money and resources. It's an investment that is payed back many, many times.

But for most projects the extra engineering cost isn't worth it. if you are building something that's only used by your organization, or small # of customers, your money is likely better spent on faster/more hardware than very expensive engineers coding, testing and debugging C code. There is a good chance you don't have the same economies of scale we do at Couchbase where the costs are spread over high # of customers.

Don't just blindly use C, understand its own tradeoffs and if it makes sense in your situation. Erlang is quite good for us, but to stay competitive we need to move on to something faster and industrial grade for our performance oriented code. And Erlang itself is written in C.

If a big problem was C code in Erlang, why would using more C be good?

Because it's easier to debug when you don't lose context between the "application" layer and the lower level code. The big problem we've seen is when C code is getting called from higher level code in the same process, we lose all the debugging context between the higher level code and the underlying C code.

So when we were getting these crashes, we didn't have the expertise and tooling to figure out what exactly the Erlang code was doing at the moment it crashed. Erlang is highly concurrent and many different things were all being executed at the same time. We knew it had something to do with the async IO settings we were using in the VM and the opening and closing of files, but exactly what or why still eluded us.

Also, we couldn't manifest the crash with test code, though we tried, making it hard to report the issue to Erlang maintainers. We had to run the full Couchbase stack with heavy load in order to trigger the crash, and it would often take 6 or more hours before we saw it. This made debugging problematic as we had confounding factors of our own in-process C code that also could have been the source of the crashes.

In the end, we found through code inspection the problem was Erlang's disk based sorting code, the compression options it was using, and the interaction with how Erlang closes files. When Erlang closed files with the compression option it would occasionally have a race condition low down in VM that would lead to a dangling pointer and a double-free. If we hadn't lost all the context between the Erlang user code and the underlying C code, we could have tracked this problem down much sooner. We would have had a complete C stacktrace of what our code was doing when the library code crashed, allowing us to narrow down very quickly the flawed C code/modules.

Why Isn't C++ a suitable replacement for C?

Often it is, but the problem with C++ you have to be very disciplined to use it and not complicate/obfuscate your code unnecessarily. It's also not as portable to as many environments (particularly embedded), and tends to have much higher compilation and build times, which negatively affects developer productivity.

C++ is also a complicated mess, so when you adopt C++ for its libraries and community, you have to take the good with the bad and weird to get the benefits. And there is a lot of disagreement what constitutes bad or weird. Your sane subset of the language is very likely to be at odds with others ideas of a sane subset. C has this problem to a much much smaller degree.

What about Go as a replacement for C?

Perhaps someday. Right now Go is far slower than C. It also doesn't give as good of control over memory since it's garbage collected. It's not as portable, and you also can't host Go code in other environments or language VMs, limiting what you can do with your code.

Go however has a lot of momentum and a very bright future, they've made some very nice and pragmatic choices in its design. If it continues to flourish I expect every objection I listed, except for the garbage collection, will eventually be addressed.

What about D as a replacement for C?

It's not there for the same reasons as Go. It's possible that someday it will be suitable, but I'm less optimistic about it strictly from a momentum perspective, it doesn't have a big backer like Google and doesn't seem to be growing very rapidly. But perhaps it will get there someday.

Is there anything else that could replace C?

I don't know a lot of what's out there on the horizon, and there are some efforts to create a better C. But for completely new languages as a replacement, I'm most hopeful and optimistic about Mozilla's Rust. It's designed to be fast and portable, callable from any language/environment much like C, with no garbage collection yet still safe from buffer overruns, leaks and race conditions. It also has Erlang style concurrency features built in.

But it's a very young and rapidly evolving language. The performance is not yet close to C. The syntax might be too foreign for the masses to ever hit the mainstream, and it may suffer the same niche fate as Erlang because of that.

However if Rust achieves its stated goals, C-like performance but safe with Erlang concurrency and robustness built in, it would be the language of my dreams. I'll be watching its progress very closely.

That's just, like, your opinion, man

Yes, my post was an opinion piece.

But I'm not new to this programming game. I've done this professionally since 1995.

I've coded GUIs using very high level languages likes HTML/Javascript and Visual Basic, and with lower level languages like Java, C and C++.

I've built a ton of backend code in C, C++ and Erlang. I've written in excess of 100k lines of C and C++ code. I've easily read, line by line, 300k lines of C code.

I've written a byte code VM in C++ that's been deployed on 100 million+ desktops and 100's of thousands of servers. I used C++ inheritance, templates, exceptions, custom memory allocation and a bunch of other features I thought were very cool at the time. Now I feel bad for the people who have to maintain it.

I've fixed many bugs in MySQL and its C++ codebase. I wrote the Enterprise Edition thread pooling and evented network IO feature that increases client scalability by over an order of magnitude.

Also created and wrote, from scratch, Apache CouchDB, including the storage engine & tail append btrees, incremental Map/Reduce indexer and query engine, master/master replication with conflict management, and the HTTP API, plus a zillion of small details necessary to make it all work.

In short, I have substantial real world experience in projects used by millions of people everyday. Maybe I know what I'm talking about.

So while most of what I wrote is my opinion and difficult to back up with hard data, it's born from being cut so many times with the newest and coolest stuff. My view of C has changed over the years, and I used to think the older guys who loved C were just behind the times. Now I see why many of them felt that way, they saw what is traded away when you stray from the simple and effective.

Think about the most widely used backend projects around and see how they are able to get both reliability and performance. Chances are, they are using plain C. That's not just a coincidence.

Follow me on Twitter for more of my coding opinions and updates on Couchbase progress.


January 8, 2013

The Unreasonable Effectiveness of C

For years I've tried my damnedest to get away from C. Too simple, too many details to manage, too old and crufty, too low level. I've had intense and torrid love affairs with Java, C++, and Erlang. I've built things I'm proud of with all of them, and yet each has broken my heart. They've made promises they couldn't keep, created cultures that focus on the wrong things, and made devastating tradeoffs that eventually make you suffer painfully. And I keep crawling back to C.

C is the total package. It is the only language that's highly productive, extremely fast, has great tooling everywhere, a large community, a highly professional culture, and is truly honest about its tradeoffs.

Other languages can get you to a working state faster, but in the long run, when performance and reliability are important, C will save you time and headaches. I'm painfully learning that lesson once again.

Simple and Expressive

C is a fantastic high level language. I'll repeat that. C is a fantastic high level language. It's not as high level as Java or C#, and certainly no where near as high level as Erlang, Python, or Javascript. But it's as high level as C++, and far far simpler. Sure C++ offers more abstraction, but it doesn't present a high level of abstraction away from C. With C++ you still have to know everything you knew in C, plus a bunch of other ridiculous shit.

"When someone says: 'I want a programming language in which I need only say what I wish done', give him a lollipop."

- Alan J. Perlis

That we have a hard time thinking of lower level languages we'd use instead of C isn't because C is low level. It's because C is so damn successful as an abstraction over the underlying machine and making that high level, it's made most low level languages irrelevant. C is that good at what it does.

The syntax and semantics of C is amazingly powerful and expressive. It makes it easy to reason about high level algorithms and low level hardware at the same time. Its semantics are so simple and the syntax so powerful it lowers the cognitive load substantially, letting the programmer focus on what's important.

It's blown everything else away to the point it's moved the bar and redefined what we think of as a low level language. That's damn impressive.

Simpler Code, Simpler Types

C is a weak, statically typed language and its type system is quite simple. Unlike C++ or Java, you don't have classes where you define all sorts of new runtime behaviors of types. You are pretty much limited to structs and unions and all callers must be very explicit about how they use the types, callers get very little for free.

"You wanted a banana but what you got was a gorilla holding the banana and the entire jungle."

- Joe Armstrong

What sounds like a weakness ends up being a virtue: the "surface area" of C APIs tend to be simple and small. Instead of massive frameworks, there is a strong tendency and culture to create small libraries that are lightweight abstractions over simple types.

Contrast this to OO languages where codebases tend to evolve massive interdependent interfaces of complex types, where the arguments and return types are more complex types and the complexity is fractal, each type is a class defined in terms of methods with arguments and return types or more complex return types.

It's not that OO type systems force fractal complexity to happen, but they encourage it, they make it easier to do the wrong thing. C doesn't make it impossible, but it makes it harder. C tends to breed simpler, shallower types with fewer dependencies that are easier to understand and debug.

Speed King

C is the fastest language out there, both in micro and in full stack benchmarks. And it isn't just the fastest in runtime, it's also consistently the most efficient for memory consumption and startup time. And when you need to make a tradeoff between space and time, C doesn't hide the details from you, it's easy to reason about both.

"Trying to outsmart a compiler defeats much of the purpose of using one."

- Kernighan & Plauger, The Elements of Programming Style

Every time there is a claim of "near C" performance from a higher level language like Java or Haskell, it becomes a sick joke when you see the details. They have to do awkward backflips of syntax, use special knowledge of "smart" compilers and VM internals to get that performance, to the point that the simple expressive nature of the language is lost to strange optimizations that are version specific, and usually only stand up in micro-benchmarks.

When you write something to be fast in C, you know why it's fast, and it doesn't degrade significantly with different compilers or environments the way different VMs will, the way GC settings can radically affect performance and pauses, or the way interaction of one piece of code in an application will totally change the garbage collection profile for the rest.

The route to optimization in C is direct and simple, and when it's not, there are a host of profiler tools to help you understand why without having to understand the guts of a VM or the "sufficiently smart compiler". When using profilers for CPU, memory and IO, C is best at not obscuring what is really happening. The benchmarks, both micro and full stack, consistently prove C is still the king.

Faster Build-Run-Debug Cycles

Critically important to developer efficiency and productivity is the "build, run, debug" cycle. The faster the cycle is, the more interactive development is, and the more you stay in the state of flow and on task. C has the fastest development interactivity of any mainstream statically typed language.

"Optimism is an occupational hazard of programming; feedback is the treatment."

- Kent Beck

Because the build, run, debug cycle is not a core feature of a language, it's more about the tooling around it, this cycle is something that tends to be overlooked. It's hard to overstate the importance of the cycle for productivity. Sadly it's something that gets left out of most programming language discussions, where the focus tends to be only on lines of code and source writability/readability. The reality is the tooling and interactivity cycle of C is the fastest of any comparable language.

Ubiquitous Debuggers and Useful Crash Dumps

For pretty much any system you'd ever want to port to, there are readily available C debuggers and crash dump tools. These are invaluable to quickly finding the source of problems. And yes, there will be problems.

"Error, no keyboard -- press F1 to continue."

With any other language there might not be a usable debugger available and less likely a useful crash dump tool, and there is a really good chance for any heavy lifting you are interfacing with C code anyway. Now you have to debug the interface between the other language and the C code, and you often lose a ton of context, making it a cumbersome, error prone process, and often completely useless in practice.

With pure C code, you can see call stacks, variables, arguments, thread locals, globals, basically everything in memory. This is ridiculously helpful especially when you have something that went wrong days into a long running server process and isn't otherwise reproducible. If you lose this context in a higher level language, prepare for much pain.

Callable from Anywhere

C has a standardized application binary interface (ABI) that is supported by every OS, language and platform in existence. And it requires no runtime or other inherent overhead. This means the code you write in C isn't just valuable to callers from C code, but to every conceivable library, language and environment in existence.

"Portability is a result of few concepts and complete definition"

- J. Palme

You can use C code in standalone executables, scripting languages, kernel code, embedded code, as a DLL, even callable from SQL. It's the Lingua Franca of systems programming and pluggable libraries. If you want to write something once and have it usable from the most environments and use cases possible, C is the only sane choice.

Yes. It has Flaws

There are many "flaws" in C. It has no bounds checking, it's easy to corrupt anything in memory, there are dangling pointers and memory/resource leaks, bolted-on support for concurrency, no modules, no namespaces. Error handling can be painfully cumbersome and verbose. It's easy to make a whole class of errors where the call stack is smashed and hostile inputs take over your process. Closures? HA!

"When all else fails, read the instructions."

- L. Lasellio

Its flaws are very very well known, and this is a virtue. All languages and implementations have gotchas and hangups. C is just far more upfront about it. And there are a ton of static and runtime tools to help you deal with the most common and dangerous mistakes. That some of the most heavily used and reliable software in the world is built on C is proof that the flaws are overblown, and easy to detect and fix.

At Couchbase we recently spent easily 2+ man/months dealing with a crash in the Erlang VM. We wasted a ton of time tracking down something that was in the core Erlang implementation, never sure what was happening or why, thinking perhaps the flaw was something in our own plug-in C code, hoping it was something we could find and fix. It wasn't, it was a race condition bug in core Erlang. We only found the problem via code inspection of Erlang. This is a fundamental problem in any language that abstracts away too much of the computer.

Initially for performance reasons, we started increasingly rewriting more of the Couchbase code in C, and choosing it as the first option for more new features. But amazingly it's proven much more predictable when we'll hit issues and how to debug and fix them. In the long run, it's more productive.

I always have it in the back of my head that I want to make a slightly better C. Just to clean up some of the rough edges and fix some of the more egregious problems. But getting everything to fit, top to bottom, syntax, semantics, tooling, etc., might not be possible or even worth the effort. As it stands today, C is unreasonably effective, and I don't see that changing any time soon.

Follow me on Twitter for more of my coding opinions and updates on Couchbase progress.