Antlr rocks

I'm in the middle of creating a formula language for CouchDb, which I'm calling Fabric (backronym suggestions welcome). It's not going to be an exact clone of Notes Formula Language, but it's going to be close.

Since I've already written a production formula engine, it should be a cakewalk, right? Well, when I rewrote the Notes formula engine, I didn't have to worry about a compiler, because the engine ran off of compiled byte code. Because the byte code is meant to be exclusively machine readable, it's much easier to deal with programmatically.

So for my new engine I have to write something I've never written before: a compiler. There are two ways to go about it: write it by hand, or use a parser-generator. Since this language has a pretty simple syntax, writing a compiler by hand would be pretty straightforward and the most optimizable. But I figured a good parser-generator should create fast code, and the whole process of learning to use a parser-generator would be more useful in the long run.

After a little research and poking around, I decided on Antlr (its website was the pertiest). After a few days of reeling from trying to wrap my head around it, I thought I had made a bad choice and that this tool was too complex to be useful. So I'd get frustrated and angry (words make Hulk mad! SMASH!!) and desperately consider abandoning it for another tool. But the other tools didn't seem any better (just different and usually less documented overall), so I'd come whimpering back to Antlr.

After a while of extremely slow going and lots of frustration, I've started to figure out the Antlr world view, and now I'm finally making significant progress (I think it takes a while for my brain to rewire itself sufficiently). And I now see Antlr is far more powerful than I first thought: it provides all sorts of easy ways to analyze and transform abstract syntax trees, and the lexing/parsing code it generates is impressively optimized.

But while the logical design of the Antlr grammar machine is impressive, the syntax it uses isn't what I'd call elegant. Here's a little snippet of the rule syntax:

decl : ( TYPE ID )+
       { ## = #(#[DECL,"decl"], ##); }
     ;

And this isn't even close to how bad a single rule can look. The syntax is ugly at first, but it's an effective syntax and one that's beginning to grow on me.
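For anyone squinting at that action: in Antlr 2's tree-construction shorthand, `##` is the tree built so far for the current rule, `#[DECL,"decl"]` makes a fresh node of type DECL with the text "decl", and `#(root, kids)` hangs the kids under the root. So the action wraps everything the rule matched under a new "decl" root. Here's a rough Python sketch of the same idea. The `Node` class, `make_decl`, and `to_lisp` are hypothetical stand-ins of mine, not anything Antlr generates.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Node:
    """Hypothetical stand-in for an AST node: a token type, its text, and children."""
    kind: str
    text: str
    children: List["Node"] = field(default_factory=list)


def make_decl(matched: List[Node]) -> Node:
    """Mimic `## = #(#[DECL,"decl"], ##)`: put the matched nodes under a new DECL root."""
    return Node("DECL", "decl", list(matched))


def to_lisp(node: Node) -> str:
    """Render a tree as (root child child ...), leaves as bare text."""
    if not node.children:
        return node.text
    parts = [node.text] + [to_lisp(child) for child in node.children]
    return "(" + " ".join(parts) + ")"


# What ( TYPE ID )+ might have matched for input like "int x float y".
matched = [
    Node("TYPE", "int"), Node("ID", "x"),
    Node("TYPE", "float"), Node("ID", "y"),
]
tree = make_decl(matched)
print(to_lisp(tree))  # (decl int x float y)
```

Without the action, the rule would hand back a flat list of sibling nodes; with it, later tree-walking passes get a single "decl" subtree to match on.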

Now that I understand the tool better, I'm starting to see new possibilities for its use. Obviously it's great for creating compilers and interpreters, but it's also useful for incorporating existing languages into your applications. For example, if I want to incorporate user-authored JavaScript into an application, but I need to limit what those scripts can do, I can use Antlr and its free grammars to analyze the code and validate exactly which language features are being used. Not that I need to do that right now, but I see the possibility in the future.
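To make the validation idea a bit more concrete, here's a minimal Python sketch of the checking half, assuming a parser has already turned the script into a syntax tree. The `Node` class, the node kind names, and the `ALLOWED` set are all made up for illustration; a real Antlr grammar would have its own token types, and the tree here is built by hand rather than parsed.

```python
from dataclasses import dataclass, field
from typing import Iterator, List, Set


@dataclass
class Node:
    """Hypothetical AST node; `kind` names the language feature it represents."""
    kind: str
    children: List["Node"] = field(default_factory=list)


def walk(node: Node) -> Iterator[Node]:
    """Yield the node and all of its descendants, depth first."""
    yield node
    for child in node.children:
        yield from walk(child)


# Assumed whitelist of features a user script may use.
ALLOWED: Set[str] = {"Program", "VarDecl", "Assign", "BinaryOp", "Identifier", "Number"}


def disallowed_features(root: Node, allowed: Set[str] = ALLOWED) -> List[str]:
    """Return the sorted node kinds found in the tree that are not whitelisted."""
    return sorted({n.kind for n in walk(root)} - set(allowed))


# Hand-built tree standing in for what a parser would produce.
script = Node("Program", [
    Node("VarDecl", [Node("Identifier"), Node("Number")]),
    Node("WithStatement", [Node("Identifier")]),
    Node("Call", [Node("Identifier")]),
])
print(disallowed_features(script))  # ['Call', 'WithStatement']
```

A whitelist seems like the safer default here, since anything the check doesn't recognize gets rejected rather than waved through.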

Anyway, I'm pretty happy with Antlr at this point and soon I'll be writing about Fabric and its design philosophy.

Posted November 12, 2005 3:32 PM