May 31, 2006

Document Oriented Development

Two nights ago, I was editing the "So what? Who cares? Why would I ever want to use CouchDb?"* section on the home page of the CouchDb Wiki. As I was feebly trying to explain what CouchDb is good for, the words "document oriented application" popped into my head. I immediately liked it; it felt like I finally had a term to concisely describe the sorts of applications CouchDb is made for.

Today, I decided to Google the term "document oriented". Turns out it's not new: here's an article I found, "Towards truly document oriented Web services", on the O'Reilly site. The article gives an example of a REST API that is similar to the one I will be exposing with CouchDb. Cool.

"Document Oriented Development": I think this may be a poorly served yet hugely important area of application development, particularly in storage and management. For document storage, you pretty much have two options in mainstream development: direct file system access and relational databases.

Traditional file based systems are simple enough; this is how most PC applications have dealt with documents for a long time. MS Office is a prime example: all documents are files. But a lack of reporting capabilities and concurrency control limits what can be done, particularly in web applications.

And relational databases? There is nothing "relational" about documents, yet the vast majority of document management systems are built on top of an RDBMS. Unless normalized to the 4th normal form, you'll need a fixed document schema, limiting flexibility. But when normalized to 4th normal form, performance suffers. Badly. Not to mention that SQL queries become unwieldy.
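
To make the "unwieldy SQL" point a bit more concrete, here is a minimal Python sketch using the standard `sqlite3` module. It assumes one common way of getting schema flexibility out of an RDBMS, a single table of (document, field name, value) rows; that layout, the table name `doc_fields` and the sample documents are all made up for illustration and are not taken from any real document management system.

```python
import sqlite3

# Hypothetical sample documents with different sets of fields.
docs = {
    1: {"type": "memo", "author": "alice", "subject": "lunch"},
    2: {"type": "contract", "author": "bob", "pages": "10000"},
}

conn = sqlite3.connect(":memory:")
# One row per (document, field): flexible, but every field is its own row.
conn.execute("CREATE TABLE doc_fields (doc_id INTEGER, name TEXT, value TEXT)")
for doc_id, fields in docs.items():
    conn.executemany(
        "INSERT INTO doc_fields VALUES (?, ?, ?)",
        [(doc_id, name, value) for name, value in fields.items()],
    )

# "Memos written by alice" needs one self-join per field being tested.
rows = conn.execute(
    """
    SELECT t.doc_id
    FROM doc_fields AS t
    JOIN doc_fields AS a ON a.doc_id = t.doc_id
    WHERE t.name = 'type' AND t.value = 'memo'
      AND a.name = 'author' AND a.value = 'alice'
    """
).fetchall()
memo_ids = [row[0] for row in rows]

# The same question asked of the documents themselves.
memo_ids_plain = [
    doc_id
    for doc_id, doc in docs.items()
    if doc.get("type") == "memo" and doc.get("author") == "alice"
]
```

Each additional field in the filter adds another join to the SQL version, which is roughly the kind of query growth the paragraph above complains about.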

XML databases are meant to solve these sorts of problems. There is even a standardized query language for them: XQuery. XML databases are great if you want to think of everything in terms of XML. But from what I've seen, XML databases will simplify development only if your data is already XML. Even then, I'm not so sure.

It seems ridiculous that there aren't more mainstream tools to deal with this style of development. Lotus Notes got so much of this right over 15 years ago, and it's still singularly unique in its capabilities.

Define It?

I'd like to come up with a good definition of document oriented development, but the idea is still pretty nascent in my brain. This is what I wrote on the wiki to describe the applications:

A typical document oriented application in the real world, if it weren't computerized, would consist mostly of actual paper documents. These documents would need to get sent around, edited, searched, photocopied, approved, pinned to the wall, filed away, etc. They could be simple yellow sticky notes or 10000 page legal documents. Not all document-oriented applications have real world counterparts.

The Wikipedia has a good definition of document:

A document contains information. It often refers to an actual products of writing and is usually intended to communicate or store collections of data. Documents are often the focus and concern of Administration.

Documents could be seen to include any discrete representation of meaning, but usually it refers to something like a physical book, printed page(s) or a virtual document in electronic/digital format.

Hmmm... getting closer.

docorienteddev.jpg
"Document Oriented Development" - By Ben Batchelder

Anyone want to take a crack at a definition of document oriented development? Or am I all wrong and there's nothing particularly special about being "document oriented"?

* (that section heading, along with a bunch of others, was added by Jeff Atwood of Coding Horror. Thanks Jeff).


May 30, 2006

Own your data

Ned Batchelder - Own your data: ad-hoc representations


May 27, 2006

Regular expressionescence

One of the key features of Flan is its type system. You can have type-checking applied statically or dynamically, and you can make types more-or-less as restrictive or as loose as you want. In the end, it turns out that you can use this one system to do Prolog or Haskell-style data deconstruction, SQL-style filtering and pattern-matching, Eiffel-style compile-time contracts, and a really readable version of regular expressions.

Making Flan: Regular expressionescence

Erlang also has similar pattern matching facilities and it's one of the things I really love about it. From what I can tell, this looks to be far more powerful stuff. Not sure how well it will work in practice, but color me intrigued.


May 25, 2006

CouchDb Wiki

I've created a wiki for the CouchDb project at couchdb.infogami.com. In its current form it's woefully inadequate but I'm putting it out there anyway, partially just to shame myself to keep adding to it.

I'm not sure how this whole wiki thing works in practice, so I'm just going to jump in headfirst and encourage anyone interested to make edits and add questions, topics or pages you'd like to see populated. Feel free to ask forgiveness rather than permission, all edits are versioned so if I don't like it I can always revert it.

Update:
Several people have asked why I chose Infogami to host the wiki. Two reasons:


  1. 0 effort creation

  2. Easy to use

Basically it just worked and I didn't have to learn anything hard or spend time feeling confused.


May 21, 2006

Towards safer, simpler computing

It's ironic Java's sandbox security model is designed to protect clients from unsafe/malicious code while its success is primarily on servers, where Java's security model is woefully inadequate. It can't protect against things like unbounded CPU, Memory or IO usage, things that cause serious server problems. Not to mention the security model complicates everything in the VM (class loaders, byte code verifiers, and doobledy gook plier frakers) even if you don't actually need any security.

I just had that conversation with a friend before he left for JavaOne. He's frustrated by a number of things in Java, which all go back to the needs of the security model - his point being that it has less relevance for a server side application. I'm running this application on a Smalltalk server, where arbitrary code could be loaded in at any time. Here's the catch though - only two people have permissions on the system. So in order to mount such a code loading attack, one of the two of us would have to do it. Hmm - seems unlikely.

Continuations and web apps

Instead of complex VM security models applied to general purpose languages, we'd be better off using domain specific languages that are by design limited to provably safe activities. I see a future more about simple and small special purpose languages that can be connected easily, and less about giant bloated do-it-all beasts like C# and Java.
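
As a toy illustration of a language that is limited by design, here is a minimal Python sketch of an integer arithmetic evaluator built on the standard `ast` module. The function names, the set of allowed operators and the `max_nodes` limit are all assumptions made up for this example. It is not a security boundary you should rely on, just a picture of the idea: anything outside the tiny whitelist is not part of the language, so there is nothing to sandbox.

```python
import ast
import operator

# The whole "language": integer literals combined with +, - and *.
_ALLOWED_OPS = {
    ast.Add: operator.add,
    ast.Sub: operator.sub,
    ast.Mult: operator.mul,
}


def evaluate(expr, max_nodes=100):
    """Evaluate an expression in the tiny language, rejecting everything else."""
    tree = ast.parse(expr, mode="eval")
    # Bound the size of the program so evaluation time is bounded too.
    if sum(1 for _ in ast.walk(tree)) > max_nodes:
        raise ValueError("expression too large")
    return _eval(tree.body)


def _eval(node):
    # Plain integer literals only (bools and strings are rejected).
    if isinstance(node, ast.Constant) and type(node.value) is int:
        return node.value
    # Binary operations, but only the whitelisted operators.
    if isinstance(node, ast.BinOp) and type(node.op) in _ALLOWED_OPS:
        return _ALLOWED_OPS[type(node.op)](_eval(node.left), _eval(node.right))
    raise ValueError(f"not allowed in this language: {type(node).__name__}")


print(evaluate("1 + 2 * 3"))  # 7
```

Function calls, attribute access, loops and imports all raise `ValueError` here because the evaluator has no rule for them, rather than because a separate security layer caught them.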


Writing process

I think I need a better writing process. I'm pretty sure my personal process resembles a bubble sort, works well enough for short writing but slow as hell as things get longer. But still it works as long as I stick it out through to completion. Which I often don't, I abandon stuff all the time, or frequently I just publish whatever I got and not worry so much about it. Lucky you.

I was going to write more about this but I decided not to worry so much about it. Meh.


May 20, 2006

Fat, 40 and fired

Guardian Unlimited - Fat, 40 and fired


May 17, 2006

The Fly

The fly thought he was Maverick in Top Gun, buzzing my head like a mini control tower, mocking me. Turns out he was more like Goose, splatted dead against some glass.

Or at least he would be if he ever landed somewhere for more than a nanosecond. Goddamn fly.


thefly.JPG
Tee-hee! I totally made him knock over that lamp! -The Fly


May 16, 2006

Welcome Tabblo!

Tabblo is a new type of photo sharing site for creating stories and adding style and formatting to your photo sets, versus the tired Next/Previous slideshows that currently dominate. Very cool.

Check out Ned's tabblo of the Wild Birthday Cakes they've made over the years. The Myst Island cake is my favorite.


May 15, 2006

Thanks Duffbert

I just got the book The Art of Computer Programming, Volume 4, Fascicle 4: Generating All Trees--History of Combinatorial Generation sent to me by Thomas Duff because of my I Like Trees post. Thanks Duffbert!


Linspire Development Standardizes on Haskell

The OS team at Linspire, Inc. would like to announce that we are standardizing on Haskell as our preferred language for core OS development.

Linspire/Freespire Core OS Team and Haskell

I haven't used Haskell for development, so I can't say how wise this decision is technically. But I'm betting they attract a much better candidate pool, if for no other reason than that people who learn Haskell typically do it because they want to learn something new and discover new ways of thinking. Contrast that with most of the Java/C#/VB rabble, who only learned the language to get or keep a job, and whose resumes will bury you if you post openings for those languages. (Yeah, I'm making big sweeping generalizations, but you know I'm mostly right.)


May 14, 2006

Erlang vs. Java

erlangjava-2.JPG

Coming soon, a series of articles comparing the features and philosophies of the functional programming language Erlang with Java. Also featuring the art and illustration of Ben Batchelder.


May 10, 2006

The Ten Commandments of Egoless Programming

Coding Horror: The Ten Commandments of Egoless Programming


May 8, 2006

Signs You're a Crappy Programmer (and don't know it)

You know those crappy programmers who don’t know they are crappy? You know, they think they're pretty good, they spout off the same catch phrase rhetoric they've heard some guru say and they know lots of rules about the "correct" way to do things? Yet their own work seems seriously lacking given all the expertise they supposedly have? You don’t know any programmers like that? Come on, you know, the guys who are big on dogma but short on understanding. No, doesn’t sound familiar?

Then here are some signs you're a crappy programmer and don't know it:


  • Java is all you'll ever need.
    You don't see the need for other languages; why can't everything be in Java? It doesn't bother you at all to see Python or Ruby code that accomplishes in 10 lines what takes several pages in Java. Besides, you're convinced new language features in the next release will fix all that anyway. (BTW, this can be almost any language, but right now the Java community seems most afflicted with this thinking.)
  • "Enterprisey" isn't a punchline to you.
    This is serious stuff dammit. "Enterprise" is not just a word, it's a philosophy, a way of life, a path to enlightenment. Anything that can be written, deployed or upgraded with minimal fuss is dismissed as a toy that won't "scale" for future needs. Meanwhile most of the real work in your office is getting done by people sending around Excel spreadsheets as they wait for your grand enterprise visions to be built.
  • You are adamantly opposed to functions/methods over 20 lines of code.
    (Or 30 or 10 or whatever number of lines.) Sorry, sometimes a really long function is just what's needed for the problem at hand. Usually shorter functions are easier to understand, but sometimes things are most simply expressed in one long function. Code should not be made more complex to meet some arbitrary standard.
  • "OMG! PATTERNS!"
    Developers who actively seek to apply patterns to every coding problem are adding unnecessary complexity. Patterns aren't something you should look to add to your code; you should feel bad every time you are forced to code up another design pattern, because it means you are doing busy work that makes things more complex and is of dubious benefit. But hey, your code has design patterns, and no one can take that from you.
  • CPU cycles are a precious commodity and your programming style and language reflect that belief.
    There are plenty of problem domains where you have to worry a lot about CPU cycles (modeling/simulation, signal processing, OS kernels, etc.), but you don't work in them. Like nearly every software developer, your biggest performance problems are all database and I/O related. The only effect of optimizing your code for CPU is to shave 2 milliseconds off the time to get to the next multi-second database query. Meanwhile your development has slowed to a crawl, you can't keep up with the rapidly evolving requirements and there are serious quality issues. But at least you’ll be saving lots of CPU cycles... eventually.
  • You think no function/method should have multiple return points.
    I've heard this from time to time, and usually the reason given is that the code is easier to analyze. According to whom? I find simple code easy to analyze, and it often simplifies the code to have multiple returns.
  • Your users are stupid. Really stupid.
    You can't believe how stupid they are; they constantly forget how to do the simplest things and often make dumb mistakes with your applications. You never consider that maybe it's your application that’s stupid because you're incapable of writing decent software.
  • You take great pride in the high volume of code you write.
    Being productive is good; unfortunately, producing lots of lines of code isn't quite the same as being productive. Users never remark "Wow, this software may be buggy and hard to use, but at least there is a lot of code underneath." Far from being productive, spewing out tons of crap code slows down other devs and creates a huge maintenance burden for the future.
  • Copy and paste is great, it helps you write decoupled code!
    You defend your use of copy and paste coding with odd arguments about decoupling and removing dependencies, while ignoring the maintenance drag and bugs code duplication causes. This is called "rationalizing your actions".
  • You think error handling means catching every exception, logging it and continuing on.
    That’s not error handling, that’s error ignoring, and it is semantically equivalent to “On Error Resume Next” in VB. Just because it got logged away somewhere doesn’t mean you’ve handled anything. Error handling is hard. If you don’t know exactly what to do in the face of a particular error, then let the exception bubble up to a higher level exception handler.
  • You model all your code in UML before you write it.
    Enthusiastic UML modeling is typically done by those who aren’t strong coders, but consider themselves software architects anyway. Modeling tools appeal most to those who think coding can be done in a conference room by manipulating little charts. The charts aren’t the design, and will never be the design; that’s what the code is for.
  • Your code wipes out important data.
    You wrote some code that’s supposed to overwrite application files with new files, but it goes haywire and deletes a bunch of the user's important data files.

That last one, I did that. Just last year. Honest mistake and it never showed up in QE’s testing. Sometimes I’m crappy. Sometimes we all are. I have been guilty of most of the items on the list and still struggle with a few (especially premature optimization). So try not to take anything on the list too personally, but feel free to flame me anyway if it makes you feel better.
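
Two of the items above, the ones about error handling and multiple return points, are easier to see in code than in prose. Here is a minimal Python sketch; the function names, the port-parsing task and the default value of 8080 are all hypothetical, chosen only to keep the example small.

```python
import logging

logger = logging.getLogger("example")


def parse_port_swallowing(text):
    """The 'log it and continue on' style: the error is recorded, not handled."""
    try:
        return int(text)
    except Exception:
        logger.exception("could not parse port")
        return None  # every caller now has to guess what went wrong


def parse_port(text):
    """Deal only with the case we understand; let everything else bubble up."""
    if text is None:
        return 8080  # early return for the one case we know how to handle
    port = int(text)  # a ValueError here propagates to a higher-level handler
    if not 0 < port < 65536:
        raise ValueError(f"port out of range: {port}")
    return port
```

The second version has two return points and no `try` block at all. Bad input surfaces as an exception at whatever level actually knows what to do about it, instead of turning into a `None` that fails somewhere else later.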


May 7, 2006

Dinosaur Shocker

Schweitzer showed the slide to Horner. “When she first found the red-blood-cell-looking structures, I said, Yep, that’s what they look like,” her mentor recalls. He thought it was possible they were red blood cells, but he gave her some advice: “Now see if you can find some evidence to show that that’s not what they are.”

Smithsonian.com - Dinosaur Shocker


May 1, 2006

Patent Demons

Visto Corp. of Redwood Shores, Calif., said Monday that it had won an infringement case against Seven Networks, also of California, and is now targeting RIM for infringing four patents, including three involved in the Seven action.

CBC News: RIM rejects patent-infringement allegations

I worked for Seven Networks on that lawsuit and the outcome makes me sad, angry and nauseous at the same time. Not that I was a deciding factor (highly unlikely) and I won't divulge any details about the suit (First rule of law club: don't talk about law club).

But I'm pretty sure I can say this: I was ritually abused, er, deposed for 7 1/2 hours in the Devil's Lair by Satan himself. And two of his minions (some people call them "attorneys"). And a stenographer. And a videographer. But they weren't full time Satanists, they were just day laborers in the Devil's Army. The Devil's Lair is situated, surprisingly, in a lovely office park in Palo Alto. Being Satan pays well.

Yeah, I'm making with the ha ha, but really I believe these guys are tools of evil. Or just tools. Whatever.


Disappearing Technology

To be truly successful, a complex technology needs to "disappear".
...
The first cars, in the early 1900s, were “mostly a burden and a challenge”, says Mr Corn. Driving one required skill in lubricating various moving parts, sending oil manually to the transmission, adjusting the spark plug, setting the choke, opening the throttle, wielding the crank and knowing what to do when the car broke down, which it invariably did. People at the time hired chauffeurs, says Mr Corn, mostly because they needed to have a mechanic at hand to fix the car, just as firms today need IT staff and households need teenagers to sort out their computers.

The Economist: Now you see it, now you don't

I love the car analogy. I'm a car guy. I used to have a '66 Ford Mustang, and I did almost all the work on it myself (including some brake work that nearly killed me). Loved that car. I loved the burble of the 289 V8, and I loved the simplicity of knowing that I could fix almost anything on it myself. And I often did. Simple. To me.

But I also love my '98 Honda Accord. Under the hood it's infinitely more complex than my old Mustang. There are all sorts of sensors, wires and boxes hidden throughout, their function and purpose unknown to me. I can still work on it, sort of. I can hook up computers that tell me what component failed and then bolt on a new one. But deep down I don't know what's going on really, it's far more complicated.

And yet, the Honda is far simpler to me as an owner, I almost never have to think about maintenance or repairs. Months go by and my Honda just continues to run smoothly and get us where we need to go, without fuss or drama. Because Honda's engineers spent a zillion hours thinking about geeky, complicated things, I can drive years before needing a tune-up. As much as I loved my Mustang, ultimately I loved it because I'm a car geek and it let me be a car geek, not because it's a better car.

This is a point we "technologists" need pounded into our heads. As a developer, it's YOUR job to make complexity and aggravation go away. You are the geek, the engineer, the guy whose job it is to think about such things. It doesn't matter if your user is a grandmother viewing an online photo album or a seasoned engineer using your nifty new web services API; your job is to take complexity and make it into something others don't have to think about. We don't make users smarter by teaching what we know, but by making it so they need to know less than we do, freeing them to focus on other things.
