A Comfortable Couch

Monday, August 30, 2004

What I'm building: Couch

I've code-named my project Couch, which stands for Cluster Of Unreliable Commodity Hardware. The name doesn't really say what it does, but it's such a cool acronym!

What I intend it to be is a clustered file server that runs on cheap hardware, yet is highly scalable and extremely reliable even in the face of multiple hardware failures. A typical installation will consist of many storage nodes (tens to hundreds), one or more map servers that know what the storage nodes contain, and one or more clients that read and write data by talking to the map servers and storage nodes. A machine could be a storage node, map server and client at the same time.
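As a rough illustration of how the three roles could fit together, here is a minimal in-memory sketch in Python. Everything in it is an assumption made for illustration: the class names (`StorageNode`, `MapServer`, `Client`), the whole-file copies, and the naive "first N nodes" placement are hypothetical and not part of any actual Couch design. It only shows the idea that a client asks a map server where a file lives and falls back to another copy when a node is down.

```python
# Hypothetical sketch of the three roles; names and placement policy are
# illustrative assumptions, not the actual Couch design.

class StorageNode:
    """Holds file data; may fail at any time."""
    def __init__(self, name):
        self.name = name
        self.files = {}       # path -> bytes
        self.alive = True

    def put(self, path, data):
        self.files[path] = data

    def get(self, path):
        if not self.alive:
            raise ConnectionError(f"{self.name} is down")
        return self.files[path]


class MapServer:
    """Knows which storage nodes hold a copy of each file."""
    def __init__(self):
        self.locations = {}   # path -> list of StorageNode

    def record(self, path, node):
        holders = self.locations.setdefault(path, [])
        if node not in holders:
            holders.append(node)

    def lookup(self, path):
        return list(self.locations.get(path, []))


class Client:
    """Writes several copies; reads from the first copy that answers."""
    def __init__(self, map_server, nodes, copies=2):
        self.map_server = map_server
        self.nodes = nodes
        self.copies = copies

    def write(self, path, data):
        # Naive placement: always the first `copies` nodes.
        for node in self.nodes[:self.copies]:
            node.put(path, data)
            self.map_server.record(path, node)

    def read(self, path):
        holders = self.map_server.lookup(path)
        if not holders:
            raise FileNotFoundError(path)
        for node in holders:
            try:
                return node.get(path)
            except ConnectionError:
                continue      # try the next copy
        raise OSError(f"no reachable copy of {path}")


nodes = [StorageNode(f"node{i}") for i in range(3)]
client = Client(MapServer(), nodes, copies=2)
client.write("/movies/demo.avi", b"frame data")
nodes[0].alive = False        # simulate a hardware failure
data_after_failure = client.read("/movies/demo.avi")
```

The application calling `read` never sees the failure of `node0`, which is the behavior the reliability goal asks for. A real system would also need to re-create lost copies and keep the map server itself redundant.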

The design goals are as follows, in order of importance:
  • Reliable. The system will be able to recover from multiple hardware failures seamlessly, without detection by the applications on the clients using the cluster.
  • No special programming necessary for applications. Applications need not know about the nature of the file system; it looks like a standard local drive.
  • Scalable storage space. Add more storage nodes, and you get more storage. This should be able to scale to hundreds of nodes.
  • High performance reads. Pulling data from the storage system should be fast, limited mostly by network IO, particularly for streaming out a file, less so for random seeks.
  • Cross platform storage. The storage node software should be able to run on most platforms with minimal porting.
Non-goals are:
  • Security. The initial revision will assume that all machines (clients, maps and nodes) are trusted. All machines are expected to act cooperatively.
  • High performance writes. I think the system will probably be slow for random seek writes, and acceptable for streamed in writes. The system should not be used to support database files, for example, but would be ideal to store movie or picture files.
  • Efficient use of storage space. The system will need to waste storage space for redundancy reasons. So adding a storage node with 500 GB of disk space will perhaps only get you 100 to 200 GB of actual usable storage.
  • Cross platform clients. The initial revision will probably support only Linux or only Windows clients; I'm not sure which yet.
Of course, the goals and non-goals lists will be adjusted as I begin implementing the system and get feedback from others.
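To put rough numbers on the storage-efficiency non-goal: if redundancy were done by keeping whole copies of each file (an assumption; no scheme has been chosen here), usable space would be raw space divided by the average number of copies. The 100 to 200 GB figure for a 500 GB node would then correspond to somewhere between 5 and 2.5 copies. The helper `usable_gb` below is hypothetical and only restates that arithmetic.

```python
# Hypothetical helper: usable capacity under simple whole-copy replication.
# The replication scheme is an assumption, not a decided design.

def usable_gb(raw_gb, copies):
    """Return usable space when every byte is stored `copies` times."""
    if copies < 1:
        raise ValueError("need at least one copy")
    return raw_gb / copies


low = usable_gb(500, 5)       # 100.0 GB usable with 5 copies
high = usable_gb(500, 2.5)    # 200.0 GB usable with 2.5 copies on average
```

A fractional copy count just means an average: some files might be kept on two nodes and others on three.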

I hope to have perty diagrams of an example Couch topology soon.


Anonymous said...

(BobB): This sounds like a very cool project, you'll have fun for sure.

Gotta love them "Non-goals", eh? That is SUCH a Lotus-ism (ok, Iris-ism for you), made me LOL

10:20 AM  
Ken Yee said...

This actually existed in commercial form some 10ish years ago. A company called Kendall Square Research (supercomputers) imploded. One of the spinoffs implemented a software version of their technology that let you store files on your system and the file was backed up on a number of other systems in chunks. Of course, I can't remember the name of it, but it really did exist :-P

6:06 PM  
Ken Yee said...

Oh wait...remembered finally ;-)


6:07 PM  
Anonymous said...

Google has one of them distributed file system things too


9:50 PM  
