September 27, 2011

Become a Distributed Database Expert (or just look like one)

At Couchbase we are looking for experienced hackers to help us build the fastest, most reliable distributed database on the planet. You don't need to be an expert already, but you should be ready to learn the ins and outs of distributed database systems, including:

  • Distributed Systems
  • Systems Resource Management: I/O (disk, network), CPU, memory usage
  • Maximizing Throughput and Minimizing Latency
  • Functional Programming
  • Systems Reliability
  • Network Programming
  • Profiling, Benchmarking and Optimization
  • Cluster and Network Topology
  • Replication and Logical Sync
  • Distributed Data Modeling
  • Embedded and Mobile Software

More info here: Or you can send your resume and qualifications to me here:


September 24, 2011

Re: Data sync

>On Sep 23, 2011, at 1:40 AM, XXXX XXXXX wrote:
>Hi Damien,
>Greetings from XXXXX XXXXXX;
>I'm running a small company with a history in the mobile enterprise space.
>We are just about to get some seed funding to build SQLite sync
>technology for mobile devices.
>I came across Couchbase; extremely cool.
>We are planning to offer some of the same features:
>Offline access
>Smart sync
>Bandwidth optimisation
>It would be good to get any advice or pointers you might have in
>terms of building sync technology for mobile
>All the best,

Hello! I would say that mobile sync is a deceptively hard problem: getting all the nice properties you want is much harder than it looks. I suggest you look at how Couchbase replication works and try to duplicate it, and ideally, try to interoperate with it.

Some of the properties you probably want:

Incremental replication - The ability to stop and restart replication and not lose all your progress. Vital in a mobile environment where connections are slow and flaky.
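One common way to get incremental replication is checkpointing: the source exposes a changes feed ordered by sequence number, and the replicator records the last sequence it fully applied, so a restart picks up from there instead of from zero. Here is a toy in-memory sketch under that assumption; `Database`, `changes_since` and `replicate` are hypothetical names, not Couchbase APIs.

```python
from dataclasses import dataclass, field


@dataclass
class Database:
    """Toy in-memory database with a sequence-numbered changes feed (hypothetical)."""
    name: str
    docs: dict = field(default_factory=dict)
    changes: list = field(default_factory=list)  # (seq, doc_id) in commit order
    seq: int = 0

    def put(self, doc_id, body):
        self.seq += 1
        self.docs[doc_id] = body
        self.changes.append((self.seq, doc_id))

    def changes_since(self, since, limit):
        """Return up to `limit` changes with a sequence number greater than `since`."""
        return [c for c in self.changes if c[0] > since][:limit]


def replicate(source, target, checkpoints, batch_size=2, max_batches=None):
    """Copy changes from source to target in batches.

    `checkpoints` outlives a single run (a real system would store it durably),
    so an interrupted run resumes after the last completed batch.
    `max_batches` exists only to simulate a dropped connection.
    """
    key = (source.name, target.name)
    since = checkpoints.get(key, 0)
    batches = 0
    while max_batches is None or batches < max_batches:
        batch = source.changes_since(since, batch_size)
        if not batch:
            break
        for _, doc_id in batch:
            target.put(doc_id, source.docs[doc_id])
        since = batch[-1][0]
        checkpoints[key] = since  # record progress only after the batch is applied
        batches += 1
    return since


src = Database("phone")
for i in range(5):
    src.put(f"doc{i}", {"n": i})
dst = Database("server")
cps = {}

replicate(src, dst, cps, max_batches=1)  # connection drops after one batch
after_drop = sorted(dst.docs)
print(after_drop)         # ['doc0', 'doc1']
replicate(src, dst, cps)  # resumes from seq 2 instead of seq 0
print(sorted(dst.docs))   # ['doc0', 'doc1', 'doc2', 'doc3', 'doc4']
```

Saving the checkpoint only after a batch is applied means a crash mid-batch just re-copies that batch, which is harmless here because re-copying a document overwrites it with the same body.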

Concurrency - You want to be able to use both the local and the remote databases while they are being synced/replicated, with no global locking, so the app is usable at all times while syncing happens in the background.
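CouchDB avoids global locks with multi-version concurrency control: readers never block, and a write has to say which revision it was based on, so a stale write is rejected as a conflict instead of waiting on a lock. Below is a minimal single-threaded sketch of that optimistic check; `MVCCStore` and `ConflictError` are hypothetical names, and a real store would need to make the compare-and-write step atomic.

```python
class ConflictError(Exception):
    """Raised when a write is based on a revision that is no longer current."""


class MVCCStore:
    """Toy document store using optimistic concurrency (hypothetical).

    Nothing is locked while the app reads or the replicator syncs; each write
    names the revision it was based on and is rejected if that is stale.
    Single-threaded sketch: a real store must make check-and-write atomic.
    """

    def __init__(self):
        self._docs = {}  # doc_id -> (rev, body)

    def get(self, doc_id):
        return self._docs.get(doc_id, (0, None))

    def put(self, doc_id, body, base_rev):
        current_rev, _ = self.get(doc_id)
        if base_rev != current_rev:
            raise ConflictError(f"{doc_id}: based on rev {base_rev}, current is rev {current_rev}")
        self._docs[doc_id] = (current_rev + 1, body)
        return current_rev + 1


store = MVCCStore()
store.put("todo", {"done": False}, base_rev=0)
app_rev, app_doc = store.get("todo")  # the app reads rev 1

# Meanwhile the background sync writes a newer revision without waiting on the app.
store.put("todo", {"done": False, "note": "from server"}, base_rev=app_rev)

try:
    store.put("todo", {"done": True}, base_rev=app_rev)  # the app's write is now stale
    stale_write_rejected = False
except ConflictError:
    stale_write_rejected = True

# The app re-reads and retries on top of the latest revision.
latest_rev, latest = store.get("todo")
store.put("todo", {**latest, "done": True}, base_rev=latest_rev)
print(store.get("todo"))  # (3, {'done': True, 'note': 'from server'})
```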

Conflict management - You need a plan for how you'll detect and deal with edit conflicts.
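A useful property is that every replica picks the same "winning" revision of a conflicted document without talking to the others, while keeping the losing revisions around so the app can merge them later. The sketch below assumes a deterministic rule modeled on CouchDB's (live revisions beat deleted ones, then the longer edit history wins, then the greater revision id breaks ties); `Revision` and `pick_winner` are hypothetical names, and the `<generation>-<hash>` id format is assumed for illustration.

```python
from typing import List, NamedTuple, Tuple


class Revision(NamedTuple):
    rev_id: str    # assumed "<generation>-<hash>" format, e.g. "3-bb22"
    deleted: bool
    body: dict


def generation(rev_id: str) -> int:
    """Number of edits in this revision's history (the part before the dash)."""
    return int(rev_id.split("-", 1)[0])


def pick_winner(leaves: List[Revision]) -> Tuple[Revision, List[Revision]]:
    """Deterministically choose a winner among conflicting leaf revisions.

    Ranking: live beats deleted, then higher generation, then the lexically
    greater rev_id. Because the rule depends only on the revisions themselves,
    every replica reaches the same answer. Losers are returned, not discarded.
    """
    ranked = sorted(
        leaves,
        key=lambda r: (not r.deleted, generation(r.rev_id), r.rev_id),
        reverse=True,
    )
    return ranked[0], ranked[1:]


phone = Revision("3-aa11", False, {"title": "Buy milk"})
laptop = Revision("3-bb22", False, {"title": "Buy oat milk"})

winner, losers = pick_winner([phone, laptop])
print(winner.rev_id)               # 3-bb22
print([r.rev_id for r in losers])  # ['3-aa11']
```

The winner is arbitrary from the user's point of view; what matters is that it is consistent everywhere, and the app still sees the losing revision so it can merge the two titles.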

Partial replication - Having replicas that hold only an interesting subset of other replicas' data. Important when sharing a large data set where mobile clients only need a portion of it.
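Partial replication is usually done by running each change through a filter before it is sent to the client. The sketch below uses a hypothetical per-region filter over a toy changes list of `(seq, doc_id, doc)` tuples; `pull_filtered` is an illustrative name, not a Couchbase API. One detail worth getting right: the checkpoint should still advance past changes the filter skipped, or every restart re-examines them.

```python
from typing import Callable, Dict, Iterable, Tuple

Doc = Dict[str, object]
Change = Tuple[int, str, Doc]  # (seq, doc_id, doc)


def pull_filtered(changes: Iterable[Change], keep: Callable[[Doc], bool]) -> Tuple[Dict[str, Doc], int]:
    """Return the documents that pass `keep`, plus the last sequence examined."""
    docs: Dict[str, Doc] = {}
    last_seq = 0
    for seq, doc_id, doc in changes:
        if keep(doc):
            docs[doc_id] = doc
        last_seq = seq  # advance the checkpoint even for skipped changes
    return docs, last_seq


def north_only(doc: Doc) -> bool:
    """Hypothetical filter: a sales rep's phone only needs its own region."""
    return doc.get("region") == "north"


server_changes = [
    (1, "order-1", {"region": "north", "total": 40}),
    (2, "order-2", {"region": "south", "total": 15}),
    (3, "order-3", {"region": "north", "total": 99}),
    (4, "order-4", {"region": "south", "total": 7}),
]

phone_docs, checkpoint = pull_filtered(server_changes, north_only)
print(sorted(phone_docs))  # ['order-1', 'order-3']
print(checkpoint)          # 4
```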

Ad hoc topology - Couchbase supports ad hoc topology: any machine can sync with any other machine without prior knowledge of it. This is much more flexible than a single centralized sync point or a fixed topology. Though many deployments start with a single sync point, new ones often need to be added later.

Schema upgrade - Couchbase is schemaless, so it's easy to add new fields/properties without breaking things. If you're using a schema, it's difficult to upgrade remote clients when they have new data stored in older schemas.
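In a schemaless store, handling upgrades mostly comes down to two habits: newer code tolerates documents that lack newer fields, and older code copies through fields it doesn't understand instead of dropping them. A minimal sketch, assuming a hypothetical contact document where `phone` was added in a later app version:

```python
def read_contact(doc: dict) -> dict:
    """Newer app version: accept documents written by older clients.

    `phone` is a hypothetical field added in a later release, so a missing
    value falls back to a default instead of raising KeyError.
    """
    return {"name": doc["name"], "phone": doc.get("phone", "")}


def rename_contact(doc: dict, new_name: str) -> dict:
    """Older app version: change only the field it knows about and copy the
    rest through, so fields added by newer clients survive the round-trip."""
    updated = dict(doc)
    updated["name"] = new_name
    return updated


old_doc = {"name": "Ada"}                       # written by an old client
new_doc = {"name": "Ada", "phone": "555-0100"}  # written by a new client

print(read_contact(old_doc))              # {'name': 'Ada', 'phone': ''}
print(rename_contact(new_doc, "Ada L."))  # {'name': 'Ada L.', 'phone': '555-0100'}
```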

Security - The ability to refuse updates if they come from unauthorized sources.
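CouchDB handles this with validation functions (`validate_doc_update`, written in JavaScript) that run on every write, including writes arriving through replication. The Python sketch below only mimics that idea; `validate_update`, `apply_incoming`, the `writer` role and the `owner` field are all hypothetical, chosen for illustration.

```python
class Forbidden(Exception):
    """Raised by the validation hook to reject an update."""


def validate_update(new_doc, old_doc, user):
    """Hypothetical validation hook, run on every incoming write,
    whether it comes from the local app or from replication."""
    if "writer" not in user["roles"]:
        raise Forbidden("user may not write to this database")
    if old_doc is not None and old_doc.get("owner") != user["name"]:
        raise Forbidden("only the owner may edit this document")


def apply_incoming(db, doc_id, new_doc, user):
    """Apply an update only if validation passes; return True if accepted."""
    try:
        validate_update(new_doc, db.get(doc_id), user)
    except Forbidden:
        return False
    db[doc_id] = new_doc
    return True


db = {"note-1": {"owner": "alice", "text": "hi"}}
alice = {"name": "alice", "roles": ["writer"]}
mallory = {"name": "mallory", "roles": ["writer"]}

print(apply_incoming(db, "note-1", {"owner": "alice", "text": "hello"}, alice))      # True
print(apply_incoming(db, "note-1", {"owner": "mallory", "text": "pwned"}, mallory))  # False
print(db["note-1"]["text"])  # hello
```

Running the same check on the replication path matters: otherwise an unauthorized client could bypass it simply by syncing its changes instead of writing directly.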

Anyway, Couchbase and CouchDB have worked out these problems and are running successfully in production on millions of machines. It's not the only way to build a sync scheme, but it's one of the most successful.