CouchDB 2.0 (in preview) has clustering code contributed by Cloudant, which was inspired by Amazon’s Dynamo paper.
When using CouchDB in a cluster, databases are sharded and replicated. This means that a single database is split into, say, 24 shards and each shard is stored on more than one node (replica). A shard contains a specific portion of the documents in the database. A consistent hashing technique is used to allocate documents to shards. There are almost always three replicas; this provides a good balance of reliability vs. storage overhead.
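To make the sharding idea concrete, here is a minimal sketch of hashing a document ID into one of a fixed number of shards and then picking the nodes that hold that shard's replicas. It is an illustration under assumptions, not CouchDB's actual code: the class name `ShardSketch`, the methods `shardFor` and `replicasFor`, the choice of CRC32 as the hash, and the round-robin replica placement are all hypothetical.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import java.util.zip.CRC32;

// Hypothetical illustration only; names and placement strategy are assumptions.
class ShardSketch {

    // Map a document ID to a shard by splitting the 32-bit hash space
    // into q contiguous, equally sized ranges.
    static int shardFor(String docId, int q) {
        CRC32 crc = new CRC32();
        crc.update(docId.getBytes(StandardCharsets.UTF_8));
        long hash = crc.getValue();          // value in 0 .. 2^32 - 1
        long rangeSize = (1L << 32) / q;     // width of each shard's range
        // The division above rounds down, so clamp the top few hashes
        // into the last shard.
        return (int) Math.min(hash / rangeSize, q - 1);
    }

    // Pick n nodes for a shard by walking the node list from the shard's
    // position. Assumes n <= nodes.size() so that replicas are distinct.
    static List<String> replicasFor(int shard, List<String> nodes, int n) {
        List<String> replicas = new ArrayList<>();
        for (int i = 0; i < n; i++) {
            replicas.add(nodes.get((shard + i) % nodes.size()));
        }
        return replicas;
    }
}
```

With 24 shards and three replicas, `ShardSketch.shardFor(id, 24)` always sends the same document ID to the same shard, and `ShardSketch.replicasFor(shard, nodes, 3)` names the three nodes that would store it. Because the shard is derived from the ID alone, any node can work out where a document lives without a central lookup.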
The New York Times has more information on the often awful balance between editorial content and adverts in The Cost of Mobile Ads on 50 News Websites.
The difference was easy to spot: many websites loaded faster and felt easier to use. Data is also expensive. We estimated that on an average American cell data plan, each megabyte downloaded over a cell network costs about a penny. Visiting the home page of Boston.com every day for a month would cost the equivalent of about $9.50 in data usage just for the ads.
About three weeks ago I gave in: I turned on Firefox’s Tracking Protection feature. Last week I installed 1Blocker on my iPhone. Until now, I’d avoided ad- and tracker-blocking software. I felt uncomfortable hiding that which provided sites’ revenues. Looking under the hood at sites I regularly visit, however, I realise now that I’ve been a fool to hold out for so long.
I have two aims with both Tracking Protection and 1Blocker:
A few weeks ago, we switched an API in our synchronizing database for Android to use the builder pattern. You can see the implementation. The long and short is that our external API went from:
PullReplication pull = new PullReplication();
pull.source = /* remote database URL */;
pull.target = /* local database */;
Replicator pullReplicator = ReplicatorFactory.oneway(pull);
to the cleaner:
Replicator pullReplicator = ReplicatorBuilder.pull()
    .from(/* remote database URL */)
    .to(/* local database */)
    .build();
The primary gain was that we reduced the API’s “surface area” significantly. We did this by going from three classes whose names only vaguely suggested their combined usage (PullReplication, ReplicatorFactory and Replicator) to two classes whose names spell out a clear relationship: ReplicatorBuilder and Replicator.
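For readers who haven't written one, a builder of this shape can be sketched in a few lines. This is not the library's implementation: `SketchReplicator` and `SketchReplicatorBuilder` are hypothetical stand-ins, the local database is reduced to a plain string, and the validation in `build()` is an assumption about what such a builder might check.

```java
import java.net.URI;

// Hypothetical stand-in for the object the builder produces.
final class SketchReplicator {
    final URI source;      // remote database URL
    final String target;   // local database, reduced to a name for this sketch

    SketchReplicator(URI source, String target) {
        this.source = source;
        this.target = target;
    }
}

// Hypothetical builder: the only public way to obtain a SketchReplicator.
final class SketchReplicatorBuilder {
    private URI source;
    private String target;

    private SketchReplicatorBuilder() { }

    // Entry point; a push() variant could sit alongside it.
    static SketchReplicatorBuilder pull() {
        return new SketchReplicatorBuilder();
    }

    SketchReplicatorBuilder from(URI source) {
        this.source = source;
        return this;   // returning this is what allows the calls to chain
    }

    SketchReplicatorBuilder to(String target) {
        this.target = target;
        return this;
    }

    // All validation lives in one place, so callers never see a
    // half-configured replicator.
    SketchReplicator build() {
        if (source == null || target == null) {
            throw new IllegalStateException("both from() and to() must be called");
        }
        return new SketchReplicator(source, target);
    }
}
```

Calling `SketchReplicatorBuilder.pull().from(URI.create("https://example.com/db")).to("local-db").build()` yields a fully configured object, while leaving out either step fails at `build()` rather than somewhere later in replication.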
A lot of times people talk about how face-to-face communication is high bandwidth, but let’s just say that in a lot of cases, that face-to-face communication can be a crutch. You can just throw bandwidth at the problem as opposed to actually using the bandwidth you have efficiently.
A thought-provoking point from an interview with Joe Mastey on the FogBugz blog.