The tree behind Cloudant's documents and how to use it

On the surface, Cloudant's document API appears to be a reasonably simple way to upload JSON data to a key value store and update the JSON data for a particular key. However, the reality is more complicated -- isn't it always? -- and understanding what's happening is important in using Cloudant effectively and becoming an advanced user of the database.

The takeaway point is this: a document in Cloudant is a tree and Cloudant's document HTTP API is essentially a tree manipulation API.

Below, we'll create and manipulate these trees to get a feel for what this means. The default behaviours of many calls allow you to ignore the tree in order that Cloudant appears more like a database, and we'll explore why and how this works as we go.

Aside: why a tree? A document's tree representation is key to Cloudant's replication, which is essentially transferring tree nodes from one location to another in order to synchronise data robustly.

All this information is valid for CouchDB, but I've chosen to refer to Cloudant throughout for simplicity. This mini-tutorial assumes a basic knowledge of the Cloudant document API. Let's get started.

Building a simple document tree

Uploading a new document to Cloudant gives you something like this:

> curl -XPUT \
    'http://localhost:5984/tree-demo/mydoc' \
    -d '{"foo":"bar"}'
{"ok":true,"id":"mydoc","rev":"1-4c6114c65e295552ab1019e2b046b10e"}

Updating that document gives you this:

> curl -XPUT \
    'http://localhost:5984/tree-demo/mydoc?rev=1-4c6114c65e295552ab1019e2b046b10e' \
    -d '{"foo":"baz"}'
{"ok":true,"id":"mydoc","rev":"2-cfcd6781f13994bde69a1c3320bfdadb"}

Each call has returned a new rev ID in the response. If we didn't already know the two rev IDs for this document, we could find them out by making a request for the document using the revs=true querystring parameter:

> curl -XGET \
    'http://localhost:5984/tree-demo/mydoc?revs=true' | jq .
{
  "_id": "mydoc",
  "_rev": "2-cfcd6781f13994bde69a1c3320bfdadb",
  "foo": "baz",
  "_revisions": {
    "start": 2,
    "ids": [
      "cfcd6781f13994bde69a1c3320bfdadb",
      "4c6114c65e295552ab1019e2b046b10e"
    ]
  }
}

The special thing about this request is that the response lists the rev IDs in order -- which defines the history of the document. From this call, we can construct a tree for the document with a single branch and two nodes. We use the _revisions field, and construct the tree backwards using the start and ids array:

A tree representation of the document, with two nodes

The tree has two nodes. Cloudant labels each node with its rev ID. Cloudant also labels one node winner; we'll come to that in a moment.

Note: Cloudant calls these tree nodes revisions but I'm going to stick with node to emphasise the tree semantics.

Now we've seen the tree, let's see how we can think about the three calls if we consider them in terms of trees rather updates to a JSON document.

  1. The first call creates a new document tree and its root node, 1-4c6114c65e295552ab1019e2b046b10e.
  2. The second call adds a node to the tree, parented by the node we pass in via the rev querystring parameter. The call returns the rev ID of the new node, 2-cfcd6781f13994bde69a1c3320bfdadb.
  3. The final call uses the revs=true parameter to return the history of the 2- node along with its content.

One other thing is worth mentioning at this point: a rev ID can be used to retrieve any node of the tree at any time. This is done by using the rev querystring parameter when making a GET request for the document. While that node will be returned, the JSON content of the document at that point may not: Cloudant cleans up the content of old tree nodes to conserve space.

The winner node

The winner abstraction is key to Cloudant's database semantics. It's used to give the appearance of a single value for a document rather than a tree. It does this by freeing us from having to specify the rev ID of the node we want with every call -- though some calls, like updates, always need one. Let's look at how this works.

Cloudant applies the winner label to the node at the tip of the tree as shown in the diagram above. This node is the one that Cloudant will refer to when responding to document retrieval requests that do not have any particular rev ID specified. The rev ID for this node is included in the response.

While the node that Cloudant chooses to label the winner for any given document is an implementation detail, it's essentially decided by choosing the longest path in the document tree that's not terminated by a deletion node. This will be more obvious later when we introduce a branch to the tree.

An essential property is that Cloudant chooses which node to label winner in a deterministic way. This means that when the database is replicated, Cloudant will label the same node as winner at all replicas, given the same document tree.

Consider the last call above:

> curl -XGET \
    'http://localhost:5984/tree-demo/mydoc?revs=true'

By not specifying a rev ID to return, we're instructing Cloudant to return information about the tree node pointed to by the winner label.

Updates are tree manipulation

We've now seen that an update to a document can be viewed as adding a new node to the document's tree. And we have seen that the update request's rev parameter is specifying which existing node in the document tree should be used as the parent for this new node.

If we update twice more, we can see the tree grow. In the background, Cloudant will be updating the winner pointer to a new rev ID each time. It sends that back in the rev field of the response:

> curl -XPUT \
    'http://localhost:5984/tree-demo/mydoc?rev=2-cfcd6781f13994bde69a1c3320bfdadb' \
    -d '{"foo":"bop"}'
{"ok":true,"id":"mydoc","rev":"3-2766344359f70192d3a68bf205c37743"}

> curl -XPUT \
    'http://localhost:5984/tree-demo/mydoc?rev=3-2766344359f70192d3a68bf205c37743' \
    -d '{"foo":"bloop"}'
{"ok":true,"id":"mydoc","rev":"4-a5be949eeb7296747cc271766e9a498b"}

Retrieving the tree description again using revs=true, we can see this has changed the JSON description of the document tree history to include the new rev IDs (node labels):

> curl -XGET 'http://localhost:5984/tree-demo/mydoc?revs=true' | jq .
{
  "_id": "mydoc",
  "_rev": "4-a5be949eeb7296747cc271766e9a498b",
  "foo": "bloop",
  "_revisions": {
    "start": 4,
    "ids": [
      "a5be949eeb7296747cc271766e9a498b",
      "2766344359f70192d3a68bf205c37743",
      "cfcd6781f13994bde69a1c3320bfdadb",
      "4c6114c65e295552ab1019e2b046b10e"
    ]
  }
}

The tree now has four nodes, as shown by the _revisions array. As we didn't specify a rev in the request, Cloudant has again returned the node content and history for the winner node, as we'd expect. As with any direct request to retrieve a document, we could have supplied a rev parameter to receive information about a different node.

A tree representation of the document, with four nodes

Branching in the document tree

To show how the tree can branch, we need to make this document a conflicted document. A conflicted document is a document where there are two or more active branches. An active branch is one that ends in a node that is not deleted. This often happens during replication, where a document with a common history may have been updated in different ways in the databases being replicated.

If you've been using Cloudant for a while, you'll be familiar with Cloudant's behaviour when you send the "wrong" rev ID with an update: the 409 Conflict response. With our new tree knowledge, we can see that what Cloudant is doing is either (a) rejecting a request that tries to add a node to a parent which already has a child, or (b) trying to use a parent node that doesn't exist. It does this so the tree remains a single stem, which makes sense for a database. This can be seen if we try to add a second child to our 1- revision:

$ curl -XPUT \
    'http://localhost:5984/tree-demo/mydoc?rev=1-4c6114c65e295552ab1019e2b046b10e' \
    -d '{"foo":"baz"}' -v
[...]
> PUT /tree-demo/mydoc?rev=1-4c6114c65e295552ab1019e2b046b10e HTTP/1.1
[...]
< HTTP/1.1 409 Conflict
[...]
{"error":"conflict","reason":"Document update conflict."}

However, we can use the _bulk_docs call to bypass these protections (which is what Cloudant's replicator does). Clearly, this is something you'd never use day-to-day, but it's instructive to see the effects. In the following call, we show this bypassing in action by creating a new branch rooted at the first update we made above.

The _bulk_docs request takes a JSON body describing the updates to make. First, we create the body of the request in bulk.json:

{
    "docs": [
        {
            "_id": "mydoc",
            "_rev": "3-917fa2381192822767f010b95b45325b",
            "_revisions": {
                "ids": [
                    "917fa2381192822767f010b95b45325b",
                    "cfcd6781f13994bde69a1c3320bfdadb",
                    "4c6114c65e295552ab1019e2b046b10e"
                ],
                "start": 3
            },
            "bar": "baz"
        }
    ],
    "new_edits": false
}

This document describes a new node for the document mydoc that we created above, 3-917fa2381192822767f010b95b45325b. This node has a history that diverges from the existing document. The new node's history is described using the _revisions field, using the same format as we received in revs=true calls above. In this history, the first two rev IDs are the same, but the third is different. This will cause a branch to happen at rev 2-.

By including new_edits=false in the JSON, we tell Cloudant to graft this node into the document tree, creating any parent nodes required. Without new_edits=false, Cloudant would reject this update as the history is incompatible with the winner node.

Next, send the request using this body:

> curl -XPOST \
    'http://localhost:5984/tree-demo/_bulk_docs' \
    -d @bulk.json \
    -H "Content-Type:application/json"

We now have a tree which branches. To check this, we can make a request to Cloudant that will return the content of the nodes at the tip of each branch, along with the history of each of these nodes.

The open_revs parameter is specifically used for retrieving the content of multiple tip nodes at once. To return all the branches' tip nodes, we pass open_revs=all into the GET for the document. This is a second way to get non-winner nodes, along with specifying rev.

As above, we also include revs=true so that the history of each tip node is also returned. We also specify Accept: application/json to get a simpler return format: all the tip nodes in a JSON array.

> curl -XGET \
    'http://localhost:5984/tree-demo/mydoc?open_revs=all&revs=true' \
    -H "Accept:application/json" | jq .
[
  {
    "ok": {
      "_id": "mydoc",
      "_rev": "4-a5be949eeb7296747cc271766e9a498b",
      "foo": "bloop",
      "_revisions": {
        "start": 4,
        "ids": [
          "a5be949eeb7296747cc271766e9a498b",
          "2766344359f70192d3a68bf205c37743",
          "cfcd6781f13994bde69a1c3320bfdadb",
          "4c6114c65e295552ab1019e2b046b10e"
        ]
      }
    }
  },
  {
    "ok": {
      "_id": "mydoc",
      "_rev": "3-917fa2381192822767f010b95b45325b",
      "bar": "baz",
      "_revisions": {
        "start": 3,
        "ids": [
          "917fa2381192822767f010b95b45325b",
          "cfcd6781f13994bde69a1c3320bfdadb",
          "4c6114c65e295552ab1019e2b046b10e"
        ]
      }
    }
  }
]

Which represents the following tree:

A tree representation of the conflicted document, with a branch

Resolving the conflicted state

A document is conflicted so long as it has two or more branches which don't end in a deletion node. A conflicted document is clearly in a bad state for a database: some data is effectively hidden. So before we finish, let's resolve the conflict. In tree terms, resolving means updating the document's tree so that there is only one active branch.

As we have two active branches, we need to update one branch with a deletion node, called a tombstone, to make that branch inactive. We do this by making a DELETE request to the document's resource. This request specifies the parent rev ID (rev) as the node at the tip of the branch we wish to mark as a dead end.

Before this, we need to choose which branch to keep. To do this, we need to retrieve both. Instead of using open_revs=all, we can retrieve each individually.

First, let's look at what Cloudant considers the winner document node. This corresponds to the data it returns when a simple document lookup is carried out:

> curl -XGET \
    'http://localhost:5984/tree-demo/mydoc' | jq .
{
  "_id": "mydoc",
  "_rev": "4-a5be949eeb7296747cc271766e9a498b",
  "foo": "bloop"
}

And, as stated earlier, we can get the other branch's data by specifying the rev ID of the node at the tip of the other branch:

> curl -XGET \
    'http://localhost:5984/tree-demo/mydoc?rev=3-917fa2381192822767f010b95b45325b' | jq .
{
  "_id": "mydoc",
  "_rev": "3-917fa2381192822767f010b95b45325b",
  "bar": "baz"
}

Let's say that the non-winning node, 3-917fa2381192822767f010b95b45325b contains the JSON data we actually want for this document. We issue a delete request specifying the unwanted branch's tip node's rev ID as its parent:

> curl -XDELETE \
    'http://localhost:5984/tree-demo/mydoc?rev=4-a5be949eeb7296747cc271766e9a498b'
{"ok":true,"id":"mydoc","rev":"5-ab21cb5ac4c8da916c47c45330d8a655"}

Recapping what we're doing here, this request instructs Cloudant to add a tombstone node to the parent node specified by the rev parameter, thereby rendering that branch inactive.

Recall the algorithm to select the node the winner label points to is to choose the longest active path in a document tree. Therefore, after this operation, Cloudant will have moved the winner label to point at the tip of the remaining active branch. A lookup of the doc ID without specifying a rev ID therefore now returns the other node:

> curl -XGET \
    'http://localhost:5984/tree-demo/mydoc' | jq .
{
  "_id": "mydoc",
  "_rev": "3-917fa2381192822767f010b95b45325b",
  "bar": "baz"
}

A corollary of this is that a document is not considered deleted -- that is, a query for that document will not return a 404 Not Found -- until all branches are inactive. This can be the source of great confusion if one doesn't know beforehand that a document is conflicted. Deleting the document appears to bring back to life another version of the document!

If a document has many active branches, it will take several requests to add a tombstone to every branch tip to make the document actually appear deleted. Using the open_revs=all argument will show how many active branch tips there are.

Returning to our example, we can study the resulting tree using the request that we've made a few times now:

> curl -XGET \
    'http://localhost:5984/tree-demo/mydoc?open_revs=all&revs=true' \
    -H "Accept:application/json" | jq .
[
  {
    "ok": {
      "_id": "mydoc",
      "_rev": "5-ab21cb5ac4c8da916c47c45330d8a655",
      "_deleted": true,
      "_revisions": {
        "start": 5,
        "ids": [
          "ab21cb5ac4c8da916c47c45330d8a655",
          "a5be949eeb7296747cc271766e9a498b",
          "2766344359f70192d3a68bf205c37743",
          "cfcd6781f13994bde69a1c3320bfdadb",
          "4c6114c65e295552ab1019e2b046b10e"
        ]
      }
    }
  },
  {
    "ok": {
      "_id": "mydoc",
      "_rev": "3-917fa2381192822767f010b95b45325b",
      "bar": "baz",
      "_revisions": {
        "start": 3,
        "ids": [
          "917fa2381192822767f010b95b45325b",
          "cfcd6781f13994bde69a1c3320bfdadb",
          "4c6114c65e295552ab1019e2b046b10e"
        ]
      }
    }
  }
]

The response still shows both branches. This is expected, as the branches still exist! However, we can see that the branch ending in 5- is deleted and so the 3- branch is the only active branch. Which therefore gives us this final tree:

A tree representation of the resolved document, with an active branch and a deleted branch

Conclusion

Hopefully, this has shown how the Cloudant API for documents is really a tree manipulation API. There convenience wrappers and behaviours for common operations which make the API more database-like, most of which make use of the tree node labelled winner.

It's useful to think this way, as it explains what effect requests are having on stored data which helps with understanding how to make Cloudant work effectively. It also helps to understand replication as the replication process is just using calls like the ones above to retrieve and duplicate document tree nodes from one database to another.

Before finishing, it's worth repeating that the content of tree nodes that are not at the tip of a branch is discarded periodically by Cloudant -- so don't rely on it sticking around!

Google Maglev

Google's Maglev paper describes a high-performance software-based load balancing system. Like many of the systems that Google has described in the literature, Maglev applies commodity hardware and horizontal scaling to a problem in a novel way. These are some notes on the parts most interesting to me.

Packets reaching Google's networks are destined for virtual IP addresses, which are assigned to particular services. These addresses map to several physical machines running the service application. Maglev's basic job is to look up the machines serving that virtual IP and then to evenly divide the load between these servers. Each Maglev array manages many virtual IPs and many backend servers per IP.

The main smarts in Maglev are in allowing an unintelligent frontend hardware router to distribute packets -- and packet fragments -- in any fashion it chooses to a resizeable array of Maglev instances which forward packets on to the machines handling each service and maintain connection semantics.

There were several problems that caught my attention:

  • Making something fast enough in software on commodity machines.
  • Ensuring even load balancing at the connection level.
  • Allowing a distributed load balancing tier to maintain connections between clients and backend machines.

Performance

The basics here are a combination of efficient coding and moving the networking work into user space. This speeds things up in a few ways. First, you avoid the context switches into kernel space. Second, the kernel's stack supports far more features than Maglev requires, so doing the work itself avoids a lot of possible code paths and checks. Finally, you can interact directly with the network card via shared memory buffers rather than be interrupt driven; when you're pretty sure there will always be packets waiting, polling is pretty efficient.

Balancing connections

Maglev is fortunate to be distributing the load from many different source IPs, so can achieve good balancing characteristics based on the source and destination data.

An obvious initial problem is that the router is distributing packets to multiple Maglev machines, but these packets form part of a larger connection stream, so they need to be routed to the same machine to allow complete requests to be reconstructed. The Maglev machines take care of routing packets belonging to the same connection to the same backend machine using a consistent hashing technique.

Backend selection is based on a packet's 5-tuple representation. This representation is the packet's source IP, source port, destination IP, destination port and IP version number. To avoid sharing state for ensuring consistent backend selection, a Maglev server takes the 5-tuple of an incoming packet and derives the appropriate backend for it using the hash of the 5-tuple. First, the Maglev machine checks a connection table for an existing entry. If that fails, a specialised consistent hashing technique allows each Maglev machine to choose the same backend server for a given connection. The selection is then stored in the connection table, for quicker directing of packets that are part of that connection.

Overall, the paper is a very readable solution to an interesting problem of scale.

How Money is Made

The Bank of England, I found out recently, produces a quarterly bulletin. The Bank published a couple of fascinating -- to me at least -- articles on how money is created and destroyed in the modern economy in their quarterly bulletin for Q1 2014:

  1. Money in the modern economy: an introduction
  2. Money creation in the modern economy

While other articles in the bulletin look to be fairly specialised, I found these opening two quite accessible. Mostly anyway, some paragraphs required a little more thinking than I was willing to put in for now.

At the crux of the articles is the theory, which the Bank says is the currently correct one, of how money is created. Surprisingly, money is created by commercial banks when they lend to you, me, companies and so on. Out of thin air, a higher number appears in our deposit accounts. On the bank's side, a corresponding lending account appears. As we pay back into that lending account, we are actively destroying money. So money is created from nothing and, mostly, returns to nothing.

This runs counter to the belief that banks lend out money that people have previously deposited; in fact, it runs exactly in reverse to that thought. Instead, lending comes first, banks creating money at the stoke of a pen and tap of a key.

My first thought was what, then, stops banks just creating more and more money, leading to runaway inflation? Competition and profit-motive, basically. Increasing a bank's lending typically comes at the cost of their decreasing the interest they charge on loans, which decreases the likely profit. And there is a balance between the interest paid to depositors by a bank and the interest they charge on loans, which has to be positive for the bank to have enough money to pay its employees. I did ponder whether a bank could just conjure up money for their employees, but realised that, of course, employees don't want to be paid in loans.

The interest rate set on reserves by the central bank -- that is, the money that commercial banks hold with the central bank -- also affects the price of loans on the street, to you and I. I get a bit lost at this part, so do email me if I've buggered this bit up. But so far as I can see we're broadly saying that the reserves which the commercial bank borrows from the central bank have an interest rate attached, which, given banks settle between themselves in reserve bank currency, influences the rate at which the bank is willing to loan that money further down the chain. Competition between banks drives down that rate towards the central bank rate, and the desire not to lose money keeps rates above it. By increasing the rate, the central bank can make lending more expensive, decreasing demand and therefore money creation.

Paying for something, in this world, really is the action of decreasing the number in someone's deposit account and increasing the number listed against someone else's. If the parties are with the same bank, this can be instantaneous. If not, it needs to be accompanied with a transfer in central bank reserves (I think) from the buyer's bank to the seller's bank.

There is a classic computer science example of the need to have transactional protection around a purchase, to keep the addition and subtraction appearing instantaneous, and in that we tend to think about the numbers as only representing people's balance. In reality, it seems to me that the bits on disk really are that balance -- and if they accidentally change, the money really does change. If that happens, the bank has, most likely, accidentally created or destroyed money.

I haven't quite figured out what this means. It's a bit like quantum physics, where you learn that reality is far less solid than everyday experience suggests. Similarly, money appears to be a far more mailable, slippery concept than I thought. I get the feeling that if my mind comes to accept this new view of money, a bunch of my underlying feelings about how economies -- and governments -- work will come tumbling down to be replaced with quite different ones.

Just open a blank document

When I start an app like Word by typing its name into Alfred, I almost never want to do anything other than open a new, blank document. Instead, by default, tons of apps show a "New Document" dialog, offering to help me by creating a generic looking flyer, brochure or todo list.

Here's how to go back to just the blank paper, spreadsheet, or whatever, at least for the apps I find myself requiring it:

  • Office:
    • Word: PreferencesGeneralShow Word Document Gallery when opening Word.
    • Excel: PreferencesGeneralOpen Workbook Gallery when opening Excel.
    • Powerpoint: PreferencesGeneralShow the Start screen when this application starts.
  • iWork: from Preferences you need to choose a default template to skip the template gallery for every new document. Skipping the open dialog box at startup requires some defaults work, see below.
  • OmniGraffle: the key I missed here is that this is a two-step process. Go to PreferencesGeneral, then: (1) select Create a new document if nothing else is open, and (2) select a template for the new document, I chose Auto-Resizing.

The Apple Stackexchange has the answer for turning off the open dialog box at start up that many apps on the Mac show; iWork, Text Edit and Calca are the ones where I see it most. From the command line:

defaults write -g NSShowAppCentricOpenPanelInsteadOfUntitledFile -bool false

If you want to show the dialog again:

defaults write -g NSShowAppCentricOpenPanelInsteadOfUntitledFile -bool true

If nothing else, writing this note prompted me to figure out whether you can disable that stupid dialog. Given it took all of 30 seconds to find, I'm rather saddened at the immense amount of time and bother that I have cost myself by not looking previously.

Limiting concurrent execution using GCD

Soroush Khanlou recently wrote The GCD Handbook, a cookbook for common Grand Central Dispatch patterns. It's a great set of patterns, with code examples.

While a great example of semaphores, I wondered whether the code in the Limiting the number of concurrent blocks section could be improved -- I'll explain why below. Soroush and I emailed back and forth a little bit, and came up with the following.

In the example, Soroush shows how to use GCD semaphores to limit the number of blocks that are concurrently executing in a dispatch queue. The key part is the enqueueWork function:

func enqueueWork(work: () -> ()) {
    dispatch_async(concurrentQueue) {
        dispatch_semaphore_wait(semaphore, DISPATCH_TIME_FOREVER)
        work()
        dispatch_semaphore_signal(semaphore)
    }
}

The problem I saw here, which Soroush also notes, is that this approach starts a potentially unbounded number of threads, which are immediately blocked by waiting on a semaphore. Obviously GCD will limit you at some point, but that's still a lot of work and a decent chunk of memory. While this code is necessarily simplified to introduce this use of semaphores, the bunch of waiting threads needled at me.

To achieve effects like this with queue-based systems, I often find I need to combine more than one queue. Here, in the solution Soroush and I got to, we need two queues to get to a more efficient solution which only requires a single blocked thread.

We use a concurrent queue for executing the user's tasks, allowing as many concurrently executing tasks as GCD will allow us in that queue. The key piece is a second GCD queue. This second queue is a serial queue and acts as a gatekeeper to the concurrent queue. We wait on the semaphore in the serial queue, which means that we'll have at most one blocked thread when we reach maximum executing blocks on the concurrent queue. Any other tasks the user enqueues will sit inertly on the serial queue waiting to be executed, and won't cause new threads to be started.

import Cocoa

class MaxConcurrentTasksQueue: NSObject {

    private let serialq: dispatch_queue_t
    private let concurrentq: dispatch_queue_t
    private let sema: dispatch_semaphore_t

    init(withMaxConcurrency maxConcurrency: Int) {
        serialq = dispatch_queue_create("uk.co.dx13.serial", nil)
        concurrentq = dispatch_queue_create(
            "uk.co.dx13.concurrent", 
            DISPATCH_QUEUE_CONCURRENT)
        sema = dispatch_semaphore_create(maxConcurrency);
    }

    func enqueue(task: () -> ()) {
        dispatch_async(serialq) {
            dispatch_semaphore_wait(self.sema, DISPATCH_TIME_FOREVER);
            dispatch_async(self.concurrentq) {
                task();
                dispatch_semaphore_signal(self.sema);
            }
        }; 
    }

}

To test this, I created a Swift command line application with this code in main.swift:

import Foundation

let maxConcurrency = 5
let taskCount = 100
let sleepFor: NSTimeInterval = 2

print("Hello, World!")

let q = MaxConcurrentTasksQueue(withMaxConcurrency: maxConcurrency);
let group = dispatch_group_create()

for i in 1...100 {
    dispatch_group_enter(group);
    q.enqueue {
        print("Task:", i);
        if (sleepFor > 0) {
            NSThread.sleepForTimeInterval(sleepFor);
        }
        dispatch_group_leave(group);
    }
}

dispatch_group_wait(group, DISPATCH_TIME_FOREVER);

print("Goodbye, World");

Running this, Hello, World! should be printed, followed by Task: N in batches of five (or whatever you set maxConcurrency to), followed by a final Goodbye, World! before the application succumbs to the inevitability of termination.

Even here there is an interesting use of GCD. In order to stop the app terminating before the tasks have run, I needed to use dispatch groups. I hadn't really used these before, so I'll refer you to Soroush's explanation of dispatch groups at this point; the above is hopefully straightforward once you've read that.

I uploaded an example project for this code to my GitHub account. It's under Apache 2.0; hopefully it comes in handy sometime.