Just open a blank document

When I start an app like Word by typing its name into Alfred, I almost never want to do anything other than open a new, blank document. By default, though, tons of apps instead show a "New Document" dialog, offering to help me by creating a generic-looking flyer, brochure or to-do list.

Here's how to go back to just the blank paper, spreadsheet, or whatever, at least for the apps where I find myself needing it:

  • Office:
    • Word: Preferences → General → untick "Show Word Document Gallery when opening Word".
    • Excel: Preferences → General → untick "Open Workbook Gallery when opening Excel".
    • PowerPoint: Preferences → General → untick "Show the Start screen when this application starts".
  • iWork: from Preferences you need to choose a default template to skip the template gallery for every new document. Skipping the open dialog box at startup requires some defaults work; see below.
  • OmniGraffle: the key I missed here is that this is a two-step process. Go to Preferences → General, then: (1) select "Create a new document if nothing else is open", and (2) select a template for the new document (I chose Auto-Resizing).

The Apple Stack Exchange has the answer for turning off the open dialog box at startup that many apps on the Mac show; iWork, TextEdit and Calca are the ones where I see it most. From the command line:

defaults write -g NSShowAppCentricOpenPanelInsteadOfUntitledFile -bool false

If you want to show the dialog again:

defaults write -g NSShowAppCentricOpenPanelInsteadOfUntitledFile -bool true
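
You can also check what's currently set (read complains if the key has never been set), or remove the override entirely to return to the system default:

defaults read -g NSShowAppCentricOpenPanelInsteadOfUntitledFile
defaults delete -g NSShowAppCentricOpenPanelInsteadOfUntitledFile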

If nothing else, writing this note prompted me to figure out whether you can disable that stupid dialog. Given it took all of 30 seconds to find, I'm rather saddened at the immense amount of time and bother that I have cost myself by not looking previously.

Limiting concurrent execution using GCD

Soroush Khanlou recently wrote The GCD Handbook, a cookbook for common Grand Central Dispatch patterns. It's a great set of patterns, with code examples.

While it's a great example of semaphore use, I wondered whether the code in the Limiting the number of concurrent blocks section could be improved -- I'll explain why below. Soroush and I emailed back and forth a little, and came up with the following.

In the example, Soroush shows how to use GCD semaphores to limit the number of blocks that are concurrently executing in a dispatch queue. The key part is the enqueueWork function:

func enqueueWork(work: () -> ()) {
    dispatch_async(concurrentQueue) {
        // Each block claims a semaphore slot before running, blocking
        // its newly started thread until a slot becomes free
        dispatch_semaphore_wait(semaphore, DISPATCH_TIME_FOREVER)
        work()
        dispatch_semaphore_signal(semaphore)
    }
}

The problem I saw here, which Soroush also notes, is that this approach starts a potentially unbounded number of threads, which are immediately blocked by waiting on a semaphore. Obviously GCD will limit you at some point, but that's still a lot of work and a decent chunk of memory. While this code is necessarily simplified to introduce this use of semaphores, the bunch of waiting threads needled me.

To achieve effects like this with queue-based systems, I often find I need to combine more than one queue. Here, the solution Soroush and I arrived at uses two queues to get a more efficient design which requires at most a single blocked thread.

We use a concurrent queue for executing the user's tasks, allowing as many concurrently executing tasks as GCD will give us in that queue. The key piece is a second GCD queue: a serial queue that acts as a gatekeeper to the concurrent queue. We wait on the semaphore in the serial queue, which means we'll have at most one blocked thread once we reach the maximum number of executing blocks on the concurrent queue. Any other tasks the user enqueues will sit inertly on the serial queue waiting to be executed, and won't cause new threads to be started.

import Cocoa

class MaxConcurrentTasksQueue: NSObject {

    private let serialq: dispatch_queue_t
    private let concurrentq: dispatch_queue_t
    private let sema: dispatch_semaphore_t

    init(withMaxConcurrency maxConcurrency: Int) {
        // The serial queue acts as the gatekeeper; a nil attribute
        // creates a serial queue
        serialq = dispatch_queue_create("uk.co.dx13.serial", nil)
        // Tasks actually execute on this concurrent queue
        concurrentq = dispatch_queue_create(
            "uk.co.dx13.concurrent",
            DISPATCH_QUEUE_CONCURRENT)
        // The semaphore's initial count is the number of tasks
        // allowed to run at once
        sema = dispatch_semaphore_create(maxConcurrency)
    }

    func enqueue(task: () -> ()) {
        dispatch_async(serialq) {
            // At most one thread -- the serial queue's -- blocks here
            dispatch_semaphore_wait(self.sema, DISPATCH_TIME_FOREVER)
            dispatch_async(self.concurrentq) {
                task()
                // Release a slot once the task completes
                dispatch_semaphore_signal(self.sema)
            }
        }
    }

}

To test this, I created a Swift command line application with this code in main.swift:

import Foundation

let maxConcurrency = 5
let taskCount = 100
let sleepFor: NSTimeInterval = 2

print("Hello, World!")

let q = MaxConcurrentTasksQueue(withMaxConcurrency: maxConcurrency)
let group = dispatch_group_create()

for i in 1...taskCount {
    dispatch_group_enter(group)
    q.enqueue {
        print("Task:", i)
        if sleepFor > 0 {
            NSThread.sleepForTimeInterval(sleepFor)
        }
        dispatch_group_leave(group)
    }
}

dispatch_group_wait(group, DISPATCH_TIME_FOREVER)

print("Goodbye, World!")

Running this, Hello, World! should be printed, followed by Task: N in batches of five (or whatever you set maxConcurrency to), followed by a final Goodbye, World! before the application succumbs to the inevitability of termination.

Even here there is an interesting use of GCD. In order to stop the app terminating before the tasks have run, I needed to use dispatch groups. I hadn't really used these before, so I'll refer you to Soroush's explanation of dispatch groups at this point; the above is hopefully straightforward once you've read that.
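
In brief, the core of dispatch groups is that each dispatch_group_enter must be balanced by a dispatch_group_leave, and dispatch_group_wait blocks until they balance. A minimal sketch using the same Swift GCD calls as above:

let group = dispatch_group_create()

dispatch_group_enter(group)  // register one outstanding task
dispatch_async(dispatch_get_global_queue(DISPATCH_QUEUE_PRIORITY_DEFAULT, 0)) {
    // ... do some work ...
    dispatch_group_leave(group)  // mark that task complete
}

// Blocks the current thread until every enter is matched by a leave
dispatch_group_wait(group, DISPATCH_TIME_FOREVER)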

I uploaded an example project for this code to my GitHub account. It's under Apache 2.0; hopefully it comes in handy sometime.

More can be... more

"Less is more". It's a frustrating phrase. Less is not not more; by definition, it never can be. But sometimes less is better. On the other hand, sometimes more is better. Mostly, from what I can tell, there are some things where less is better and others where less is worse. Often there's a level which is just right: more is too much, and less is too little. Salt intake falls into that bucket.

"Less is more" is a paraphrasing of a whole book, The Paradox of Choice, written by Barry Schwartz. It's become a bit of a mantra, one that is often deployed without thought: "this page is cluttered, we need to remove stuff; less is more, dude". This closes down a conversation without exploring the alternatives.

Increasingly, however, it looks like the idea is either false or, at the very least, that the book-length discussion is more appropriate than the three-word version.

Iyengar and Lepper's jam study from back in 2000, where all this stuff came from, is coming under fire: a whole bunch of other studies don't find the same effect. This is all described in more detail in an article in the Atlantic.

A logical conclusion of "less is more" is that one is enough. For shopping, at least, one appears to be too few. This relates to the topic of framing. My reading is that if you only have one item in a given category in a store, it's easy to worry that the price of that item is too high. Introduce even one more, and the pricing is framed. Add a more expensive item and you will increase sales of your previously lonely cheaper item: a framing effect will suddenly make your original item seem better value. The implication is that, without the price framing, customers feel the urge to look elsewhere to be sure they're getting a good deal.

Looking online for articles that Schwartz has written, things seem to come back to this one from the HBR in 2006:

Choice can no longer be used to justify a marketing strategy in and of itself. More isn't always better, either for the customer or for the retailer. Discovering how much assortment is warranted is a considerable empirical challenge.

That is, less can be better, but sometimes more can be better; and it's expensive, but sometimes you just have to pay the price to find out.

Selecting a HAProxy backend using Lua

Once you've learned the basics of using Lua in HAProxy, you start to see a lot of places the scripting language could be useful. At Cloudant, one of the places we saw that we could make use of Lua was when selecting from the various backends to which our frontend load balancers direct traffic. We wrote a simple proof of concept, which I wanted to document here along with some of the problems we hit along the way.

Say we wanted to choose a backend based on the first component of the request path (i.e., a in /a/something/else). We actually don't do this at Cloudant, but it is a simple, not-quite-totally-trivial demo.

When using HAProxy 1.5, you'd do something like this:

frontend proxy
  ... other settings ...

  # del-header ensures that we're using 'new' headers
  http-request del-header x-backend
  http-request del-header x-path-first

  http-request set-header x-path-first %[path,word(1,/)]

  acl is_backend_set hdr_len(x-backend) gt 0
  acl path_first_a req.hdr(x-path-first) -m str a
  acl path_first_b req.hdr(x-path-first) -m str b

  http-request set-header x-backend a if path_first_a !is_backend_set
  http-request set-header x-backend b if path_first_b !is_backend_set
  http-request set-header x-backend other if !is_backend_set

  http-request del-header x-path-first

  use_backend %[req.hdr(x-backend)]

backend a
  ...

backend b
  ...

backend other
  ...

In outline, this code uses a couple of temporary headers to store the first path component and the chosen backend, combined with ACLs as guards to make sure the rules are applied in the right priority order. In particular, the is_backend_set ACL prevents every request from falling through to the other backend.

This is fairly concise, but in my experience gets complicated quickly. Moreover, it hides the fact that the logic is essentially an imperative if...else if...else statement.

Thankfully, HAProxy 1.6 introduces both variables and Lua scripting, which we can use to make things clearer and safer, if not particularly shorter.

Variables

We can use variables to replace the use of headers for temporary data. Setting and retrieving looks like this:

http-request set-var(req.path_first) path,word(1,/)
acl path_first_a var(req.path_first) -m str a

This isn't any shorter, but it does reduce the chance of a malicious request slipping in a header that affects processing.

Variables all have a scope: req variables are only available during HAProxy's request phase; res variables only during the response phase; and txn variables are stored with the transaction and available in both.
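
As a small illustration (the variable and header names here are just for the example), a txn-scoped variable set during the request phase is still readable when the response passes through:

# txn-scoped: stored with the transaction, so it survives into the
# response phase; a req.-scoped variable would be gone by then
http-request set-var(txn.first_seg) path,word(1,/)
http-response set-header x-first-seg %[var(txn.first_seg)]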

Lua

Variables are nice, but they're a fairly straightforward feature. Lua lets us get a bit more interesting. Instead of the header/acl dance, we can now write the backend-switching logic more explicitly.

Assuming that we put the Lua code in a file called select.lua alongside the HAProxy configuration file:

global
  lua-load select.lua
  ... other settings ...

frontend proxy
  ... other settings ...

  # Store the backend to use in a variable, available in both request
  # and response (txn-scope)
  http-request set-var(txn.backend_name) lua.backend_select()

  # Use the backend_name txn variable
  use_backend %[var(txn.backend_name)]

Here, we use a Lua sample fetch function. Sample fetch is an HAProxy term for any function -- whether in-built or written in Lua -- that processes the HTTP transaction and returns a value calculated from its details. The Lua function is automatically passed those transaction details as an argument.

The backend returned is put into a variable in case it's needed elsewhere. A txn scoped variable can be used in both request and response phases; using one, you could add a header to the response containing the chosen backend, for example. If this wasn't needed, you could put the backend_select fetch directly into the use_backend line.
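
That direct form would look like this (the rest of the configuration stays as above), at the cost of losing the reusable variable:

use_backend %[lua.backend_select()]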

Warning: One thing we found when trying out this code is that we couldn't do what we did in the HAProxy 1.5 version and store the return value from the Lua code in an HTTP request header. If we did that, for some reason HAProxy returned a 503 status code; that is, the use_backend statement appeared to be trying to use a non-existent backend. Swapping to a variable fixed this.

The Lua code contained in select.lua ends up being straightforward:

-- Work out the backend name for a given request's HTTP path
core.register_fetches("backend_select", function(txn)

  -- txn.sf contains HAProxy's in-built sample fetches, like the HTTP path
  local path = txn.sf:path()
  local path_first = string.match(path, '([^/]+)')

  if path_first == 'a' then
    return 'a'
  elseif path_first == 'b' then
    return 'b'
  else
    return 'other'
  end
end)

In outline:

  1. core is a class exposed globally by HAProxy. One of the uses of core is to register Lua functions for use in HAProxy. The register_fetches call registers our sample fetch under the name backend_select. The sample fetch is a Lua function, declared inline in the call.
  2. The first part of the sample fetch function uses the txn argument. HAProxy provides this argument automatically to all Lua functions registered as sample fetches. The txn argument provides access to both the request context and a lot of the in-built HAProxy fetches for accessing data from the request. We use one of the fetches, path, to retrieve the path.
  3. We take the first part of the path using Lua's match function, which we can make perform a split-like behaviour (see the short sketch after this list).
  4. Finally, we can do the if/else statement and return the backend name to use.
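
Here is that split-like match behaviour in isolation, runnable in a standalone Lua interpreter:

-- '([^/]+)' captures the first run of non-slash characters,
-- i.e. the first path component
print(string.match('/a/something/else', '([^/]+)'))  -- a
print(string.match('/b/other', '([^/]+)'))           -- b
print(string.match('/', '([^/]+)'))                  -- nil (no component)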

For me, after learning the basics of Lua, the most complicated part of this was figuring out what's available on the txn variable. The Lua documentation directs you towards the standard HAProxy documentation, but I found it a bit hard to work out quite the right Lua code to access the fetches that HAProxy exposes (probably because I was unfamiliar with terms like sample fetch when I started this proof of concept, and because I'm new to Lua).

And there you have it. Once you get the right code, it's quite short, but it took a few days to figure out all the moving parts from scratch.

Ruby & Couch

It's a long weekend this week in the UK. I wanted to learn a bit more Ruby, so I decided to use the time to start writing a client library for CouchDB. Basically my day job at Cloudant, but in Ruby.

I first used Ruby back in about 2005, and this site was powered by a couple of Ruby incarnations: first a Ruby on Rails app, then a fairly hokey static site generator. I think that lasted until around 2009, when I learned Python and switched to Google App Engine. Even with this experience, I don't know Ruby particularly well -- I have never used it full time -- but I think the library has come out okay so far.

The client is fairly low-level, which is my preference for clients, though not everyone's. One sets up a client, then makes requests with it. Each type of request -- GET _all_docs, PUT /database/document and so on -- is represented by its own class, an idea Soroush Khanlou calls templating. We also used this approach for Cloudant's Objective-C client library and it seemed a good approach; this Ruby library builds on lessons learned there.

require 'rubycouch'

client = CouchClient.new(URI.parse('http://localhost:5984'))
response = client.make_request(AllDbs.new)
response.json
# => ["_replicator","_users","animaldb",...]

It's got some neat features. Most things can be streamed rather than read into memory. I tried to pick something useful for each request, but aside from views, I ended up just providing the option to stream the data to a block.
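
As a sketch of what I mean -- the request class and block parameter here are hypothetical, so check the project README for the real API -- streaming hands the response to a block in pieces rather than buffering it:

# Hypothetical example: names are illustrative, not the library's
# actual API. Each chunk of the response body is handed to the block
# as it arrives, instead of being read into memory first.
get_doc = GetDocument.new('kookaburra')
client.database('animaldb').make_request(get_doc) do |chunk|
  $stdout.write(chunk)  # process each piece as it streams in
end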

However, some requests are a bit cleverer. I like the views implementation, which sends each result to a block:

get_view = GetView.new('views101', 'latin_name')
client.database('animaldb').make_request(get_view) do |row, idx|
  # => 0: {"id"=>"kookaburra", "key"=>"Dacelo novaeguineae", "value"=>19}
  # and so on. `row` is always decoded JSON. idx just tends to be useful.
end.json
# => {"total_rows"=>5, "offset"=>0,"rows"=>[]}

I certainly learned a lot about Ruby writing this. Right now the library is pretty incomplete in terms of API coverage, but it's quite usable for simple projects -- and, importantly, should be easy to add to and contribute to. Perhaps I'll be able to take the time to polish it up. I hope I can. Meanwhile, it should be fairly simple to get to grips with if you want to try it.

Find it on GitHub.