Just enough YAML to understand Kubernetes manifests

When we talk about Kubernetes, we should really be talking about its API: when you, as an administrator, interact with Kubernetes using kubectl, you are manipulating the state of data within Kubernetes via that API.

But when you use kubectl, the way you tend to tell kubectl what to do with the Kubernetes API is using YAML. A lot of freakin’ YAML. So while I hope to write more about the actual Kubernetes API sometime soon, first we’ll have to talk a bit about YAML; just enough to get going. Being frank, I don’t get on well with YAML. I do get on with JSON, because in JSON there is a single way to write anything. While you don’t even get to choose between double and single quotes for your strings in JSON, I overheard a colleague say that there are over sixty ways to write a string in YAML. Sixty ways to write a string! I think they were being serious.

Thankfully, idiomatic Kubernetes YAML doesn’t do much with the silly end of YAML, and even sticks to just three ways to represent strings 💪.

While not required for Kubernetes, while writing this I found some even stranger corners of YAML than I’d come across before. I thought I’d note these down for amusement’s sake, even though I think they just come from over-applying the grammar rather than from anyone seriously believing that they are sensible.

Below I’ve included YAML and the JSON equivalents, simply because I find JSON a conveniently unambiguous representation, and one that I expect to be familiar to most readers (including myself).

Objects (maps)

In JSON, you write an object like this:

"object": {
    "key": "value",
    "boolean": true,
    "null_value": null,
    "integer": 1,
    "anotherobject": {
        "hello": "world"
    }
}

Unlike JSON, in YAML the indentation makes a difference. We write an object like this:

object:
  key: value
  boolean: true
  null_value:
  integer: 1
  anotherobject:
    hello: world

If you bugger up the indenting, you’ll get a different value. So this YAML:

object2:
key: value
boolean: true
null_value:
integer: 1

Means this JSON:

"object2": null,
"key": "value",
"boolean": true,
"null_value": null,
"integer": 1

When combined with a de facto standard of two-space indenting, I find YAML objects pretty hard to read. Particularly in a long sequence of objects, it’s very easy to miss where one object stops and another begins. It’s also easy to paste something with a slightly wrong indent, changing its semantics, in a way that just isn’t possible in JSON.

You can actually just write an object with braces and everything in YAML, just like JSON. In fact, as of YAML 1.2, JSON is a subset of YAML, so any JSON document is also a YAML document. When I learned this it was essentially 🤯 combined with 😭. However, no-one ever writes JSON into YAML documents, so in the end this fact is purely academic.

Well, apart from the JSON-style arrays you sometimes see.
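For instance, a manifest might use a JSON-style array (YAML calls this "flow style") for a short list of command-line arguments. This is only a sketch, and the keys and values are made up for illustration:

```yaml
# JSON-style ("flow") array: handy for short lists
command: ["sh", "-c", "echo hello"]

# The same kind of list written in the usual block style
args:
- "--port"
- "8080"

# JSON-style objects are valid too, though you rarely see them
labels: {"app": "web", "tier": "frontend"}
```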

Arrays

Arrays look like (un-numbered) lists:

array1:
- mike
- fred

You can indent the list items as far as you like, as long as they all line up with each other, so this is the same:

array1:
        - mike
        - fred

Both translate to:

"array1": ["mike", "fred"]

But it’s easy to make a mistake. This YAML with its accidental indent:

array1:
  - mike
    - fred
  - john

Means this:

"array1": [
    "mike - fred",
    "john"
]

Which I find a bit too silently weird for my tastes.

Objects in arrays

The main thing I get wrong here is when writing arrays of objects. It’s very easy to misplace a -.

So this is a list of two objects:

array:
- key: value
  boolean: true
  null_value:
  integer: 1
- foo: bar
  hello: world

Which becomes:

"array": [
    {
        "key": "value",
        "boolean": true,
        "null_value": null,
        "integer": 1
    },
    {
        "foo": "bar",
        "hello": "world"
    }
]

But I find it very easy to miss the -, particularly in lists of objects with sub-objects. In addition, YAML’s permissiveness enables one to mistype syntactically valid but semantically different constructs, like here where we want to create an object but end up with an extra list item:

array:
- object:
-   foo: bar
    hello: world
    baz: world
- key: value
  boolean: true
  null_value:
  integer: 1

Which gives the JSON:

"array": [
    {
        "object": null
    },
    {
        "foo": "bar",
        "hello": "world",
        "baz": "world"
    },
    {
        "key": "value",
        "boolean": true,
        "null_value": null,
        "integer": 1
    }
]

Particularly when reviewing complex structures, it’s easy to start to lose the thread of which - and which indent belongs to which object.
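To make that concrete, here is a sketch of the kind of nesting you meet in a real manifest: a Pod whose containers field is a list of objects, each of which holds further lists. The field names are the standard Pod ones, but the names, images and values are made up for illustration:

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: example-pod          # hypothetical name
spec:
  containers:
  - name: web                # the first container object starts at this "-"
    image: nginx:1.25        # illustrative image tag
    ports:
    - containerPort: 80      # a list of objects nested inside the first container
    env:
    - name: GREETING
      value: "hello"
  - name: sidecar            # the second container object starts here
    image: busybox:1.36      # illustrative image tag
    command: ["sleep", "3600"]
```

Only the two - characters at the containers level start a new container; the others start items in the nested ports and env lists.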

Arrays of arrays

I find this perhaps the best example of where YAML goes off the rails. It’s easy and (I find) clear to represent arrays of arrays in JSON:

[
    1,
    [1,2],
    [3, [4]],
    5
]

This is… pretty wild by default in YAML:

- 1
- - 1
  - 2
- - 3
  - - 4
- 5

I admit this is reducing things to the absurd for effect. In cases like this, though, perhaps the best thing is to fall back to inline JSON.
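As a sketch of that fallback, the same nested array can be written with block style for the outer list and JSON-style arrays for the inner ones:

```yaml
# Outer list in block style, inner lists in JSON ("flow") style
- 1
- [1, 2]
- [3, [4]]
- 5
```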

Strings

Anyway, let’s get back to those sixty ways to represent strings. The three ways you’ll commonly see used in Kubernetes manifest YAML files are as follows:

array:
- "mike"
- mike
- |
    mike

These all mean nearly the same thing; the only difference is that the | form keeps a trailing newline:

"array": [
    "mike",
    "mike",
    "mike\n"
]

The first form (double-quoted) is always a string. The second form (unquoted) is a string unless it looks like something else, such as true, null or a number; some parsers that follow YAML 1.1 also read yes, no, on and off as booleans. The third form allows you to insert multi-line strings as long as you indent appropriately. This third form is most often seen in ConfigMap and Secret objects, as it is very convenient for multi-line text files. For example, this YAML:

array:
- true
- "mike"
- |
  mike
  fred
  john

Which becomes:

"array": [
    true,
    "mike",
    "mike\nfred\njohn\n"
]
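Here is a sketch of how these string forms tend to show up together in a ConfigMap. The object name, keys and file contents are made up for illustration. ConfigMap data values have to be strings, which is why the port number is quoted:

```yaml
apiVersion: v1
kind: ConfigMap
metadata:
  name: example-config       # hypothetical name
data:
  log_level: debug           # unquoted: a string, because it is not a reserved word
  port: "8080"               # quoted: without the quotes this would be a number
  app.properties: |          # block form: a small multi-line file
    greeting=hello
    name=mike
```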

A digression into wacky strings

Thankfully I’ve not seen them in Kubernetes YAML, but YAML contains at least two further forms that look remarkably similar to the | form. The first, which starts with >, turns single line breaks into spaces and only produces a newline where the YAML has a blank line; like |, it (almost) always adds a single newline at the end. The second omits the indicator character at the start of the string but looks identical in passing. In this variant the newlines embedded in the YAML also become spaces in the actual string, and there is no trailing newline.

In this example, I include the | form, the > and the prefix-less form using the same words and newline patterns to show how similar-looking YAML gives different strings:

array:
- |
  mike
  fred
  john
- >
  mike
  fred
  john
- mike
  fred
  john

Giving the JSON:

"array": [
    "mike\nfred\njohn\n",
    "mike fred john\n",
    "mike fred john"
]

I find the YAML definitely looks cleaner, but the JSON is better at spelling out what it means.

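While experimenting, I found an odd edge case with the > prefix. Where I used it at the end of a file, the trailing \n ended up being dropped, most likely because the file itself did not end with a newline, so there was no final line break to keep: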

names: >
    mike
    fred
    john
names2: >
    mike
    fred
    john

Ends up with the \n going missing in names2:

"names": "mike fred john\n",
"names2": "mike fred john"

Just 🤷‍♀️ and move on.
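If the trailing newline ever does matter, YAML lets you say what you want explicitly by adding a "chomping" indicator after the | or >. I haven't needed this in Kubernetes manifests, so treat the following as a sketch with made-up keys:

```yaml
# Default ("clip"): exactly one trailing newline, giving "mike\n"
clip: |
  mike

# "-" ("strip"): no trailing newline, giving "mike"
strip: |-
  mike

# "+" ("keep"): trailing blank lines are kept as well, giving "mike\n\n"
keep: |+
  mike

last: value
```

The same indicators work with >, so >- gives a folded string with no trailing newline.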

Multiple documents in one file

Finally, you will often see --- in Kubernetes YAML files. All this means is that what follows the --- is the start of a new YAML document; it’s a way of putting multiple YAML documents, and so multiple Kubernetes objects, inside one file. This is actually pretty nice, although again it’s pretty minimal and easy to miss when scanning a file.
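As a sketch, here is one file holding two objects. The kinds and field names are standard, but the object names are made up for illustration:

```yaml
# First document: a Namespace
apiVersion: v1
kind: Namespace
metadata:
  name: example              # hypothetical name
---
# Second document: a ServiceAccount inside that namespace
apiVersion: v1
kind: ServiceAccount
metadata:
  name: example-account      # hypothetical name
  namespace: example
```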

And that’s about enough YAML to understand Kubernetes manifests 🎉.