Just enough YAML to understand Kubernetes manifests
When we talk about Kubernetes, we should really be talking about the fact that when you, as an administrator, interact with Kubernetes using kubectl, you are using kubectl to manipulate the state of data within Kubernetes via Kubernetes’s API.
But when you use kubectl, the way you tend to tell kubectl what to do with the Kubernetes API is using YAML. A lot of freakin’ YAML. So while I hope to write more about the actual Kubernetes API sometime soon, first we’ll have to talk a bit about YAML; just enough to get going. Being frank, I don’t get on well with YAML. I do get on with JSON, because in JSON there is a single way to write anything. While you don’t even get to choose between double and single quotes for your strings in JSON, I overheard a colleague say that there are over sixty ways to write a string in YAML. Sixty ways to write a string! I think they were being serious.
Thankfully, idiosyncratic Kubernetes YAML doesn’t do much with the silly end of YAML, and even sticks to just three ways to represent strings 💪.
Although not required for Kubernetes, while writing this I found some even stranger corners of YAML than I’d come across before. I thought I’d note these down for amusement’s sake, even though I think they just come from over-applying the grammar rather than anyone seriously believing that they are sensible.
Below I’ve included YAML and the JSON equivalents, simply because I find JSON a conveniently unambiguous representation, and one that I expect to be familiar to most readers (including myself).
Objects (maps)
In JSON, you write an object like this:
"object": {
  "key": "value",
  "boolean": true,
  "null_value": null,
  "integer": 1,
  "anotherobject": {
    "hello": "world"
  }
}
Unlike JSON, in YAML the spacing makes a difference. We write an object like this:
object:
  key: value
  boolean: true
  null_value:
  integer: 1
  anotherobject:
    hello: world
If you bugger up the indenting, you’ll get a different value. So this YAML:
object2:
key: value
boolean: true
null_value:
integer: 1
Means this JSON:
"object2": null,
"key": "value",
"boolean": true,
"null_value": null,
"integer": 1
When combined with a de facto standard of two-space indenting, I find YAML objects pretty hard to read. Particularly in a long sequence of objects, it’s very easy to miss where one object stops and another begins. It’s also easy to paste something with a slightly wrong indent, changing its semantics, in a way that just isn’t possible in JSON.
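To make the pasting hazard concrete, here is a minimal sketch of a manifest. The ConfigMap and every name in it are made up for illustration.

```yaml
# Hypothetical ConfigMap; all names are placeholders.
apiVersion: v1
kind: ConfigMap
metadata:
  name: example-config
  labels:
    app: example   # nested under labels, so this is metadata.labels.app
data:
  greeting: hello
```

If the app line lost two spaces of indent, it would become a sibling of labels (that is, metadata.app), labels itself would be left as null, and the file would still be perfectly valid YAML.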
You can actually just write an object with braces and everything in YAML, just like JSON. In fact JSON is a subset of YAML so any JSON document is also a YAML document. When I learned this it was essentially 🤯 combined with 😭. However, no-one ever writes JSON into YAML documents, so in the end this fact is purely academic.
Well apart from sometimes you see JSON arrays.
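As an illustration of where those JSON arrays tend to turn up, here is a sketch of a Pod whose command is written as an inline array. The Pod name, image and command are placeholders rather than anything from a real deployment.

```yaml
# Hypothetical Pod; name, image and command are placeholders.
apiVersion: v1
kind: Pod
metadata:
  name: example-pod
spec:
  containers:
  - name: main
    image: busybox
    # A JSON-style array sitting inside otherwise ordinary YAML.
    command: ["sh", "-c", "sleep 3600"]
```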
Arrays
Arrays look like (un-numbered) lists:
array1:
- mike
- fred
You can indent all the list items how you like, so this is the same:
array1:
  - mike
  - fred
Both translate to:
"array1": ["mike", "fred"]
But it’s easy to make a mistake. This YAML with its accidental indent:
array1:
- mike
  - fred
- john
Means this:
"array1": [
  "mike - fred",
  "john"
]
Which I find a bit too silently weird for my tastes.
Objects in arrays
The main thing I get wrong here is when writing arrays of objects. It’s very easy to misplace a -.
So this is a list of two objects:
array:
- key: value
  boolean: true
  null_value:
  integer: 1
- foo: bar
  hello: world
Which becomes:
"array": [
  {
    "key": "value",
    "boolean": true,
    "null_value": null,
    "integer": 1
  },
  {
    "foo": "bar",
    "hello": "world"
  }
]
But I find it very easy to miss the -, particularly in lists of objects with sub-objects. In addition, YAML’s permissiveness makes it easy to type something that is syntactically valid but means something different, like here where we want to create an object but end up with an extra list item:
array:
- object:
- foo: bar
  hello: world
  baz: world
- key: value
  boolean: true
  null_value:
  integer: 1
Which gives the JSON:
"array": [
  {
    "object": null
  },
  {
    "foo": "bar",
    "hello": "world",
    "baz": "world"
  },
  {
    "key": "value",
    "boolean": true,
    "null_value": null,
    "integer": 1
  }
]
Particularly when reviewing complex structures, it’s easy to start to lose the thread of which - and which indent belongs to which object.
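In a manifest, this shape shows up most obviously in a Pod’s list of containers. The sketch below uses placeholder names and images; the comments mark which - opens which object.

```yaml
# Hypothetical Pod; names, images and values are placeholders.
apiVersion: v1
kind: Pod
metadata:
  name: example-pod
spec:
  containers:
  # This "-" opens the first container object.
  - name: web
    image: nginx
    ports:
    # A nested list of objects inside the first container.
    - containerPort: 80
    env:
    - name: MODE
      value: production
  # A "-" back at the outer indent opens the second container object.
  - name: sidecar
    image: busybox
```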
Arrays of arrays
I find this perhaps the best example of where YAML goes off the rails. It’s easy and (I find) clear to represent arrays of arrays in JSON:
[
  1,
  [1, 2],
  [3, [4]],
  5
]
This is… pretty wild by default in YAML:
- 1
- - 1
  - 2
- - 3
  - - 4
- 5
I suspect this is reducing to the absurd for effect. However, perhaps the best thing here is to regress to inline JSON.
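As a sketch of that fallback, the inner arrays can be written with JSON-style brackets while the outer list stays in the usual YAML form:

```yaml
# Same data as the nested-dash version, with inner arrays written inline.
- 1
- [1, 2]
- [3, [4]]
- 5
```

The whole thing could equally be written on one line as [1, [1, 2], [3, [4]], 5], since the JSON form is itself valid YAML.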
Strings
Anyway, let’s get back to those sixty ways to represent strings. The three ways you’ll commonly see used in Kubernetes manifest YAML files are as follows:
array:
- "mike"
- mike
- |
  mike
These all mean nearly the same thing; the only difference is that the | form keeps the line break at the end:
"array": [
  "mike",
  "mike",
  "mike\n"
]
The first form appears to always be a string. The second form is always a string, unless it’s a reserved word such as true or null, or looks like a number. The third form allows you to insert multi-line strings as long as you indent appropriately. This third form is most often seen in ConfigMap and Secret objects, as it is very convenient for multi-line text files.
array:
- true
- "mike"
- |
  mike
  fred
  john
Which becomes:
"array": [
  true,
  "mike",
  "mike\nfred\njohn\n"
],
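Here is a sketch of how the three forms might sit side by side in a manifest. The ConfigMap, its keys and its values are all hypothetical. The one Kubernetes detail it leans on is that the values under data in a ConfigMap are strings, so anything that would otherwise be read as a boolean or a number needs quoting.

```yaml
# Hypothetical ConfigMap; keys and values are placeholders.
apiVersion: v1
kind: ConfigMap
metadata:
  name: example-settings
data:
  # Plain form: read as the string "info".
  log_level: info
  # Quoted form: without the quotes these would be a boolean and an integer.
  debug: "true"
  port: "8080"
  # Block form: a multi-line file, with its line breaks kept.
  app.properties: |
    colour=blue
    retries=3
```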
A digression into wacky strings
Thankfully I’ve not seen them in Kubernetes YAML, but YAML contains at least two further forms that look remarkably similar to the | form. The first, which uses > to start it, only inserts a newline where the YAML has a blank line (two consecutive line breaks), and for some reason (almost) always appears to insert a newline at the end. The second misses out the control character at the start of the string but looks identical in passing. In this variant the newlines embedded in the YAML turn into spaces in the actual string.
In this example, I include the | form, the > form and the prefix-less form using the same words and newline patterns to show how similar-looking YAML gives different strings:
array:
- |
  mike
  fred
  john
- >
  mike
  fred
  john
- mike
  fred
  john
Giving the JSON:
"array": [
  "mike\nfred\njohn\n",
  "mike fred john\n",
  "mike fred john"
],
I find the YAML definitely looks cleaner, but the JSON is better at spelling out what it means.
While experimenting, I found an odd edge case with the > prefix. Where I used it at the end of a file, the trailing \n ended up being dropped:
names: >
  mike
  fred
  john
names2: >
  mike
  fred
  john
Ends up with the \n going missing in names2:
"names": "mike fred john\n",
"names2": "mike fred john"
Just 🤷‍♀️ and move on.
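For anyone who would rather not shrug: as far as I understand the YAML specification, the default behaviour keeps the final line break only when one is actually present, so a file that ends without a line break leaves nothing to keep. One way to pin the result down, assuming the trailing newline isn’t wanted anyway, is the strip indicator, written >-. A minimal sketch:

```yaml
# ">-" folds the lines as ">" does, but always strips the final line break,
# so both values come out as "mike fred john" wherever they sit in the file.
names: >-
  mike
  fred
  john
names2: >-
  mike
  fred
  john
```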
Multiple documents in one file
Finally, you will often see --- in Kubernetes YAML files. All this means is that what follows the --- is the start of a new YAML document; it’s a way of putting multiple YAML documents inside one file. This is actually pretty nice, although again it’s pretty minimal and easy to miss when scanning a file.
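A sketch of what that looks like in practice, using two hypothetical resources (the names are placeholders):

```yaml
# First document: a hypothetical Namespace.
apiVersion: v1
kind: Namespace
metadata:
  name: example
---
# Second document: a hypothetical ConfigMap inside that Namespace.
apiVersion: v1
kind: ConfigMap
metadata:
  name: example-config
  namespace: example
data:
  greeting: hello
```

Each side of the --- is parsed as its own document, so the two manifests stay independent even though they share a file.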
And that’s about enough YAML to understand Kubernetes manifests 🎉.