Loading Kubernetes Types Into Go Objects

At Cloudant, we use GitOps to manage our Kubernetes workloads. One of the advantages of this approach is that we store fully-rendered Kubernetes manifests within GitHub for deployment to our Kubernetes clusters.

One thing that I often find myself doing is writing small one-off tools to answer questions about those manifests. For example, "by deployment, what is the CPU and memory resource allocation, and how much does that cost in terms of worker machine price?". As a first approximation, this can be discovered by loading up all Deployment manifests from our GitOps repository, then processing their content to discover container resource requests and the number of replicas specified by each Deployment.

I write these ad hoc tools in Go. While I could create the appropriate struct definitions within each program for a YAML deserializer to work with, it is time-consuming to do the object mapping work in every application. I wanted to be able to use pre-created object mappings and load them up inside my applications.

For this, I looked to the Kubernetes Go client. While this client contains object mappings for the various standard Kubernetes resource types, it is designed for querying the Kubernetes API server rather than for loading YAML files from disk. With a little digging into the client's guts, however, you can make this work and save yourself a bunch of time. In addition, any code you write can later be adapted to consume the output of the Kubernetes API server directly, because it works with the same types.

Go packages for Kubernetes API types

k8s.io/api is the root package for the standard type objects, such as Deployment. One thing that tripped me up originally is that the types are not defined within that package itself, but instead in packages underneath that namespace. For example, Deployment is in the k8s.io/api/apps/v1 package. The sub-packages follow a pattern of k8s.io/api/GROUP/VERSION, where GROUP and VERSION can be found in the apiVersion of all Kubernetes resources.
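For instance, here is how a few common apiVersion values map onto import paths (the aliases are only a convention; note that resources with apiVersion: v1, such as Pod and ConfigMap, belong to the "core" group, whose GROUP segment is empty in the apiVersion but appears as core in the package path):

import (
	appsv1 "k8s.io/api/apps/v1"   // apiVersion: apps/v1  (Deployment, StatefulSet, ...)
	batchv1 "k8s.io/api/batch/v1" // apiVersion: batch/v1 (Job)
	corev1 "k8s.io/api/core/v1"   // apiVersion: v1       (Pod, Service, ConfigMap, ...)
)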

Each version of Kubernetes has a tag within the source repository for the Go k8s.io/api package. This is a useful way to refer to the API version you require within your go.mod file, rather than the package version.

To use these object-mappings within your application, add a require to go.mod using the tag to pin the version:

require (
	k8s.io/api kubernetes-1.14.1
)

In this require line, kubernetes-1.14.1 is the Kubernetes API level. It's a tag within the source repository for the Go package.

When go build is run, the entry in go.mod is replaced by a reference to the commit:

require (
	k8s.io/api v0.0.0-20190409021203-6e4e0e4f393b
)

Loading YAML into Go objects

Now that we know which package holds the types, we need to work out how to load YAML into them. I found that the definitions within k8s.io/api didn't play well with the usual YAML library that I use, gopkg.in/yaml.v2.

It turns out that the k8s.io/apimachinery package contains the machinery to help with loading the YAML, and we can combine that with utilities in k8s.io/client-go to decode YAML read from disk into rich Go types.

Again, we use the git tags for a given Kubernetes version when writing the requirements for these two packages into our go.mod file. Because we're using semi-private implementation packages of Kubernetes, my experience is that if the versions of the packages don't "match" each other, loading YAML will fail with various odd error messages.

To use these packages, we modify the go.mod for the application to further include references to k8s.io/apimachinery and k8s.io/client-go at the same API level as k8s.io/api:

require (
	k8s.io/api kubernetes-1.14.1
	k8s.io/apimachinery kubernetes-1.14.1
	k8s.io/client-go kubernetes-1.14.1
)

Again, these tags will be converted into commit references by go build:

require (
	k8s.io/api v0.0.0-20190409021203-6e4e0e4f393b
	k8s.io/apimachinery v0.0.0-20190404173353-6a84e37a896d
	k8s.io/client-go v11.0.1-0.20190409021438-1a26190bd76a+incompatible
)

The process of mapping YAML to objects

From k8s.io/apimachinery, we use a UniversalDeserializer to create a runtime.Decoder object which is able to take YAML and deserialize it into the objects that k8s.io/api provides. The runtime.Decoder is generated using a Scheme object which needs to contain all the definitions for the resource types (schemas, perhaps?) that need to be deserialized. A newly created Scheme is empty of definitions.

This is where k8s.io/client-go comes in. To avoid needing to register all the types ourselves, we use the Codecs variable in the k8s.io/client-go/kubernetes/scheme package: this CodecFactory has all the standard Kubernetes types preloaded into its Scheme, and provides a way to create a runtime.Decoder using that Scheme.
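As a fragment, that looks like the following (the full program later in this post does exactly the same thing):

import "k8s.io/client-go/kubernetes/scheme"

// scheme.Codecs is a CodecFactory whose Scheme has the standard
// Kubernetes types pre-registered; asking it for the universal
// deserializer gives us a runtime.Decoder backed by those types.
var decoder = scheme.Codecs.UniversalDeserializer()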

Once we have a runtime.Decoder, we use its Decode method to decode YAML buffers into Go objects. Decode returns a triple:

(Object, *schema.GroupVersionKind, error)

The Object is the decoded object, as a k8s.io/apimachinery runtime.Object. This is an interface that needs to be cast to the appropriate type from k8s.io/api in order to access the resource's fields. The GroupVersionKind structure helps us to do that, as it fully describes the Kubernetes resource type:

if groupVersionKind.Group == "apps" &&
	groupVersionKind.Version == "v1" &&
	groupVersionKind.Kind == "Deployment" {

	// Cast appropriately
	deployment := obj.(*appsv1.Deployment)

	// And do something with it
	log.Print(deployment.ObjectMeta.Name)
}

Putting it all together

Once we have these packages ready for use, reading the YAML into objects is relatively simple.

  1. Load the YAML file into a buffer.
  2. Handle the fact that most manifest files contain many YAML documents, separated by ---. A naive but effective way to split them is strings.Split().
  3. Pass each document to k8s.io/apimachinery's runtime.Decoder, which decodes it into a runtime.Object, the interface implemented by all API types.
  4. Once loaded, figure out from the GroupVersionKind returned by Decode what Kubernetes resource kind we have and cast appropriately.

This code brings this together:

import (
	"io/ioutil"
	"log"
	"strings"

	appsv1 "k8s.io/api/apps/v1"
	"k8s.io/client-go/kubernetes/scheme"
)

func main() {

	// Load the file into a buffer
	fname := "/path/to/my/manifest.yaml"
	data, err := ioutil.ReadFile(fname)
	if err != nil {
		log.Fatal(err)
	}

	// Create a runtime.Decoder from the Codecs field within
	// k8s.io/client-go that's pre-loaded with the schemas for all
	// the standard Kubernetes resource types.
	decoder := scheme.Codecs.UniversalDeserializer()

	for _, resourceYAML := range strings.Split(string(data), "---") {

		// skip documents that are empty or whitespace-only;
		// `Decode` will fail on them
		if len(strings.TrimSpace(resourceYAML)) == 0 {
			continue
		}

		// - obj is the API object (e.g., Deployment)
		// - groupVersionKind is a generic object that allows
		//   detecting the API type we are dealing with, for
		//   accurate type casting later.
		obj, groupVersionKind, err := decoder.Decode(
			[]byte(resourceYAML),
			nil,
			nil)
		if err != nil {
			log.Print(err)
			continue
		}

		// Figure out from `Kind` the resource type, and attempt
		// to cast appropriately.
		if groupVersionKind.Group == "apps" &&
			groupVersionKind.Version == "v1" &&
			groupVersionKind.Kind == "Deployment" {
			deployment := obj.(*appsv1.Deployment)
			log.Print(deployment.ObjectMeta.Name)
		}
	}
}

Remaining work: loading extension types (CRDs)

I haven't worked out how to add CRD types to this process yet, which is a gap because we use CRDs with our own operators to deploy the more complicated parts of our stack. For now, therefore, they are missing from the analysis my ad hoc tools can do.

In the end, you have to call AddKnownTypes on a given Scheme to add the types, as is done in the code for Sealed Secrets. This means you either need to reference or copy in the type definitions for your types from the appropriate packages – much like client-go does when registering the types from k8s.io/api.

But, as yet, I've not completed this final part of the story; a rough sketch of the approach is below.
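This is a sketch I've not yet run, and the myappv1alpha1 package, group and kind are hypothetical stand-ins for the generated API types an operator project would provide (such generated packages usually also expose an AddToScheme helper wrapping these calls):

import (
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/apimachinery/pkg/runtime/schema"
	"k8s.io/client-go/kubernetes/scheme"

	// Hypothetical generated types for the CRD; these already
	// implement runtime.Object.
	myappv1alpha1 "github.com/example/myapp-operator/pkg/apis/myapp/v1alpha1"
)

func init() {
	// Register the CRD's Go types into the same Scheme that backs
	// scheme.Codecs, so the UniversalDeserializer can decode them
	// alongside the standard types.
	gv := schema.GroupVersion{Group: "myapp.example.com", Version: "v1alpha1"}
	scheme.Scheme.AddKnownTypes(gv,
		&myappv1alpha1.MyApp{},
		&myappv1alpha1.MyAppList{},
	)
	metav1.AddToGroupVersion(scheme.Scheme, gv)
}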

Summary

This pattern is particularly useful when you are deploying using a GitOps pattern and thus have full manifests stored outside of Kubernetes which you wish to analyse. I have used it in several places; indeed the reason I’m writing it up here is to capture the instructions as much for myself as for other people, and also to force myself to dig into the code further rather than just copy-paste code from other places.

Most of the discoveries for this post came from digging through client-go GitHub issues.

Python Packaging in 2020

For a long time, I’ve kind of existed with a barely-there understanding of Python packaging. Just enough to copy a requirements.txt file from an old project and write a Makefile with pip install -r requirements.txt. A few years ago, I started using pipenv, and again learned just-enough to make it work.

Over the past year, I became frustrated with this situation:

  • pipenv became increasingly hard to make work through upgrades and its kitchen-sink approach.
  • I started building docker images for Python applications, and understanding packaging in more detail became essential to build secure and performant images.

Last year (2019), I started to look at tools like poetry, which essentially start the whole process again from scratch, including new dependency resolution and package-building code. When figuring out how to use these in Dockerfiles, I realised I needed to understand a bunch more about both packaging and virtual environments. The good news was that this area progressed a lot in the 2018-2019 time frame. The bad news was that this meant there was a lot to learn, and a bunch of existing material was out of date.

In the beginning, there was the source distribution

Until 2013, when PEP 427 defined the wheel (.whl) archive format for Python packages, whenever a package was installed via pip install it was always built from source via a distribution format called sdist. For pure-python packages this wasn't typically much of a problem, but for any package making use of C extensions it meant that the machine where pip install was run needed a compiler toolchain, python development headers and so on.

This situation is more than a little painful. As PEP 427’s rationale states:

Python’s sdist packages are defined by and require the distutils and setuptools build systems, running arbitrary code to build-and-install, and re-compile, code just so it can be installed into a new virtualenv.

After PEP 427, packages could also be distributed as so-called binary packages or wheels.

When I first started to see python binary packages I was confused, and even somewhat alarmed, by the term binary package; I had never looked in depth into python packaging and was quite used to source distributions by 2013. But in general they are a big win:

  • For pure python packages the term is a slight misnomer, as the wheel format is just about how the files are laid out inside an archive. A pure python package typically ships a single .whl covering the python versions it supports, named like Flask-1.1.1-py2.py3-none-any.whl, where none and any specify the python ABI (relevant for C extensions) and the target platform respectively. As pure python packages have no C extensions, they have no target ABI or platform, though they often have a python version requirement; this example supports both python 2 and 3.
    • The tags, such as none, in filenames are defined in PEP 425.
  • For packages including C extensions which are linked to the Python C runtime during compilation, the name does make sense, because the build process pre-compiles the extension into a binary, unlike in the sdist world where C extensions were compiled during package installation. This results in several different .whl files, as a separate .whl file must be created for each target system and python version. For example, cryptography-2.8-cp34-abi3-manylinux2010_x86_64.whl contains binaries built for CPython 3.4+ (the abi3 stable ABI) on x86_64 Linux (manylinux2010).

In the end, wheels provide a much simpler and more reliable install experience, as users are no longer forced to compile packages themselves, with all the tooling and security concerns inherent in that approach.

Stepping back to how wheels are built

Wheels soon started taking over the python packaging ecosystem, though there are still hold-outs even today that ship source packages rather than binary packages (often for good reasons).

However, all python packages were still defined via setup.py, an opaque standard defined purely by the distutils and setuptools source code. While there was now a binary standard for built packages, in practice there was only one way of building them: pip, for example, hardcoded calls to setup.py into its pip wheel command, so using other build systems was very difficult, making implementing one a somewhat thankless task. Before poetry, it doesn't look like anyone seriously attempted it.

The distutils module was shipped with Python, so it was natural that it came to be the de facto standard, and including a packaging tool was a good decision by the python maintainers. distutils wasn't that easy to use on its own, however, so setuptools was built as a package to improve on it. Over time, setuptools grew to be somewhat gnarly itself.

Tools like flit were then created to tame the new complexity by wrapping distutils and setuptools in another, opinionated layer. Flit became pretty popular because its workflow is simple and understandable, but in the end it was still using distutils and setuptools under the hood (per the flit source code); the generation of the files used by distutils happens behind the scenes, so far as I can tell (I didn't actually try flit out, so may have made some errors here).

Poetry and PEPs 517 & 518

In 2018, development of poetry started, at least per the earliest commits in its github repository. Poetry is an ambitious rebuild of python packaging pretty much from scratch. It's able to resolve dependencies and build wheels without any use of distutils and setuptools. The main problem with poetry is that, to be accepted into development and CI pipelines, it needs to re-implement a lot of functionality that already exists in tools like pip.

At around the same time, the python community came up with PEPs 517 and 518.

  • PEP 517 (status Provisional, 2015-2018) specifies a standard way to declare alternative build backends that pip can use when building wheels – for example, using poetry's or flit's build engine rather than going directly to distutils. A build backend is a Python module with a standard interface that takes a python package source tree and spits out a wheel.
  • PEP 518 (status Provisional, 2016) works in tandem with PEP 517 and specifies how a tool like pip should install the build backend declared via PEP 517 when building packages. Specifically, it describes how to create an isolated python environment with just the requirements needed to build the package (that is, the packages that provide the build backend, not the package's own dependencies).

Both PEPs 517 and 518 use a new file called pyproject.toml to describe their settings:

[build-system]
# Defined by PEP 518, what the build environment requires:
requires = ["poetry>=0.12"]
# Defined by PEP 517, how to kick off the build:
build-backend = "poetry.masonry.api"

Both poetry and flit work with pyproject.toml via its support for namespacing tool-specific settings. An example using poetry:

[tool.poetry]
name = "my-package"
version = "0.1.0"
description = "The description of the package"

[tool.poetry.dependencies]
python = "^3.7"
flask-hookserver = "==1.1.0"
requests = "==2.22.0"

While both PEPs 517 and 518 were started a while ago, it’s only from pip 19.1 (early 2019) that pip started supporting the use of build backends specified via PEP 517.

pip enters "PEP 517 mode" when pip wheel is called and pip finds a pyproject.toml file in the package it is building. When in this mode, pip acts as a build frontend, a term defined by PEP 517 for the application that is used from the command line and makes calls into a build backend, such as poetry. As a build frontend, pip's job here is to:

  1. Create an isolated python environment.
  2. Install the build backend into this environment via the PEP 518 requirements (requires = ["poetry>=0.12"]).
  3. Get the package ready for building in this environment.
  4. Invoke the build backend, for example poetry, using the entrypoint defined by PEP 517 (build-backend = "poetry.masonry.api") within the created isolated environment.

The build backend then must create a wheel from the source folder or source distribution and put it in the place that pip tells it to.

For me, this seems like big news for projects like poetry, which do a lot from scratch and end up with laundry lists of feature requirements before they can be integrated into full development and CI pipelines. If they can instead be integrated into CI via existing tools like pip, then they are much easier to adopt in development for their useful features there, such as poetry's virtual environment management. In particular, both flit and poetry will use the information defined in their respective sections of pyproject.toml to build the application wheel and the wheels for its requirements just as they would on a developer's machine (to an extent anyway; my experiments indicate poetry ignores its .lock file when resolving requirements).

In this way, PEPs 517 and 518 close the loop in allowing tools like poetry to concentrate on what they want to concentrate on, rather than needing to build out a whole set of functions before they can be accepted into developers’ toolboxes.

An example Dockerfile shows this in action, for building the myapp package into a wheel along with its dependencies, and then copying the app and dependency wheels into the production image and installing them:

# Stage 1 build to allow pulling from private repos requiring creds
FROM python:3.8.0-buster AS builder
RUN mkdir -p /build/dist /build/myapp
# pyproject.toml has deps for the `myapp` package
COPY pyproject.toml /build
# Our project source code
COPY myapp/*.py /build/myapp/
# This line installs and uses the build backend defined in
# pyproject.toml to build the application wheels from the source
# code we copy in, outputting the app and dependency wheels
# to /build/dist.
RUN pip wheel -w /build/dist /build

# Stage 2 build: copy and install wheels from stage 1 (`builder`).
FROM python:3.8.0-slim-buster as production-image
COPY --from=builder [ "/build/dist/*.whl", "/install/" ]
RUN pip install --no-index /install/*.whl \
    && rm -rf /install
CMD [ "my-package-script" ]

And this is what I now understand about the state of python packaging as we enter 2020. The future looks bright.

Kubernetes by Types

It’s relatively easy to find articles online about the basics of Kubernetes that talk about how Kubernetes looks on your servers. That a Kubernetes cluster consists of master nodes (where Kubernetes book-keeping takes place) and worker nodes (where your applications and some system applications run). And that to run more stuff, you provision more workers, and that each pod looks like its own machine. And so on.

But for me, I found a disconnect between that mental image of relatively clean looking things running on servers and the reams and reams of YAML one must write to seemingly do anything with Kubernetes. Recently, I found the Kubernetes API overview pages. Somehow I’d not really internalised before that the reams of YAML are just compositions of types, like programming in any class-based language.

But they are, because in the end all the YAML you pass into kubectl is just getting kubectl to work with a data model inside the Kubernetes master node somewhere. The types described in the Kubernetes API documentation are the building blocks of that data model, and learning them unlocked a new level of understanding Kubernetes for me.

The data model is built using object composition, and I found a nice way to discover it was to start from a single container object and build out to a running deployment, using the API documentation as much as I could but returning to the prose documentation for examples when I got stuck or, as we’ll see with ConfigMaps, when the API documentation just can’t describe everything you need to know.

Containers

This is our starting point. While the smallest thing that Kubernetes will schedule on a worker is a Pod, the basic entity is the Container, which encapsulates (usually) a single process running on a machine. Looking at the API definition, we can easily see what the allowed values are – for me this was the point where what had previously been seemingly arbitrary YAML fields started to slot together into a type system! Just like other API documentation, suddenly there's a place where I can see what goes in the YAML rather than copy-pasting things from the Kubernetes prose documentation, tweaking it and then just having to 🀞.

Let's take a quick look at some fields:

  • The most important thing for a Container is, of course, the image that it will run. From the Container API documentation, we can look through the table of fields within the Container and see that a string is required for this field.
  • The documentation says that a name is also required.
  • Another field that crops up a lot in my copy-pasted YAML is imagePullPolicy. If we look at imagePullPolicy, we can see that it's also a string, but in addition the documentation states what the acceptable values are: Always, Never and IfNotPresent. If YAML allowed enums, I'm sure this would be an enum. Anyway, we can immediately see what the allowed values are – this is much easier than trying to find this within the prose documentation!
  • Finally, let's take a look at volumeMounts, which is a little more complicated: it's a field of a new type rather than a primitive value. The new type is VolumeMount and the documentation tells us that this is an array of VolumeMount objects and links us to the appropriate API docs for VolumeMount objects. This was the real moment when I stopped having to use copy-paste and instead was really able to start constructing my YAML – πŸ’ͺ!

The documentation is also super-helpful in telling us where we can put things. Right at the top of the Container API spec, it tells us:

Containers are only ever created within the context of a Pod. This is usually done using a Controller. See Controllers: Deployment, Job, or StatefulSet.

Totally awesome, we now know that we need to put the Container within something else for it to be useful!

So let’s make ourselves a minimal container:

name: haproxy
image: haproxy:2.1.0
imagePullPolicy: IfNotPresent
volumeMounts:
- name: haproxy-config-volume  # References a Volume in the enclosing PodSpec
  mountPath: /usr/local/etc/haproxy/
  readOnly: true

We can build all this from the API documentation – and it’s easy to avoid the unneeded settings that often come along with copy-pasted examples from random websites on the internet. By reading the documentation for each field, we can also get a much better feel for how this container will behave, making it easier to debug problems later.

Pods

So now we have our Container we need to make a Pod so that Kubernetes can schedule HAProxy onto our nodes. From the Container docs, we have a link direct to the PodSpec documentation. Awesome, we can follow that up to our next building block.

A PodSpec has way more fields than a Container! But we can see that the first one we need to look at is containers which we’re told is an array of Container objects. And hey we have a Container object already, so let’s start our PodSpec with that:

containers:
- name: haproxy
  image: haproxy:2.1.0
  imagePullPolicy: IfNotPresent
  volumeMounts:
  - name: haproxy-config-volume  # References a Volume in the enclosing PodSpec
    mountPath: /usr/local/etc/haproxy/
    readOnly: true

Now, we also have that VolumeMount object in our HAProxy container that’s expecting a Volume from the PodSpec. So let’s add that. The Volume API spec should help and from the PodSpec docs we can see that a PodSpec has a volumes field which should have an array of Volume objects.

Looking at the Volume spec, we can see that it's mostly a huge list of the different types of volumes that we can use, each of which links off to yet another type describing that particular volume. One thing to note is that the name of the Volume object we create needs to match the name of the VolumeMount in the Container object. Kubernetes has a lot of implied coupling like that; it's just something to get used to.

We'll use a configMap volume (ConfigMapVolumeSource docs) to mount a HAProxy config. We assume that the ConfigMap contains whatever files HAProxy needs. Here's the PodSpec with the volumes field:

containers:
- name: haproxy
  image: haproxy:2.1.0
  imagePullPolicy: IfNotPresent
  volumeMounts:
  - mountPath: /usr/local/etc/haproxy/
    name: haproxy-config-volume  # This name matches the Volume below
    readOnly: true
volumes:
- name: haproxy-config-volume
  configMap:
    name: haproxy-config-map  # References a ConfigMap in the cluster

So now what we have is a PodSpec object which is composed from an array of Container objects and an array of Volume objects. To Kubernetes, our PodSpec object is a "template" for making Pods from; we further need to embed this object inside another object which describes how we want to use this template to deploy one or more Pods to our Kubernetes cluster.

Deployments

There are several ways to get our PodSpec template actually made into a running process on the Kubernetes cluster. The ones mentioned all the way back in the Container docs are the most common:

  • Deployment: run a given number of Pod resources, with upgrade semantics and other useful things.
  • Job and CronJob: run a one-time or periodic job that uses the Pod as its executable task.
  • StatefulSet: a special-case thing where Pods get stable identities.

Deployment resources are the most common, so we'll build one of those. As always, we'll look to the Deployment API spec to help. An interesting thing to note about Deployment resources is that the docs have a new set of options in the sidebar underneath the Deployment heading – links to the API calls in the Kubernetes API that we can use to manage our Deployment objects. Suddenly we've found that Kubernetes has an HTTP API we can use rather than kubectl if we want – time for our πŸ€– overlords to take over!

Anyway, for now let's keep looking at the API spec for what our Deployments need to look like, whether we choose to pass them to kubectl or to these shiny new API endpoints we just found out about.

Deployment resources are top-level things, meaning that we can create, delete and modify them using the Kubernetes API – up until now we've been working with definitions that need to be composed into higher-level types to be useful. Top-level types all have some standard fields:

  • apiVersion: this allows us to tell Kubernetes what version of the API we are using to manage this Deployment resource; as in any API, different API versions have different fields and behaviours.
  • kind: this specifies the kind of the resource, in this case Deployment.
  • metadata: this field contains lots of standard Kubernetes metadata, and it has a type of its own, ObjectMeta. The key thing we need here is the name field, which is a string.

Specific to a deployment we have just one field to look at:

  • spec: this describes how the Deployment will operate (e.g., how upgrades will be handled) and the Pod objects it will manage.

If we click kubectl example in the API spec, the API docs show a basic Deployment. From this, we can see the values we need to use for apiVersion, kind and metadata to get us started. A first version of our Deployment looks like this:

apiVersion: apps/v1
kind: Deployment
metadata:
  name: haproxy-load-balancer
spec:
  # TODO

Next we'll need to look at the DeploymentSpec API docs to see what we need to put in there. From experience, the most common fields here are:

  • template: a PodTemplateSpec, which contains a standard metadata field containing ObjectMeta (the same type as at the top level of the Deployment!) and a spec field where we finally find a place to put the PodSpec we made earlier. This field is vital, as without it the Deployment has nothing to run!
  • selector: this field works with the metadata in the template field to tell the Deployment's controller (the code within Kubernetes that manages Deployment resources) which Pods are related to this Deployment. Typically it references labels within the PodTemplateSpec's metadata field. The selector documentation talks more about how selectors work; they are used widely within Kubernetes.
  • replicas: optional, but almost all Deployments have this field; it says how many Pods matching the selector should exist at all times. 3 is a common value as it works well for rolling restarts during upgrades.

We can add a basic DeploymentSpec with three replicas that uses the app label to tell the Deployment what Pods it is managing:

apiVersion: apps/v1
kind: Deployment
metadata:
  name: haproxy-load-balancer
spec:
  replicas: 3
  selector:
    matchLabels:
      app: haproxy
  template:
    metadata:
      labels:
        app: haproxy
    spec:
      # PodSpec goes here

Finally, here is the complete Deployment built from scratch using the API documentation. While I think it would be pretty much impossible to get here from the API documentation alone, once one has a basic grasp of concepts like "I need a Deployment to get some Pods running", reading the API docs alongside copy-pasting YAML into kubectl is most likely a really fast way of getting up to speed; I certainly wish I'd dived into the API docs a few months before I did!

apiVersion: apps/v1
kind: Deployment
metadata:
  name: haproxy-load-balancer
spec:
  replicas: 3
  selector:
    matchLabels:
      app: haproxy
  template:
    metadata:
      labels:
        app: haproxy
    spec:
      containers:
      - name: haproxy
        image: haproxy:2.1.0
        imagePullPolicy: IfNotPresent
        volumeMounts:
        - mountPath: /usr/local/etc/haproxy/
          name: haproxy-config-volume
          readOnly: true
      volumes:
      - name: haproxy-config-volume
        configMap:
          name: haproxy-config-map
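As an aside, this nesting is exactly the composition you see if you construct the same Deployment with the Go types from k8s.io/api (the types used for decoding manifests in the post on loading Kubernetes types into Go objects). A trimmed sketch, with the volume plumbing left out:

import (
	appsv1 "k8s.io/api/apps/v1"
	corev1 "k8s.io/api/core/v1"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
)

func haproxyDeployment() *appsv1.Deployment {
	replicas := int32(3)
	labels := map[string]string{"app": "haproxy"}
	return &appsv1.Deployment{
		ObjectMeta: metav1.ObjectMeta{Name: "haproxy-load-balancer"},
		Spec: appsv1.DeploymentSpec{
			Replicas: &replicas,
			Selector: &metav1.LabelSelector{MatchLabels: labels},
			Template: corev1.PodTemplateSpec{
				ObjectMeta: metav1.ObjectMeta{Labels: labels},
				Spec: corev1.PodSpec{
					Containers: []corev1.Container{{
						Name:            "haproxy",
						Image:           "haproxy:2.1.0",
						ImagePullPolicy: corev1.PullIfNotPresent,
						// VolumeMounts and the matching Volume omitted for brevity.
					}},
				},
			},
		},
	}
}

Each level of YAML indentation corresponds to a field on one of these structs, which is really all the API documentation is describing.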

ConfigMaps

For completeness, let's write a trivial HAProxy configuration and put it inside a ConfigMap resource so this demonstration is runnable. Frankly, the API documentation for ConfigMap is less helpful than what we've seen so far.

We can see ConfigMap objects can be worked with directly via the API, as they have the standard apiVersion, kind and metadata fields we saw on Deployment objects.

HAProxy configuration is a text file, so we can see that it probably goes in the data field rather than the binaryData field, as data can hold any UTF-8 sequence. We can see that data is an object, but beyond that there isn't any detail about what should be in that object.

In the end, we need to go and check out the prose documentation on how to use a ConfigMap to understand what to do. Essentially what we find is that the keys used in the data object are used in different ways based on how we are using the ConfigMap. If we choose to mount the ConfigMap into a container – as we do in the PodSpec above – then the keys of the data object become filenames within the mounted filesystem. If, instead, we set up the ConfigMap to be used via environment variables, the keys would become the variable names. So we need to know this extra information before we can figure out what to put in that data field.

The API documentation often requires reading alongside the prose documentation in this manner as many Kubernetes primitives have this use-dependent aspect to them.

So in this case, we add a haproxy.cfg key to the data object, as the HAProxy image we are using by default will look to /usr/local/etc/haproxy/haproxy.cfg for its configuration.

apiVersion: v1
kind: ConfigMap
metadata:
    name: haproxy-config-map  # Matches configMap.name in the PodSpec's volumes entry
data:
    haproxy.cfg: |
        defaults
            mode http

        frontend normal
            bind *:80
            default_backend normal

        backend normal
            server app webapp:8081  # Assumes a webapp Service

Recall from Just enough YAML that starting an object value with a | character makes all indented text that comes below into a single string, so this ConfigMap ends up with a file containing the HAProxy configuration correctly.

Summary

So we now have a simple HAProxy deployment in Kubernetes which we've mostly been able to build from reading the API documentation rather than blindly copy-pasting YAML from the internet. We – at least I – better understand what's going on with all the bits of YAML, and it's starting to feel much less arbitrary. I feel now like I might actually stand a chance of writing some code that calls the Kubernetes API rather than relying on YAML and kubectl. And what's that code called? An operator! I'd heard the name bandied about a lot, but had presumed some black magic was involved – but nope, it's just code that manipulates objects within the Kubernetes API using the types we've talked about above, along with about a zillion other ones, including ones you make up yourself. Obviously you need to figure out how best to manage the objects, but when all is said and done that's what you are doing.

Anyway, hopefully this has de-mystified some more of Kubernetes for you, dear reader; as I mentioned, understanding these pieces helped me go from a copy-paste-and-hope workflow towards a much less frustrating experience building up my Kubernetes resources.

Just enough YAML to understand Kubernetes manifests

When we talk about Kubernetes, we should really be talking about the fact that when you, as an administrator, interact with Kubernetes using kubectl, you are using kubectl to manipulate the state of data within Kubernetes via Kubernetes’s API.

But when you use kubectl, the way you tend to tell kubectl what to do with the Kubernetes API is using YAML. A lot of freakin' YAML. So while I hope to write more about the actual Kubernetes API sometime soon, first we'll have to talk a bit about YAML; just enough to get going. Being frank, I don't get on well with YAML. I do get on with JSON, because in JSON there is a single way to write anything. In JSON you don't even get to choose between double and single quotes for your strings, whereas I overheard a colleague say that there are over sixty ways to write a string in YAML. Sixty ways to write a string! I think they were being serious.

Thankfully, idiomatic Kubernetes YAML doesn't do much with the silly end of YAML, and even sticks to just three ways to represent strings πŸ’ͺ.

Though not required for Kubernetes, while writing this I found some even stranger corners of YAML than I'd come across before. I thought I'd note these down for amusement's sake, even though I think they just come from over-applying the grammar rather than anyone seriously believing they are sensible.

Below I’ve included YAML and the JSON equivalents, simply because I find JSON a conveniently unambiguous representation, and one that I expect to be familiar to most readers (including myself).

Objects (maps)

In JSON, you write an object like this:

"object": {
    "key": "value",
    "boolean": true,
    "null_value": null,
    "integer": 1,
    "anotherobject": {
        "hello": "world"
    }
}

Unlike JSON, in YAML the spacing makes a difference. We write an object like this:

object:
  key: value
  boolean: true
  null_value:
  integer: 1
  anotherobject:
    hello: world

If you bugger up the indenting, you’ll get a different value. So this YAML:

object2:
key: value
boolean: true
null_value:
integer: 1

Means this JSON:

"object2": null,
"key": "value",
"boolean": true,
"null_value": null,
"integer": 1

When combined with a defacto standard of two-space indenting, I find YAML objects pretty hard to read. Particularly in a long sequence of objects, it’s very easy to miss where one object stops and another begins. It’s also easy to paste something with slightly wrong indent, changing its semantics, in a way that just isn’t possible in JSON.

You can actually just write an object with braces and everything in YAML, just like JSON. In fact JSON is a subset of YAML so any JSON document is also a YAML document. When I learned this it was essentially 🀯 combined with 😭. However, no-one ever writes JSON into YAML documents, so in the end this fact is purely academic.

Well, apart from the fact that you do sometimes see JSON arrays.

Arrays

Arrays look like (un-numbered) lists:

array1:
- mike
- fred

You can indent all the list items how you like, so this is the same:

array1:
        - mike
        - fred

Both translate to:

"array1": ["mike", "fred"]

But it’s easy to make a mistake. This YAML with its accidental indent:

array1:
  - mike
    - fred
  - john

Means this:

"array1": [
    "mike - fred",
    "john"
]

Which I find a bit too silently weird for my tastes.

Objects in arrays

The main thing I get wrong here is when writing arrays of objects. It’s very easy to misplace a -.

So this is a list of two objects:

array:
- key: value
  boolean: true
  null_value:
  integer: 1
- foo: bar
  hello: world

Which becomes:

"array": [
    {
        "key": "value",
        "boolean": true,
        "null_value": null,
        "integer": 1
    },
    {
        "foo": "bar",
        "hello": "world"
    }
]

But I find it very easy to miss the -, particularly in lists of objects with sub-objects. In addition, YAML’s permissiveness enables one to mistype syntactically valid but semantically different constructs, like here where we want to create an object but end up with an extra list item:

array:
- object:
- 	foo: bar
  	hello: world
  	baz: world
- key: value
  boolean: true
  null_value:
  integer: 1

Which gives the JSON:

"array": [
    {
        "object": null
    },
    {
        "foo": "bar",
        "hello": "world",
        "baz": "world"
    },
    {
        "key": "value",
        "boolean": true,
        "null_value": null,
        "integer": 1
    }
]

Particularly when reviewing complex structures, it’s easy to start to lose the thread of which - and which indent belongs to which object.

Arrays of arrays

I find this perhaps the best example of where YAML goes off the rails. It’s easy and (I find) clear to represent arrays of arrays in JSON:

[
    1,
    [1,2],
    [3, [4]],
    5
]

This is… pretty wild by default in YAML:

- 1
- - 1
  - 2
- - 3
  - - 4
- 5

I suspect this is reducing things to the absurd for effect; however, perhaps the best thing here is to fall back to inline JSON.

Strings

Anyway, let’s get back to those sixty ways to represent strings. The three ways you’ll commonly see used in Kubernetes manifest YAML files are as follows:

array:
- "mike"
- mike
- |
    mike

These all mean the same thing:

"array": [
    "mike",
    "mike",
    "mike"
]

The first form is always a string. The second form is also a string – unless it's a reserved word such as true or null. The third form allows you to insert multiline strings, as long as you indent appropriately. This third form is most often seen in ConfigMap and Secret objects, as it is very convenient for multi-line text files.

array:
- true
- "mike"
- |
  mike
  fred
  john
"array": [
    true,
    "mike",
    "mike\nfred\njohn\n"
],

A digression into wacky strings

Thankfully I’ve not seen them in Kubernetes YAML, but YAML contains at least two further forms that look remarkably similar to the | form. The first, which uses > to start it, only inserts newlines for two carriage returns, and for some reason (almost) always appears to insert a newline at the end. The second misses out a control character at the start of the string but looks identical in passing. In this variant the newlines embedded in the YAML disappear in the actual string.

In this example, I include the | form, the > and the prefix-less form using the same words and newline patterns to show how similar-looking YAML gives different strings:

array:
- |
  mike
  fred
  john
- >
  mike
  fred
  john
- mike
  fred
  john

Giving the JSON:

"array": [
    "mike\nfred\njohn\n",
    "mike fred john\n",
    "mike fred john"
],

I find the YAML definitely looks cleaner, but the JSON is better at spelling out what it means.

While experimenting, I found an odd edge case with the > prefix. When I used it at the end of a file, the trailing \n ended up being dropped:

names: >
    mike
    fred
    john
names2: >
    mike
    fred
    john

Ends up with the \n going missing in names2:

"names": "mike fred john\n",
"names2": "mike fred john"

Just πŸ€·β€β™€οΈ and move on.

Multiple documents in one file

Finally, you will often see --- in Kubernetes YAML files. All this means is that what follows the --- is the start of a new YAML document; it's a way of putting multiple YAML documents inside one file. This is actually pretty useful, although again the marker is pretty minimal and easy to miss when scanning a file.

And that's about enough YAML to understand Kubernetes manifests πŸŽ‰.

AirPods Pro: first impressions

I've been using a pair of AirPods Pro for just under a week now. I use headphones in three main environments, and up until now have used three separate pairs, each of which works best for its environment. As they combine true-wireless comfort, noise-cancelling, a promising transparency mode and closed backs, I wondered whether the AirPods Pro could possibly replace at least a couple of my existing sets. Here we go.

Commute. My go-to headphones for my commute were a pair of first-gen AirPods that I've had nearly three years. I walk my commute, so I like to be able to hear what's going on around me on the street; the open-backed AirPods work great for this. This is obviously a place where transparency mode comes into play. However, both the Sony and Bose pairs mentioned below have transparency modes that, well, just don't feel transparent. They make it feel like the outside world is coming through water. The AirPods Pro, however, while they do seem to have minor trouble with sibilants in spoken word, feel much closer to superimposing your audio on the surroundings than any other transparency mode I've used. It's surprisingly close to the experience of using the original AirPods. On top of this, you can obviously turn on noise-cancelling on busy streets rather than turning up the volume. These two combined are a game-changer; right now I'm not tempted to swap back.

In the office. The original AirPods are essentially useless in the office for blocking out chatter. So I've been using a pair of WI-1000X for a couple of years, which block out background chatter really well, especially when used with the foam tips they come with. However, here too the AirPods Pro still work okay even without foam tips, and the lack of a neckband and wires is just as noticeable an improvement as on my walk into the office. In addition, the AirPods Pro charging case is just easier to use than the somewhat fiddly charger of the WI-1000X. At the moment, I'm grabbing for the AirPods Pro in the office. They block out enough chatter, and true wireless is just way more comfortable.

Flying. For drowning out engine noise on flights, I have found the (wired) Bose QC20 beat the WI-1000X (the reverse is true for office chatter, strangely). The noise-cancelling is better on the Bose pair, and they fit into a very small carrying pouch compared to the neckband-saddled WI-1000X; much easier to chuck into a bag. I would say the AirPods Pro have about the same noise-cancelling effectiveness as the Sony headphones. I've yet to fly with them, so time will tell whether the convenience of the wireless headphones beats out the (likely) better noise cancelling of the Bose pair. I'll certainly be taking both to try them out, as I feel it'll be a close call.

Overall, I've been surprised by how close the AirPods Pro have come to replacing the three pairs I used previously. Time will tell how I end up settling long term, but Apple have hit a good balance with these headphones. I suspect the convenience of true wireless, good-enough noise-cancelling and compact size may make these my go-to headphones most of the time. Oh, and they sound good enough too – but you'd expect that for the price.