At Cloudant, we use GitOps to manage our Kubernetes workloads. One of the advantages of this approach is that we store fully-rendered Kubernetes manifests within GitHub for deployment to our Kubernetes clusters.
One thing that I often find myself doing is writing small one-off tools to answer questions about those manifests. For example, “by deployment, what is the CPU and memory resource allocation, and how much does that cost in terms of worker machine price?”. As a first approximation, this can be discovered by loading up all Deployment manifests from our GitOps repository, then processing their content to discover container resource requests and the number of replicas specified by each Deployment.
I write these ad hoc tools in Go. While I could create the appropriate struct definitions within each program for a YAML deserializer to work with, it is time-consuming to do the object-mapping work in every application. I wanted to be able to use pre-created object mappings and load them up inside my applications.
For this, I looked to the Kubernetes Go client. While this client contains object mappings for the various standard Kubernetes resource types, it is designed for use when querying the Kubernetes API server rather than loading YAML files from disk. With a little digging into the client’s guts, you can make this work, however, and save yourself a bunch of time. In addition, any code you do write can be easily converted to using the output of the Kubernetes API server later because it is working with the same types.
k8s.io/api is the root package for the standard type objects, such as Deployment. One thing that tripped me up originally is that the types are not defined within that package itself, but instead in packages underneath that namespace. For example, Deployment is in the k8s.io/api/apps/v1 package. The sub-packages follow a pattern of k8s.io/api/GROUP/VERSION, where GROUP and VERSION can be found in the apiVersion of all Kubernetes resources.
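To make the mapping concrete, here’s a small sketch of the imports that correspond to a few common apiVersion values; the aliases are just my own convention, and note that the legacy “core” group (plain apiVersion: v1) lives under k8s.io/api/core/v1:

import (
	appsv1 "k8s.io/api/apps/v1"   // apiVersion: apps/v1 -> GROUP "apps", VERSION "v1"
	batchv1 "k8s.io/api/batch/v1" // apiVersion: batch/v1 -> Job and friends
	corev1 "k8s.io/api/core/v1"   // apiVersion: v1 -> the "core" group, which has no GROUP segment
)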
Each version of Kubernetes has a tag within the source repository for the Go k8s.io/api package. This is a useful way to refer to the API version you require within your go.mod file, rather than the package version. To use these object-mappings within your application, add a require to go.mod using the tag to pin the version:
require (
	k8s.io/api v0.19.0
)
In this require line, v0.19.0 refers to Kubernetes API level 1.19. I’m guessing that they chose not to use v1.19.0 as the Go module guidelines specify API stability within major versions. In my experience, this internal API changes quite often with version bumps, and so it makes sense to use the v0.x versioning to capture this expectation.
Now that we know the package for the types, we need to work out how to load YAML into them. I found that the definitions within k8s.io/api didn’t play well with the usual YAML library that I use, gopkg.in/yaml.v2. It turns out that the k8s.io/apimachinery package contains machinery to help with loading the YAML, and we can combine that with utilities in k8s.io/client-go to decode YAML read from disk into rich Go types.
Again, we use the git tags for given Kubernetes versions when writing requirements for these two packages into our go.mod file. Because we’re using semi-private implementation packages for Kubernetes, if the versions of each package don’t “match” with each other, my experience is that loading YAML will fail with various odd error messages. To use these packages, we modify the go.mod for the application to further include references to k8s.io/apimachinery and k8s.io/client-go at the same API level as k8s.io/api:
require (
	k8s.io/api v0.19.0
	k8s.io/apimachinery v0.19.0
	k8s.io/client-go v0.19.0
)
From k8s.io/apimachinery, we use a UniversalDeserializer to create a runtime.Decoder object which is able to take YAML and deserialize it into the objects that k8s.io/api provides. The runtime.Decoder is generated using a Scheme object which needs to contain all the definitions for the resource types (schemas, perhaps?) that need to be deserialized. By default, the Scheme object is empty of definitions.
This is where k8s.io/client-go comes in. To avoid needing to load all the types ourselves, we use k8s.io/client-go/kubernetes/scheme#Codecs from k8s.io/client-go: the CodecFactory assigned to that variable has all the standard Kubernetes types preloaded into its Scheme and provides a way to create a runtime.Decoder using that Scheme.
Once we have a runtime.Decoder, we use its Decode method to decode YAML buffers into Go objects. Decode returns a triple:
(Object, *schema.GroupVersionKind, error)
The Object is the decoded object, as a k8s.io/apimachinery runtime.Object. This is an interface that needs to be cast to the appropriate type from k8s.io/api in order to access the resource’s fields. The GroupVersionKind structure helps us to do that, as it fully describes the Kubernetes resource type:
if groupVersionKind.Group == "apps" &&
	groupVersionKind.Version == "v1" &&
	groupVersionKind.Kind == "Deployment" {
	// Cast appropriately
	deployment := obj.(*appsv1.Deployment)
	// And do something with it
	log.Print(deployment.ObjectMeta.Name)
}
Once we have these packages ready for use, reading the YAML into objects is relatively simple:
- Split the manifest file into individual YAML documents on the --- separator. A naive but effective way to do this is using strings.Split().
- Decode each document using k8s.io/apimachinery’s runtime.Decoder, which loads Kubernetes objects into runtime.Object, an interface implemented by all API types.
- Use the GroupVersionKind returned by Decode to work out what Kubernetes resource kind we have and cast appropriately.
This code brings this together:
import (
	"io/ioutil"
	"log"
	"strings"

	appsv1 "k8s.io/api/apps/v1"
	"k8s.io/client-go/kubernetes/scheme"
)

func main() {
	// Load the file into a buffer
	fname := "/path/to/my/manifest.yaml"
	data, err := ioutil.ReadFile(fname)
	if err != nil {
		log.Fatal(err)
	}

	// Create a runtime.Decoder from the Codecs field within
	// k8s.io/client-go that's pre-loaded with the schemas for all
	// the standard Kubernetes resource types.
	decoder := scheme.Codecs.UniversalDeserializer()

	for _, resourceYAML := range strings.Split(string(data), "---") {
		// Skip empty documents; Decode will fail on them.
		if len(resourceYAML) == 0 {
			continue
		}

		// - obj is the API object (e.g., Deployment)
		// - groupVersionKind is a generic object that allows
		//   detecting the API type we are dealing with, for
		//   accurate type casting later.
		obj, groupVersionKind, err := decoder.Decode(
			[]byte(resourceYAML),
			nil,
			nil)
		if err != nil {
			log.Print(err)
			continue
		}

		// Figure out from the GroupVersionKind which resource type
		// this is, and cast appropriately.
		if groupVersionKind.Group == "apps" &&
			groupVersionKind.Version == "v1" &&
			groupVersionKind.Kind == "Deployment" {
			deployment := obj.(*appsv1.Deployment)
			log.Print(deployment.ObjectMeta.Name)
		}
	}
}
I haven’t worked out how to add CRD types to this process yet, which is a gap because we use CRDs with our own operators to deploy more complicated parts of our stack. As yet, therefore, they are left out of the analysis my ad hoc tools perform.
In the end, you have to call AddKnownTypes on a given Scheme to add types, such as in the code for Sealed Secrets. This means you either need to reference or copy in the type definitions for your types from appropriate packages – much like client-go does when registering the types from k8s.io/api.
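If I do get around to it, I expect the code to be roughly this shape. This is only a sketch, and it assumes a hypothetical mycrds package containing the generated Go types for a custom resource; the idea is to register those types against client-go’s Scheme so the same UniversalDeserializer can decode them:

import (
	"k8s.io/apimachinery/pkg/runtime/schema"
	"k8s.io/client-go/kubernetes/scheme"

	// Hypothetical package holding the CRD's generated Go types.
	mycrds "example.com/myteam/myoperator/api/v1"
)

func registerCRDTypes() {
	// Register the custom types with the same Scheme that backs
	// scheme.Codecs, so decoder.Decode can return them alongside
	// the built-in Kubernetes types.
	gv := schema.GroupVersion{Group: "myoperator.example.com", Version: "v1"}
	scheme.Scheme.AddKnownTypes(gv,
		&mycrds.Database{},
		&mycrds.DatabaseList{},
	)
}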
But, as yet, I’ve not completed this final part of the story.
This pattern is particularly useful when you are deploying using a GitOps pattern and thus have full manifests stored outside of Kubernetes which you wish to analyse. I have used it in several places; indeed the reason I’m writing it up here is to capture the instructions as much for myself as for other people, and also to force myself to dig into the code further rather than just copy-paste code from other places.
Most of the discoveries for this post come from a handful of client-go GitHub issues.
For a long time, I’ve kind of existed with a barely-there understanding of Python packaging. Just enough to copy a requirements.txt file from an old project and write a Makefile with pip install -r requirements.txt. A few years ago, I started using pipenv, and again learned just enough to make it work.
Over the past year, I became frustrated with this situation: pipenv became increasingly hard to keep working through upgrades, and its kitchen-sink approach started to grate.
Last year (2019), I started to look at tools like poetry, which essentially start the whole process from scratch, including new dependency resolution and package-building code. When figuring out how to use these in Dockerfiles, I realised I needed to understand a bunch more about both packaging and virtual environments. The good news was that this area actually progressed a lot in the 2018-9 time frame. The bad news was that meant there was a lot to learn, and a bunch of the existing material was out of date.
Until 2013, when PEP 427 defined the whl archive format for Python packages, whenever a package was installed via pip install it was always built from source via a distribution format called sdist. For pure-python files this wasn’t typically much of a problem, but for any packages making use of C extensions it meant that the machine where pip install was run needed a compiler toolchain, python development headers and so on.
This situation is more than a little painful. As PEP 427’s rationale states:
Python’s sdist packages are defined by and require the distutils and setuptools build systems, running arbitrary code to build-and-install, and re-compile, code just so it can be installed into a new virtualenv.
After PEP 427, packages could also be distributed as so-called binary packages or wheels.
When I first started to see python binary packages, I was confused and even somewhat alarmed by the term, because I had never looked in depth into python packaging and was quite used to source distributions by 2013. But in general they are a big win:
- A pure python package needs just a single .whl per python version it supports, which will be named like Flask-1.1.1-py2.py3-none-any.whl, where none and any specify the python ABI version (for C extensions) and the target platform respectively. As pure python packages have no C extensions, they have no target ABI or platform, but will often have a python version requirement; this example supports both python 2 and 3.
- The terms used in these filenames, such as none, are defined in PEP 425.
- Packages with C extensions ship several .whl files, as a separate .whl file must be created for each target system and python version. For example, cryptography-2.8-cp34-abi3-manylinux2010_x86_64.whl is a package with binaries built against CPython 3.4, ABI level 3, for a Linux machine and x86-64 processor architecture.
- In the end, wheels provide a much simpler and more reliable install experience, as users are not forced to compile packages themselves, with all the tooling and security concerns inherent in that approach.
Wheels soon started taking over the python packaging ecosystem, though there are still hold-outs even today that ship source packages rather than binary packages (often for good reasons).
However, all python packages were still defined via setup.py, an opaque standard that was defined purely by the distutils and setuptools source code. While there was now a binary standard for built packages, in practice there was only one way of building them. pip, for example, hardcoded the calls to setup.py into its pip wheel command, so using other build systems was very difficult, making implementing them a somewhat thankless task. Before poetry, it doesn’t look like anyone much attempted it.
The distutils module was shipped with Python, so it was natural that it came to be the de facto standard, and including a packaging tool was a good decision from the python maintainers. distutils wasn’t that easy to use on its own, however, so setuptools was built as a package to improve on it. Over time, setuptools also grew to be somewhat gnarly itself.
Tools like flit were then created to tame the new complexity and wrap distutils and setuptools in another layer – though flit is opinionated. In the end flit was still using distutils and setuptools under the hood (per this flit source code), but it became pretty popular because its workflow is simple and understandable. Indeed, generation of the files used by distutils happens behind the scenes so far as I can tell (I didn’t actually try flit out, so may have made some errors here).
In 2018, development of poetry started, at least per the earliest commits in the github repository. Poetry is an ambitious rebuild of python packaging pretty much from scratch. It’s able to resolve dependencies and build wheels without any use of distutils and setuptools. The main problem for poetry is that, to be accepted into development and CI pipelines, it needs to re-implement a lot of existing functionality that is already present in other tools like pip.
At a similar time, the python community came up with PEPs 517 and 518:
- PEP 517 defines a build backend that pip can use when building wheels – for example, using Poetry or flit’s build engine rather than going directly to distutils. A build backend is a Python module with a standard interface that is used to take a python package source tree and spit out a wheel.
- PEP 518 allows pip to know how to install the build backend specified by PEP 517 when pip is building packages. Specifically, it describes how to create an isolated python environment with just the needed requirements to build the package (that is, the packages needed to run the build backend, not the package’s dependencies).
Both PEPs 517 and 518 use a new file called pyproject.toml to describe their settings:
[build-system]
# Defined by PEP 518, what the build environment requires:
requires = ["poetry>=0.12"]
# Defined by PEP 517, how to kick off the build:
build-backend = "poetry.masonry.api"
Both poetry and flit work with pyproject.toml
via its support for
namespacing tool-specific settings. An example using poetry:
[tool.poetry]
name = "my-package"
version = "0.1.0"
description = "The description of the package"
[tool.poetry.dependencies]
python = "^3.7"
flask-hookserver = "==1.1.0"
requests = "==2.22.0"
While both PEPs 517 and 518 were started a while ago, it’s only from pip 19.1 (early 2019) that pip started supporting the use of build backends specified via PEP 517.
pip enters “PEP 517 mode” when pip wheel is called and pip finds a pyproject.toml file in the package it is building. When in this mode, pip acts as a build frontend, a term defined by PEP 517 for the application that is used from the command line and makes calls into a build backend, such as poetry. As a build frontend, the job for pip here is to:
- create an isolated python environment and install the build requirements specified by PEP 518 into it (requires = ["poetry>=0.12"]).
- invoke the build backend specified by PEP 517 (build-backend = "poetry.masonry.api") within the created isolated environment.
The build backend then must create a wheel from the source folder or source distribution and put it in the place that pip tells it to.
For me, this seems like big news for projects like poetry that do a lot from scratch and end up with laundry lists of feature requirements to enable them to be integrated into full development and CI pipelines. If they can instead be integrated into CI via existing tools like pip, then they are much easier to adopt in development for their useful features there, such as poetry’s virtual environment management features. In particular, both flit and poetry will use the information defined in their respective sections of pyproject.toml to build the application wheel and its requirement wheels just as they would on a developer’s machine (to an extent anyway; my experiments indicate poetry ignores its .lock file when resolving requirements).
In this way, PEPs 517 and 518 close the loop in allowing tools like poetry to concentrate on what they want to concentrate on, rather than needing to build out a whole set of functions before they can be accepted into developers’ toolboxes.
An example Dockerfile
shows this in action, for building the myapp
package
into a wheel along with its dependencies, and then copying the app and
dependency wheels into the production image and installing them:
# Stage 1 build to allow pulling from private repos requiring creds
FROM python:3.8.0-buster AS builder
RUN mkdir -p /build/dist /build/myapp
# pyproject.toml has deps for the `myapp` package
COPY pyproject.toml /build
# Our project source code
COPY myapp/*.py /build/myapp/
# This line installs and uses the build backend defined in
# pyproject.toml to build the application wheels from the source
# code we copy in, outputting the app and dependency wheels
# to /build/dist.
RUN pip wheel -w /build/dist /build
# Stage 2 build: copy and install wheels from stage 1 (`builder`).
FROM python:3.8.0-slim-buster as production-image
COPY --from=builder [ "/build/dist/*.whl", "/install/" ]
RUN pip install --no-index /install/*.whl \
&& rm -rf /install
CMD [ "my-package-script" ]
And this is what I now understand about the state of python packaging as we enter 2020. The future looks bright.
It’s relatively easy to find articles online about the basics of Kubernetes that talk about how Kubernetes looks on your servers. That a Kubernetes cluster consists of master nodes (where Kubernetes book-keeping takes place) and worker nodes (where your applications and some system applications run). And that to run more stuff, you provision more workers, and that each pod looks like its own machine. And so on.
But for me, I found a disconnect between that mental image of relatively clean looking things running on servers and the reams and reams of YAML one must write to seemingly do anything with Kubernetes. Recently, I found the Kubernetes API overview pages. Somehow I’d not really internalised before that the reams of YAML are just compositions of types, like programming in any class-based language.
But they are, because in the end all the YAML you pass into kubectl is just getting kubectl to work with a data model inside the Kubernetes master node somewhere. The types described in the Kubernetes API documentation are the building blocks of that data model, and learning them unlocked a new level of understanding Kubernetes for me.
The data model is built using object composition, and I found a nice way to discover it was to start from a single container object and build out to a running deployment, using the API documentation as much as I could but returning to the prose documentation for examples when I got stuck or, as we’ll see with ConfigMaps, when the API documentation just can’t describe everything you need to know.
This is our starting point. While the smallest thing that Kubernetes will schedule on a worker is a Pod, the basic entity is the Container, which encapsulates (usually) a single process running on a machine. Looking at the API definition, we can easily see what the allowed values are – for me this was the point where what had previously been seemingly arbitrary YAML fields started to slot together into a type system! Just like other API documentation, suddenly there’s a place where I can see what goes in the YAML rather than copy-pasting things from the Kubernetes prose documentation, tweaking it and then just having to 🤞.
Let’s take a quick look at some fields:
- The key field of a Container is, of course, the image that it will run. From the Container API documentation, we can look through the table of fields within the Container and see that a string is required for this field.
- name is also required.
- Next, imagePullPolicy. If we look at imagePullPolicy, we can see that it’s also a string, but the documentation also states what the acceptable values are: Always, Never and IfNotPresent. If YAML allowed enums, I’m sure this would be an enum. Anyway, we can immediately see what the allowed values are – this is much easier than trying to find this within the prose documentation!
- Finally, volumeMounts, which is a little more complicated: it’s a field of a new type rather than a primitive value. The new type is VolumeMount and the documentation tells us that this is an array of VolumeMount objects and links us to the appropriate API docs for VolumeMount objects. This was the real moment when I stopped having to use copy-paste and instead was really able to start constructing my YAML – 💪!
The documentation is also super-helpful in telling us where we can put things. Right at the top of the Container API spec, it tells us:
Containers are only ever created within the context of a Pod. This is usually done using a Controller. See Controllers: Deployment, Job, or StatefulSet.
Totally awesome, we now know that we need to put the Container within something else for it to be useful!
So let’s make ourselves a minimal container:
name: haproxy
image: haproxy:2.1.0
imagePullPolicy: IfNotPresent
volumeMounts:
- name: HAProxyConfigVolume # References a containing PodSpec
  mountPath: /usr/local/etc/haproxy/
  readOnly: true
We can build all this from the API documentation – and it’s easy to avoid the unneeded settings that often come along with copy-pasted examples from random websites on the internet. By reading the documentation for each field, we can also get a much better feel for how this container will behave, making it easier to debug problems later.
So now we have our Container, we need to make a Pod so that Kubernetes can schedule HAProxy onto our nodes. From the Container docs, we have a link direct to the PodSpec documentation. Awesome, we can follow that up to our next building block.
A PodSpec has way more fields than a Container! But we can see that the first one we need to look at is containers which we’re told is an array of Container objects. And hey we have a Container object already, so let’s start our PodSpec with that:
containers:
- name: haproxy
  image: haproxy:2.1.0
  imagePullPolicy: IfNotPresent
  volumeMounts:
  - name: HAProxyConfigVolume # References a containing PodSpec
    mountPath: /usr/local/etc/haproxy/
    readOnly: true
Now, we also have that VolumeMount object in our HAProxy container that’s expecting a Volume from the PodSpec. So let’s add that. The Volume API spec should help, and from the PodSpec docs we can see that a PodSpec has a volumes field which should have an array of Volume objects.
Looking at the Volume spec, we can see that it’s mostly a huge list of the different types of volumes that we can use, each of which links off to yet another type which describes that particular volume. One thing to note is that the name of the Volume object we create needs to match the name of the VolumeMount in the Container object. Kubernetes has a lot of implied coupling like that; it’s just something to get used to.
We’ll use a configMap volume (ConfigMapVolumeSource docs) to mount a HAProxy config. We assume that the ConfigMap contains whatever files HAProxy needs. Here’s the PodSpec with the volumes field:
containers:
- name: haproxy
  image: haproxy:2.1.0
  imagePullPolicy: IfNotPresent
  volumeMounts:
  - mountPath: /usr/local/etc/haproxy/
    name: HAProxyConfigVolume # This name comes from the PodSpec
    readOnly: true
volumes:
- name: HAProxyConfigVolume
  configMap:
    name: HAProxyConfigMap # References a ConfigMap in the cluster
So now what we have is a PodSpec object which is composed from an array of Container objects and an array of Volume objects. To Kubernetes, our PodSpec object is a “template” for making Pods out of — we further need to embed this object inside another object which describes how we want to use this template to deploy one or more Pods to our Kubernetes cluster.
There are several ways to get our PodSpec template actually made into a running process on the Kubernetes cluster. The ones mentioned all the way back in the Container docs are the most common:
- Deployment: run a given number of Pod resources, with upgrade semantics and other useful things.
- Job and CronJob: run a one-time or periodic job that uses the Pod as its executable task.
- StatefulSet: a special-case thing where Pods get stable identities.
Deployment resources are most common, so we’ll build one of those. As always, we’ll look to the Deployment API spec to help. An interesting thing to note about Deployment resources is that the docs have a new set of options in the sidebar underneath the Deployment heading – links to the API calls in the Kubernetes API that we can use to manage our Deployment objects. Suddenly we’ve found that Kubernetes has an HTTP API we can use rather than kubectl if we want — time for our 🤖 overlords to take over!
Anyway, for now let’s keep looking at the API spec for what our Deployments need to look like, whether we choose to pass them to kubectl or to these shiny new API endpoints we just found out about.
Deployment resources are top-level things, meaning that we can create, delete and modify them using the Kubernetes API — up until now we’ve been working with definitions that need to be composed into higher-level types to be useful. Top-level types all have some standard fields:
- apiVersion: this allows us to tell Kubernetes what version of the API we are using to manage this Deployment resource; as in any API, different API versions have different fields and behaviours.
- kind: this specifies the kind of the resource, in this case Deployment.
- metadata: this field contains lots of standard Kubernetes metadata, and it has a type of its own, ObjectMeta. The key thing we need here is the name field, which is a string.
Specific to a deployment we have just one field to look at:
- spec: this describes how the Deployment will operate (e.g., how upgrades will be handled) and the Pod objects it will manage.
If we click kubectl example in the API spec, the API docs show a basic Deployment. From this, we can see the values we need to use for apiVersion, kind and metadata to get us started. A first version of our Deployment looks like this:
apiVersion: apps/v1
kind: Deployment
metadata:
  name: haproxy-load-balancer
spec:
  # TODO
Next we’ll need to look at the DeploymentSpec API docs to see what we need to put in there. From experience, the most common fields here are:
- template: a PodTemplateSpec which contains a standard metadata field containing ObjectMeta (the same type as at the top level of the Deployment!) and a spec field where we finally find a place to put the PodSpec we made earlier. This field is vital, as without it the Deployment has nothing to run!
- selector: this field works with the metadata in the template field to tell the Deployment’s controller (the code within Kubernetes that manages Deployment resources) which Pods are related to this Deployment. Typically it references labels within the PodTemplateSpec’s metadata field. The selector documentation talks more about how selectors work; they are used widely within Kubernetes.
- replicas: optional, but almost all Deployments have this field; how many Pods should exist that match the selector at all times. 3 is a common value as it works well for rolling reboots during upgrades.
We can add a basic DeploymentSpec with three replicas that uses the app label to tell the Deployment what Pods it is managing:
apiVersion: apps/v1
kind: Deployment
metadata:
  name: haproxy-load-balancer
spec:
  replicas: 3
  selector:
    matchLabels:
      app: haproxy
  template:
    metadata:
      labels:
        app: haproxy
    spec:
      # PodSpec goes here
Finally, here is the complete Deployment built from scratch using the API documentation. While I think it would be pretty impossible to get here from the API documentation alone, once one has a basic grasp of concepts like “I need a Deployment to get some Pods running”, reading the API docs alongside copy-pasting YAML into kubectl is most likely a really fast way of getting up to speed; I certainly wish I’d dived into the API docs a few months before I did!
apiVersion: apps/v1
kind: Deployment
metadata:
  name: haproxy-load-balancer
spec:
  replicas: 3
  selector:
    matchLabels:
      app: haproxy
  template:
    metadata:
      labels:
        app: haproxy
    spec:
      containers:
      - name: haproxy
        image: haproxy:2.1.0
        imagePullPolicy: IfNotPresent
        volumeMounts:
        - mountPath: /usr/local/etc/haproxy/
          name: HAProxyConfigVolume
          readOnly: true
      volumes:
      - name: HAProxyConfigVolume
        configMap:
          name: HAProxyConfigMap
For completeness, let’s get a trivial HAProxy configuration and put it inside a ConfigMap resource so this demonstration is runnable. The API documentation for ConfigMap is less helpful than we’ve seen so far, frankly. We can see ConfigMap objects can be worked with directly via the API, as they have the standard apiVersion, kind and metadata fields we saw on Deployment objects.
HAProxy configuration is a text file, so we can see that it probably goes in the data field rather than the binaryData field, as data can hold any UTF-8 sequence. We can see that data is an object, but beyond that there isn’t detail about what should be in that object.
In the end, we need to go and check out the prose documentation on how to use a ConfigMap to understand what to do. Essentially what we find is that the keys used in the data object are used in different ways based on how we are using the ConfigMap. If we choose to mount the ConfigMap into a container — as we do in the PodSpec above — then the keys of the data object become filenames within the mounted filesystem. If, instead, we set up the ConfigMap to be used via environment variables, the keys become the variable names. So we need to know this extra information before we can figure out what to put in that data field.
The API documentation often requires reading alongside the prose documentation in this manner as many Kubernetes primitives have this use-dependent aspect to them.
So in this case, we add a haproxy.cfg key to the data object, as the HAProxy image we are using by default will look to /usr/local/etc/haproxy/haproxy.cfg for its configuration.
apiVersion: v1
kind: ConfigMap
metadata:
  name: HAProxyConfigMap # Match name in VolumeMount
data:
  haproxy.cfg: |
    defaults
      mode http
    frontend normal
      bind *:80
      default_backend normal
    backend normal
      server app webapp:8081 # Assumes webapp Service
Recall from Just enough YAML that starting an object value with a | character makes all indented text that comes below into a single string, so this ConfigMap ends up with a file containing the HAProxy configuration correctly.
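For contrast, here’s a sketch of the environment-variable route mentioned above, using a made-up app-settings ConfigMap rather than our HAProxy one; the keys surface as variable names inside the container rather than as files:
apiVersion: v1
kind: ConfigMap
metadata:
  name: app-settings
data:
  LOG_LEVEL: info
  LISTEN_PORT: "8081"
---
# Inside a PodSpec, the container picks the keys up as env vars:
containers:
- name: webapp
  image: webapp:1.0.0
  envFrom:
  - configMapRef:
      name: app-settings # LOG_LEVEL and LISTEN_PORT become env vars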
So we now have a simple HAProxy deployment in Kubernetes which we’ve mostly been able to build from reading the API documentation rather than blindly copy-pasting YAML from the internet. We — at least I — better understand what’s going on with all the bits of YAML and it’s starting to feel much less arbitrary. I feel now like I might actually stand a chance of writing some code that calls the Kubernetes API rather than relying on YAML and kubectl.
And what’s that code called? An operator! I’d heard the name bandied about a lot, but had presumed some black magic was involved — but nope, it’s just about calls that manipulate objects within the Kubernetes API using the types we’ve talked about above, along with about a zillion other ones, including ones you make up yourself! Obviously you need to figure out how best to manage the objects, but when all is said and done that’s what you are doing.
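To make that a little less abstract, here’s a rough sketch of the kind of thing I mean, using client-go to list Deployment objects in a namespace (roughly kubectl get deployments as Go code). It assumes a kubeconfig in the default location and a reasonably recent client-go; treat it as a starting point rather than production code:

package main

import (
	"context"
	"log"
	"path/filepath"

	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/client-go/kubernetes"
	"k8s.io/client-go/tools/clientcmd"
	"k8s.io/client-go/util/homedir"
)

func main() {
	// Build a client config from the local kubeconfig, as kubectl does.
	kubeconfig := filepath.Join(homedir.HomeDir(), ".kube", "config")
	config, err := clientcmd.BuildConfigFromFlags("", kubeconfig)
	if err != nil {
		log.Fatal(err)
	}
	clientset, err := kubernetes.NewForConfig(config)
	if err != nil {
		log.Fatal(err)
	}

	// List apps/v1 Deployment objects in the default namespace -- the
	// same type we built by hand in YAML above.
	deployments, err := clientset.AppsV1().Deployments("default").List(
		context.TODO(), metav1.ListOptions{})
	if err != nil {
		log.Fatal(err)
	}
	for _, d := range deployments.Items {
		replicas := int32(1)
		if d.Spec.Replicas != nil {
			replicas = *d.Spec.Replicas
		}
		log.Printf("%s: %d replicas", d.ObjectMeta.Name, replicas)
	}
}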
Anyway, hopefully this has de-mystified some more of Kubernetes for you, dear reader; as I mentioned, understanding these pieces helped me go from a copy-paste-hope workflow towards a much less frustrating experience building up my Kubernetes resources.
When we talk about Kubernetes, we should really be talking about the fact that when you, as an administrator, interact with Kubernetes using kubectl, you are using kubectl to manipulate the state of data within Kubernetes via Kubernetes’s API.
But when you use kubectl, the way you tend to tell kubectl what to do with the Kubernetes API is using YAML. A lot of freakin’ YAML. So while I hope to write more about the actual Kubernetes API sometime soon, first we’ll have to talk a bit about YAML; just enough to get going. Being frank, I don’t get on well with YAML. I do get on with JSON, because in JSON there is a single way to write anything. You don’t even get to choose between double and single quotes for your strings in JSON, whereas I overheard a colleague say that there are over sixty ways to write a string in YAML. Sixty ways to write a string! I think they were being serious.
Thankfully, idiosyncratic Kubernetes YAML doesn’t do much with the silly end of YAML, and even sticks to just three ways to represent strings 💪.
While not required for Kubernetes, while writing this I found some even more strange corners of YAML than I’d come across before. I thought I’d note these down for amusement’s sake even though I think they just come from over-applying the grammar rather than anyone seriously believing that they are sensible.
Below I’ve included YAML and the JSON equivalents, simply because I find JSON a conveniently unambiguous representation, and one that I expect to be familiar to most readers (including myself).
In JSON, you write an object like this:
"object": {
  "key": "value",
  "boolean": true,
  "null_value": null,
  "integer": 1,
  "anotherobject": {
    "hello": "world"
  }
}
Unlike JSON, in YAML the spacing makes a difference. We write an object like this:
object:
  key: value
  boolean: true
  null_value:
  integer: 1
  anotherobject:
    hello: world
If you bugger up the indenting, you’ll get a different value. So this YAML:
object2:
key: value
boolean: true
null_value:
integer: 1
Means this JSON:
"object2": null,
"key": "value",
"boolean": true,
"null_value": null,
"integer": 1
When combined with a de facto standard of two-space indenting, I find YAML objects pretty hard to read. Particularly in a long sequence of objects, it’s very easy to miss where one object stops and another begins. It’s also easy to paste something with a slightly wrong indent, changing its semantics, in a way that just isn’t possible in JSON.
You can actually just write an object with braces and everything in YAML, just like JSON. In fact JSON is a subset of YAML so any JSON document is also a YAML document. When I learned this it was essentially 🤯 combined with 😄. However, no-one ever writes JSON into YAML documents, so in the end this fact is purely academic.
Well apart from sometimes you see JSON arrays.
Arrays look like (un-numbered) lists:
array1:
- mike
- fred
You can indent all the list items how you like, so this is the same:
array1:
  - mike
  - fred
Both translate to:
"array1": ["mike", "fred"]
But it’s easy to make a mistake. This YAML with its accidental indent:
array1:
- mike
  - fred
- john
Means this:
"array1": [
  "mike - fred",
  "john"
]
Which I find a bit too silently weird for my tastes.
The main thing I get wrong here is when writing arrays of objects. It’s very easy to misplace a -.
So this is a list of two objects:
array:
- key: value
  boolean: true
  null_value:
  integer: 1
- foo: bar
  hello: world
Which becomes:
"array": [
  {
    "key": "value",
    "boolean": true,
    "null_value": null,
    "integer": 1
  },
  {
    "foo": "bar",
    "hello": "world"
  }
]
But I find it very easy to miss the -, particularly in lists of objects with sub-objects. In addition, YAML’s permissiveness enables one to mistype syntactically valid but semantically different constructs, like here where we want to create an object but end up with an extra list item:
array:
- object:
- foo: bar
  hello: world
  baz: world
- key: value
  boolean: true
  null_value:
  integer: 1
Which gives the JSON:
"array": [
  {
    "object": null
  },
  {
    "foo": "bar",
    "hello": "world",
    "baz": "world"
  },
  {
    "key": "value",
    "boolean": true,
    "null_value": null,
    "integer": 1
  }
]
Particularly when reviewing complex structures, it’s easy to start to lose the thread of which - and which indent belongs to which object.
I find this perhaps the best example of where YAML goes off the rails. It’s easy and (I find) clear to represent arrays of arrays in JSON:
[
  1,
  [1,2],
  [3, [4]],
  5
]
This is… pretty wild by default in YAML:
- 1
- - 1
  - 2
- - 3
  - - 4
- 5
I suspect this is reducing to the absurd for effect; even so, perhaps the best thing here is to regress to inline JSON.
Anyway, let’s get back to those sixty ways to represent strings. The three ways you’ll commonly see used in Kubernetes manifest YAML files are as follows:
array:
- "mike"
- mike
- |
  mike
These all mean the same thing:
"array": [
  "mike",
  "mike",
  "mike"
]
The first form appears to actually always be a string. The second form is always a string – unless it’s a reserved word. The third form allows you to insert multiline strings, as long as you indent appropriately. This third form is most often seen in ConfigMap and Secret objects as it is very convenient for multi-line text files.
array:
- true
- "mike"
- |
  mike
  fred
  john
"array": [
  true,
  "mike",
  "mike\nfred\njohn\n"
],
Thankfully I’ve not seen them in Kubernetes YAML, but YAML contains at least two further forms that look remarkably similar to the | form. The first, which uses > to start it, only inserts newlines for two carriage returns, and for some reason (almost) always appears to insert a newline at the end. The second misses out a control character at the start of the string but looks identical in passing. In this variant the newlines embedded in the YAML disappear in the actual string.
In this example, I include the | form, the > form and the prefix-less form, using the same words and newline patterns, to show how similar-looking YAML gives different strings:
array:
- |
  mike
  fred
  john
- >
  mike
  fred
  john
- mike
  fred
  john
Giving the JSON:
"array": [
  "mike\nfred\njohn\n",
  "mike fred john\n",
  "mike fred john"
],
I find the YAML definitely looks cleaner, but the JSON is better at spelling out what it means.
While experimenting, I found an odd edge case with the > prefix. Where I used it at the end of a file, the trailing \n ended up being dropped:
names: >
  mike
  fred
  john
names2: >
  mike
  fred
  john
Ends up with the \n going missing in names2:
"names": "mike fred john\n",
"names2": "mike fred john"
Just 🤷‍♂️ and move on.
Finally, you will often see --- in Kubernetes YAML files. All this means is that what follows the --- is the start of a new YAML object; it’s a way of putting multiple YAML objects inside one file. This is actually pretty nice, although again it’s pretty minimal and easy to miss when scanning a file.
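For example, a single file can hold a ConfigMap and a Deployment as two separate YAML documents (a trimmed-down sketch with made-up names):
apiVersion: v1
kind: ConfigMap
metadata:
  name: app-settings
data:
  LOG_LEVEL: info
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: webapp
spec:
  # ...the rest of the Deployment goes here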
And that’s about enough YAML to understand Kubernetes manifests 🎉.
I’ve been using a pair of AirPods Pro for just under a week now. I use headphones in three main environments, and up until now have used three separate pairs, each of which works best for that environment. As they combine true-wireless comfort, noise-cancelling, a promising transparency mode and closed backs, I wondered whether the AirPods Pro could possibly replace at least a couple of my existing sets. Here we go.
Commute. My go-to headphones for my commute were a pair of first-gen AirPods that I’ve had nearly three years. I walk my commute, so I like to be able to hear what’s going on around me on the street; the open-backed AirPods work great for this. This is obviously a place where transparency mode comes into play. However, both the Sony and Bose pairs mentioned below have transparency modes that, well, just don’t feel transparent. They make it feel like the outside world is coming through water. The AirPods Pro, however, while they do seem to have minor trouble with sibilants in spoken word, feel much closer to super-imposing your audio on the surroundings than any other transparency mode I’ve used. It’s surprisingly close to the experience of using the original AirPods. On top of this, you can obviously turn to noise-cancelling on busy streets rather than turning up the volume. These two combined are a game-changer; right now I’m not tempted to swap back.
In the office. The original AirPods are essentially useless in the office for blocking out chatter. So I’ve been using a pair of WI-1000X for a couple of years, which block out background chatter really well, especially when used with the foam tips they come with. However, here too the AirPods Pro still work okay even without foam tips, and the lack of neckband and wires is just as noticeable an improvement as on my walk into the office. In addition, the AirPods Pro charging case is just easier to use than the somewhat fiddly charger of the WI-1000X. At the moment, I’m grabbing for the AirPods Pro in the office. They block out enough chatter, and true-wireless is just way more comfortable.
Flying. For drowning out engine noise on flights, I have found the (wired) Bose QC20 beat the WI-1000X (the reverse is true for office chatter, strangely). The noise-cancelling is better on the Bose pair, and they fit into a very small carrying pouch compared to the neckband-saddled WI-1000X; much easier to chuck into a bag. I would say the AirPods Pro have about the same noise-cancelling effectiveness as the Sony headphones. I’ve yet to fly, so time will tell whether the convenience of the wireless headphones beats out the (likely) better noise-cancelling of the Bose pair. I’ll certainly be taking both to try them out as I feel it’ll be a close call.
Overall I’ve been surprised by how close the AirPods Pro have come to replacing the three pairs I used previously. Time will tell how I end up settling long term, but Apple have hit a good balance with these headphones. I suspect the convenience of the true wireless, good-enough noise-cancelling and compact size may make these my go-to headphones most of the time. Oh, and they sound good enough too – but you’d expect that for the price.