Python Packaging in 2020
For a long time, I’ve kind of existed with a barely-there understanding of Python packaging. Just enough to copy a `requirements.txt` file from an old project and write a `Makefile` with `pip install -r requirements.txt`. A few years ago, I started using pipenv, and again learned just enough to make it work.
Over the past year, I became frustrated with this situation:

- pipenv became increasingly hard to make work through upgrades and its kitchen-sink approach.
- I started building docker images for Python applications, and understanding packaging in more detail became essential to build secure and performant images.
Last year (2019), I started to look at tools like poetry, which essentially start the whole process from scratch, including new dependency resolution and package-building code. When figuring out how to use these in Dockerfiles, I realised I needed to understand a bunch more about both packaging and virtual environments. The good news was that this area had actually progressed a lot in the 2018–2019 time frame. The bad news was that meant there was a lot to learn, and a bunch of existing material was out of date.
In the beginning, there was the source distribution
Until 2013, when PEP 427 defined the `whl` archive format for Python packages, whenever a package was installed via `pip install` it was always built from source via a distribution format called `sdist`. For pure-python files this wasn’t typically much of a problem, but for any packages making use of C extensions it meant that the machine where `pip install` was run needed a compiler toolchain, python development headers and so on.
This situation is more than a little painful. As PEP 427’s rationale states:

> Python’s sdist packages are defined by and require the distutils and setuptools build systems, running arbitrary code to build-and-install, and re-compile, code just so it can be installed into a new virtualenv.
After PEP 427, packages could also be distributed as so-called binary packages or wheels.
When I first started to see python binary packages, I was confused and even somewhat alarmed by the term binary package, because I had never looked in depth into python packaging and was quite used to source distributions by 2013. But in general wheels are a big win:
- For pure python packages the term is a slight misnomer, as the wheel format is just about how the files are laid out inside an archive – typically these wheels will have a single `.whl` per python version they support, named like `Flask-1.1.1-py2.py3-none-any.whl`, where `none` and `any` specify the python ABI version (for C extensions) and the target platform respectively. As pure python packages have no C extensions, they have no target ABI and platform, but will often have a python version requirement, though this example supports both python 2 and 3.
  - The tags, such as `none`, in filenames are defined in PEP 425.
- For packages including C extensions which are linked to the Python C runtime during compilation, the name does make sense, because the build process pre-compiles the extension into a binary, unlike in the sdist world where C extensions were compiled during package installation. This results in several different `.whl` files, as a separate `.whl` file must be created for each target system and python version. For example, `cryptography-2.8-cp34-abi3-manylinux2010_x86_64.whl` is a package with binaries built against CPython 3.4 at ABI level 3, for a Linux (manylinux2010) machine with an x86-64 processor – see the sketch of how these tags break down after this list.
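To make the tag layout concrete, here is a toy parser of my own – not what pip actually uses (real tools rely on the `packaging` library) – that splits a wheel filename into the PEP 425 components, assuming the simple five-part form without an optional build tag:

```python
# Sketch only: split a wheel filename into the PEP 425 tag fields.
# Assumes the simple five-part form (no optional build tag).
def parse_wheel_name(filename):
    stem = filename[: -len(".whl")]
    name, version, python_tag, abi_tag, platform_tag = stem.split("-")
    return {
        "name": name,
        "version": version,
        "python": python_tag,      # e.g. py2.py3, cp34
        "abi": abi_tag,            # e.g. none, abi3
        "platform": platform_tag,  # e.g. any, manylinux2010_x86_64
    }

print(parse_wheel_name("Flask-1.1.1-py2.py3-none-any.whl"))
print(parse_wheel_name("cryptography-2.8-cp34-abi3-manylinux2010_x86_64.whl"))
```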
In the end, wheels provide a much simpler and more reliable install experience, as users are no longer forced to compile packages themselves, with all the tooling and security concerns inherent in that approach.
Stepping back to how wheels are built
Wheels soon started taking over the python packaging ecosystem, though there are still hold-outs even today that ship source packages rather than binary packages (often for good reasons).
However, all python packages were still defined via `setup.py`, an opaque standard defined purely by the `distutils` and `setuptools` source code. While there was now a binary standard for built packages, in practice there was only one way of building them. `pip`, for example, hardcoded the calls to `setup.py` into its `pip wheel` command, so using other build systems was very difficult, making their implementation a somewhat thankless task. Before poetry, it doesn’t look like anyone much attempted it.
The `distutils` module was shipped with Python, so it was natural that it came to be the de facto standard, and including a packaging tool was a good decision by the python maintainers. `distutils` wasn’t that easy to use on its own, however, so `setuptools` was built as a package to improve on it. Over time, `setuptools` also grew to be somewhat gnarly itself.
Tools like flit (started 2015) were then created to tame this complexity, wrapping `distutils` and `setuptools` in another layer – though flit is opinionated. Flit became pretty popular because its workflow is simple and understandable, but in the end it was still using distutils and setuptools under the hood (per this flit source code). Indeed, generation of the files used by `distutils` happens behind the scenes so far as I can tell (I didn’t actually try flit out, so may have made some errors here).
Poetry and PEPs 517 & 518
In 2018, development of poetry started, at least per the earliest commits in the github repository. Poetry is an ambitious rebuild of python packaging pretty much from scratch. It’s able to resolve dependencies and build wheels without any use of distutils and setuptools. The main problem with poetry is that it needs to re-implement a lot of existing functionality already present in other tools like `pip` before it can be accepted into development and CI pipelines.
At a similar time, the python community came up with PEPs 517 and 518.
- PEP 517 (status Provisional, 2015-2018) specifies a standard way to declare alternative build backends that `pip` can use when building wheels – for example, using poetry’s or flit’s build engine rather than going directly to `distutils`. A build backend is a Python module with a standard interface that takes a python package source tree and spits out a wheel (a sketch of this interface follows the list).
- PEP 518 (status Provisional, 2016) works in tandem with PEP 517 and specifies how a tool like `pip` should install the build backend declared per PEP 517 when building packages. Specifically, it describes how to create an isolated python environment with just the requirements needed to build the package (that is, the packages that provide the build backend, not the package’s dependencies).
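To make the backend side concrete, these are the two mandatory hooks PEP 517 requires a backend module to expose – the signatures follow the PEP, but the bodies here are placeholders rather than a real backend:

```python
# Sketch of the two mandatory PEP 517 build backend hooks. A real
# backend (e.g. poetry.masonry.api) implements these to do the work.

def build_wheel(wheel_directory, config_settings=None, metadata_directory=None):
    """Build a .whl into wheel_directory and return its basename."""
    ...


def build_sdist(sdist_directory, config_settings=None):
    """Build a source distribution into sdist_directory and return its basename."""
    ...
```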
Both PEPs 517 and 518 use a new file called `pyproject.toml` to describe their settings:
```toml
[build-system]
# Defined by PEP 518, what the build environment requires:
requires = ["poetry>=0.12"]
# Defined by PEP 517, how to kick off the build:
build-backend = "poetry.masonry.api"
```
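And to make the PEP 518 half concrete, here is a rough sketch – my own illustration, not pip’s actual implementation – of what creating that isolated build environment amounts to:

```python
# Rough sketch of the PEP 518 step: an isolated environment holding
# only the declared build requirements (not the package's own deps).
import subprocess
import venv

def make_build_env(env_dir, build_requires):
    # Create the isolated environment, with pip available inside it.
    venv.EnvBuilder(with_pip=True).create(env_dir)
    # Install only the [build-system] requires into that environment.
    # (POSIX path shown; on Windows it would be Scripts\python.exe.)
    subprocess.run(
        [f"{env_dir}/bin/python", "-m", "pip", "install", *build_requires],
        check=True,
    )

# e.g. make_build_env("/tmp/build-env", ["poetry>=0.12"])
```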
Both poetry and flit work with `pyproject.toml` via its support for namespacing tool-specific settings. An example using poetry:
```toml
[tool.poetry]
name = "my-package"
version = "0.1.0"
description = "The description of the package"

[tool.poetry.dependencies]
python = "^3.7"
flask-hookserver = "==1.1.0"
requests = "==2.22.0"
```
While both PEPs 517 and 518 were started a while ago, it’s only from pip 19.1 (early 2019) that pip started supporting the use of build backends specified via PEP 517.
`pip` enters “PEP 517 mode” when `pip wheel` is called if `pip` finds a `pyproject.toml` file in the package it is building. When in this mode, `pip` acts as a build frontend, a term defined by PEP 517 for the application that is used from the command line and makes calls into a build backend, such as poetry. As a build frontend, the job for `pip` here is to:
- Create an isolated python environment.
- Install the build backend into this environment via the PEP 518 requirements (`requires = ["poetry>=0.12"]`).
- Get the package ready for building in this environment.
- Invoke the build backend, for example poetry, using the entrypoint defined by PEP 517 (`build-backend = "poetry.masonry.api"`) within the created isolated environment.
The build backend then must create a wheel from the source folder or source distribution and put it in the place that `pip` tells it to.
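Stripped of the isolation machinery, that last step amounts to importing the module named by `build-backend` and calling its `build_wheel` hook. A minimal sketch – again my illustration; pip’s real code drives this through a subprocess inside the isolated environment:

```python
# Sketch of a build frontend's final step: resolve the PEP 517
# build-backend string to an object and call its build_wheel hook.
import importlib

def invoke_backend(build_backend, wheel_directory):
    # PEP 517 allows "module.path" or "module.path:object.path".
    module_path, _, object_path = build_backend.partition(":")
    backend = importlib.import_module(module_path)
    for attr in filter(None, object_path.split(".")):
        backend = getattr(backend, attr)
    # build_wheel writes a .whl into wheel_directory and returns
    # the basename of the file it created.
    return backend.build_wheel(wheel_directory)

# e.g. invoke_backend("poetry.masonry.api", "/build/dist")
```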
For me, this seems like big news for projects like poetry that do a lot from scratch and end up with laundry lists of feature requirements before they can be integrated into full development and CI pipelines. If they can instead be integrated into CI via existing tools like `pip`, then they are much easier to adopt in development for their useful features there, such as poetry’s virtual environment management. In particular, both flit and poetry will use the information defined in their respective sections of `pyproject.toml` to build the wheel and requirement wheels just as they would on a developer’s machine (to an extent anyway; my experiments indicate poetry ignores its lock file when resolving requirements).
In this way, PEPs 517 and 518 close the loop in allowing tools like poetry to concentrate on what they want to concentrate on, rather than needing to build out a whole set of functions before they can be accepted into developers' toolboxes.
An example Dockerfile shows this in action, building the `myapp` package into a wheel along with its dependencies, then copying the app and dependency wheels into the production image and installing them:
```dockerfile
# Stage 1 build to allow pulling from private repos requiring creds
FROM python:3.8.0-buster AS builder
RUN mkdir -p /build/dist /build/myapp
# pyproject.toml has deps for the `myapp` package
COPY pyproject.toml /build
# Our project source code
COPY myapp/*.py /build/myapp/
# This line installs and uses the build backend defined in
# pyproject.toml to build the application wheels from the source
# code we copy in, outputting the app and dependency wheels
# to /build/dist.
RUN pip wheel -w /build/dist /build

# Stage 2 build: copy and install wheels from stage 1 (`builder`).
FROM python:3.8.0-slim-buster AS production-image
COPY --from=builder [ "/build/dist/*.whl", "/install/" ]
RUN pip install --no-index /install/*.whl \
    && rm -rf /install
CMD [ "my-package-script" ]
```
And this is what I now understand about the state of python packaging as we enter 2020. The future looks bright.