Python Packaging in 2020
For a long time, I’ve kind of existed with a barely-there understanding of
Python packaging. Just enough to copy a
requirements.txt file from an old
project and write a
pip install -r requirements.txt. A few
years ago, I started using pipenv, and again learned just enough to make it work.
Over the past year, I became frustrated with this situation:
- pipenv became increasingly hard to make work, both through its upgrades and because of its kitchen-sink approach.
- I started building docker images for Python applications, and understanding packaging in more detail became essential to build secure and performant images.
Last year (2019), I started to look at tools like poetry, which essentially start the whole process from scratch, including new dependency resolution and package-building code. When figuring out how to use these in Dockerfiles, I realised I needed to understand a bunch more about both packaging and virtual environments. The good news was this actually progressed a lot in the 2018-9 time frame. The bad news was that meant there was a lot to learn, and a bunch of stuff was out of date.
In the beginning, there was the source distribution
Until 2013, when PEP 427 defined the
whl archive format for Python packages,
whenever a package was installed via
pip install it was always built from
source via a distribution format called
sdist. For pure-python packages this
wasn’t typically much of a problem, but for any packages making use of
C extensions it meant that the machine where
pip install was run needed
a compiler toolchain, python development headers and so on.
This situation is more than a little painful. As PEP 427’s rationale states:
Python’s sdist packages are defined by and require the distutils and setuptools build systems, running arbitrary code to build-and-install, and re-compile, code just so it can be installed into a new virtualenv.
After PEP 427, packages could also be distributed as so-called binary packages or wheels.
When I first started to see python binary packages, I was confused and even somewhat alarmed by the term, as I had never looked in depth into python packaging and was quite used to source distributions by 2013. But in general wheels are a big win:
- For pure python packages the term is a slight misnomer, as the wheel format is just about how the files are laid out inside an archive. Typically these packages will have a single .whl per python version they support, named something like mypackage-1.0-py2.py3-none-any.whl, where none and any specify the python ABI version (for C extensions) and the target platform respectively. As pure python packages have no C extensions, they have no target ABI and platform, but will often have a python version requirement, though this example supports both python 2 and 3.
- The tags, such as
none, in filenames are defined in PEP 425.
- For packages including C extensions which are linked to the Python C runtime during compilation, the name does make sense, because the build process pre-compiles the extension into a binary, unlike in the sdist world where C extensions were compiled during package installation. This results in multiple .whl files, as a separate .whl must be created for each target system and python version. For example, cryptography-2.8-cp34-abi3-manylinux2010_x86_64.whl is a package with binaries built against CPython 3.4, ABI level 3, for a Linux machine and x86-64 processor architecture.
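The tag layout can be seen by pulling a wheel filename apart. This is a rough sketch only: real tools use the `packaging` library, and this simplified version ignores the optional build tag that wheel names may also carry.

```python
# Rough sketch of reading the PEP 425 tags out of a wheel filename.
def parse_wheel_filename(filename):
    stem = filename[: -len(".whl")]
    name, version, python_tag, abi_tag, platform_tag = stem.split("-")
    return {
        "name": name,
        "version": version,
        "python": python_tag,      # e.g. cp34 = CPython 3.4, py2.py3 = python 2 or 3
        "abi": abi_tag,            # none = no C extension ABI required
        "platform": platform_tag,  # any = pure python, runs anywhere
    }

tags = parse_wheel_filename("cryptography-2.8-cp34-abi3-manylinux2010_x86_64.whl")
print(tags["python"], tags["abi"], tags["platform"])
# → cp34 abi3 manylinux2010_x86_64
```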
In the end, wheels provide a much simpler and more reliable install experience, as users are no longer forced to compile packages themselves, with all the tooling and security concerns inherent in that approach.
Stepping back to how wheels are built
Wheels soon started taking over the python packaging ecosystem, though there are still hold-outs even today that ship source packages rather than binary packages (often for good reasons).
However, all python packages were still defined via
setup.py, an opaque standard defined purely by the
code that implemented it. While there was now a binary standard for built
packages, in practice there was only one way of building them.
pip, for example, hardcoded calls to
setup.py into its
pip wheel command, so using other build systems was
very difficult, making implementing them a somewhat thankless task. Before
poetry, it doesn’t look like anyone much attempted it.
The distutils module was shipped with Python, so it was natural that it came
to be the de facto standard, and including a packaging tool was a good decision
from the python maintainers.
As distutils wasn’t that easy to use on its own,
setuptools was built as a package to improve on it. Over time,
setuptools also grew to be somewhat gnarly itself.
Tools like flit (started 2015) were then created to tame the new complexity,
wrapping distutils and setuptools in another layer, though flit is
opinionated about how packages should be built. Flit became pretty popular as
its workflow is simple and understandable, but in the end it was still using
distutils and setuptools under the hood (per this flit source code); generation
of the files those tools need happens
behind the scenes so far as I can tell (I didn’t actually try flit out, so
may have made some errors here).
Poetry and PEPs 517 & 518
In 2018 development of poetry started, at least per the earliest commits from
the github repository. Poetry is an
ambitious rebuild of python packaging pretty much from scratch. It’s able to
resolve dependencies and build wheels without any use of distutils and
setuptools. The main problem with poetry is that it needs to re-implement a lot
of existing functionality already present in tools like
pip before it can be accepted into development and CI pipelines.
At a similar time, the python community came up with PEP 517 and 518.
- PEP 517 (status Provisional, 2015-2018) specifies a standard way to declare
alternative build backends that
pip can use when building wheels – for example, using poetry or flit’s build engine rather than going directly to
distutils. A build backend is a Python module with a standard interface that is used to take a python package source tree and spit out a wheel.
- PEP 518 (status Provisional, 2016) works in tandem with PEP 517 and
specifies a way for a tool like
pip to know how to install the build backend specified by PEP 517 when pip is building packages. Specifically, it describes how to create an isolated python environment with just the requirements needed to build the package (that is, the packages that provide the build backend, not the package’s dependencies).
Both PEPs 517 and 518 use a new file called
pyproject.toml to describe their settings. For example, a poetry project declares its build system like this:

```toml
[build-system]
# Defined by PEP 518, what the build environment requires:
requires = ["poetry>=0.12"]
# Defined by PEP 517, how to kick off the build:
build-backend = "poetry.masonry.api"
```

Tools then keep their own settings in their own sections of the same file:

```toml
[tool.poetry]
name = "my-package"
version = "0.1.0"
description = "The description of the package"

[tool.poetry.dependencies]
python = "^3.7"
flask-hookserver = "==1.1.0"
requests = "==2.22.0"
```
While both PEPs 517 and 518 were started a while ago, it’s only from pip 19.1 (early 2019) that pip started supporting the use of build backends specified via PEP 517.
pip enters “PEP 517 mode” when
pip wheel is called if
pip finds a
pyproject.toml file in the package it is building. When in this mode,
pip acts as a build frontend, a term defined by PEP 517 for the application
that is used from the command line and is making calls into a build backend,
such as poetry. As a build frontend, the job for
pip here is to:
- Create an isolated python environment.
- Install the build backend into this environment via the
PEP 518 requirements (
requires = ["poetry>=0.12"]).
- Get the package ready for building in this environment.
- Invoke the build backend, for example poetry, using the entrypoint defined
by PEP 517 (
build-backend = "poetry.masonry.api") within the created isolated environment.
The build backend then must create a wheel from the source folder or source
distribution and put it in the place that
pip tells it to.
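The interface the backend exposes for this is small. The hook names and signatures below come from PEP 517; the bodies are illustrative stubs, not any real backend's implementation:

```python
import os

# Stub of the module interface a PEP 517 build backend exposes. The
# frontend imports the module named in `build-backend` and calls these.

def get_requires_for_build_wheel(config_settings=None):
    # Optional hook: extra build requirements beyond [build-system].requires.
    return []

def build_wheel(wheel_directory, config_settings=None, metadata_directory=None):
    # A real backend writes <name>-<version>-<tags>.whl into
    # wheel_directory and returns its basename; this stub just
    # creates an empty placeholder file.
    wheel_name = "my_package-0.1.0-py3-none-any.whl"
    open(os.path.join(wheel_directory, wheel_name), "wb").close()
    return wheel_name
```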
For me, this seems like big news for projects like poetry that do a lot from
scratch and end up with laundry lists of feature requirements to enable them to
be integrated into full development and CI pipelines. If they can instead be
integrated into CI via existing tools like
pip, then they are much easier to
adopt in development for their useful features there, such as poetry’s virtual
environment management features. In particular, both flit and poetry will use
the information defined in their respective sections of pyproject.toml to
build the wheel and requirement wheels just as they would on a developer’s
machine (to an extent anyway; my experiments indicate poetry ignores its
lock file when resolving requirements).
In this way, PEPs 517 and 518 close the loop in allowing tools like poetry to concentrate on what they want to concentrate on, rather than needing to build out a whole set of functions before they can be accepted into developers' toolboxes.
This example Dockerfile shows this in action, building the application
into a wheel along with its dependencies, and then copying the app and
dependency wheels into the production image and installing them:
```dockerfile
# Stage 1 build to allow pulling from private repos requiring creds
FROM python:3.8.0-buster AS builder
RUN mkdir -p /build/dist /build/myapp
# pyproject.toml has deps for the `myapp` package
COPY pyproject.toml /build
# Our project source code
COPY myapp/*.py /build/myapp/
# This line installs and uses the build backend defined in
# pyproject.toml to build the application wheels from the source
# code we copy in, outputting the app and dependency wheels
# to /build/dist.
RUN pip wheel -w /build/dist /build

# Stage 2 build: copy and install wheels from stage 1 (`builder`).
FROM python:3.8.0-slim-buster as production-image
COPY --from=builder [ "/build/dist/*.whl", "/install/" ]
RUN pip install --no-index /install/*.whl \
    && rm -rf /install
CMD [ "my-package-script" ]
```
And this is what I now understand about the state of python packaging as we enter 2020. The future looks bright.