Deploying Python packages to production servers is still, unfortunately, not a solved problem. Sean Sabbage, Head of Software Engineering, details how we do it and some of the reasons behind that approach.
Over the past month or so I have spent a large amount of time trying to find an elegant answer to the problem of packaging and deployment. How do we get code onto our servers?
One of the most commonly accepted ways of doing this is using git to pull a branch or tag onto a server, perhaps using something like Ansible or Fabric to manage the deployment. Now, we already use Ansible in the dev team for configuration and deployment of all our servers, but something about giving all our servers direct access to the git repository, even if it’s read-only, doesn’t sit well with me. This also gets more complex when you’re deploying into virtual environments, or anything which requires additional libraries to be installed after you’ve pulled the code down.
We exclusively use Ubuntu for our servers, so after spending a little time researching how other people had done it, I eventually decided that the best way to package up the code was going to be to create a DEB package and use the system’s package manager to handle the installation and upgrades. Rob McQueen’s post over at the Nylas blog was a big influence on the decisions I made along the way, and after some vacillating between fpm and dh-virtualenv I decided to go with the latter.
dh-virtualenv got me a lot closer to the specific end goal I was aiming for. It allows me to use standard Debian/Ubuntu packaging tools with a Python project and will package up the application and all its requirements into an installable virtual environment. Thus no post-install installation of requirements needs to be carried out. That’s not to say that there is anything wrong with fpm – it’s an awesome tool and has a fantastic range of input and output options (not limited to DEB) – but it just isn’t what I need for this project.
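To make that concrete, a minimal debian/rules file for a dh-virtualenv build looks something like this sketch (the interpreter path is an assumption; your project may pin a different Python):

```makefile
#!/usr/bin/make -f
# Minimal dh-virtualenv rules file -- a sketch, not our exact file.
# The catch-all target hands everything to debhelper with the
# python-virtualenv add-on enabled.

%:
	dh $@ --with python-virtualenv

# Pin the interpreter used to build the virtualenv (Python 3 only here).
override_dh_virtualenv:
	dh_virtualenv --python /usr/bin/python3
```

dh-virtualenv then builds the whole virtualenv, dependencies included, into the package payload, which is what removes the post-install pip step.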
I’ve put together a small repository in our GitHub account containing a variety of config files to give some context to this post, and will refer to these as we go.
The Dockerfile defines the Docker container we use for the actual building of the package. It has a bunch of Python development and packaging requirements pre-installed, as well as a dput.cf file, a known_hosts file and some build keys for accessing our repository (which I will cover in detail a little later on in this post).
The debian/ directory contains all the packaging goodness, with most of the logic contained in the rules file. Most of the code in here is derived from the dh-virtualenv-mold cookie cutter templates with a few modifications (as we’re using Python 3 exclusively), and defines the cleaning and building rules for the package.
The control file defines all the settings for the package itself and the install and build requirements from a system package point of view.
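For illustration, a control file in that spirit might look like the following (the package name, maintainer and version constraints are all placeholders, not our real values):

```
Source: myapp
Section: python
Priority: optional
Maintainer: Dev Team <dev@example.com>
Build-Depends: debhelper (>= 9), dh-virtualenv (>= 1.0), python3, python3-dev
Standards-Version: 3.9.5

Package: myapp
Architecture: any
Pre-Depends: dpkg (>= 1.16.1), python3
Depends: ${misc:Depends}
Description: Example application packaged with dh-virtualenv
 Installs the application and all of its Python requirements
 into a self-contained virtual environment.
```

Build-Depends covers what the build container needs; Depends is deliberately thin because the virtualenv itself carries the Python requirements.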
Now, it’s very important to note that the build process makes use of the changelog file to determine the version of the package that it’s building, and for what distribution. You’ll see in this file the use of UNRELEASED and staging as distributions, I’ll come back to this later.
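An illustrative changelog (names and dates invented) shows how the distribution field drives this – an in-progress entry marked UNRELEASED sitting above the last tagged release marked staging:

```
myapp (1.1.0) UNRELEASED; urgency=medium

  * Work in progress towards 1.1.0.

 -- Dev Team <dev@example.com>  Sun, 09 Oct 2016 15:00:15 +0100

myapp (1.0.0) staging; urgency=medium

  * First tagged release.

 -- Dev Team <dev@example.com>  Sat, 01 Oct 2016 12:00:00 +0100
```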
As an aside – one decision I made relatively early in the process was to use native version numbers instead of Ubuntu-specific revisions (1.0.0 vs 1.0.0-0ubuntu1, for example). While there is nothing technically stopping our Python software from being run on multiple Linux distributions, for the time being it’s simplified the deployment of the code. On top of that, it means that I’ve been able to put the debian/ packaging directory in the master branch of the repository, rather than worrying about having a separate branch just for packaging. This may change in future, but for the time being it works just fine!
The Makefile in the root has a single target, deb, which just runs dpkg-buildpackage with some specific options to ensure that it doesn’t try to sign the source package or changes file. The reason that I haven’t used something a little higher up in the chain like pdebuild is that we’re using GitLab CI with Docker containers, and pbuilder isn’t currently supported by Docker due to the way it accesses /proc.
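A sketch of that target (the exact flag set is an assumption; -us and -uc are the standard dpkg-buildpackage flags for skipping source and changes-file signing, and -b restricts it to a binary-only build):

```makefile
# Build an unsigned binary package from the current tree.
deb:
	dpkg-buildpackage -us -uc -b
```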
I was anxious from the get-go to ensure that every commit to the repository and every tag followed the same release process. This way we can test our install process continuously by installing the package. When it came to version numbers this posed a bit of a problem as I wanted to ensure that every built version was newer than the previous, but also would be superseded by tagged versions, and I certainly didn’t want to be editing the changelog file on every commit.
I took a leaf out of jenkins-debian-glue’s book here and if you look in build-snapshot you can see that we use a small script I wrote called git-snapshot that basically takes the existing version number and appends the date and time to the version number (using ~ as this comes before everything else in the Debian/Ubuntu versioning scheme) in the changelog before building the package. So you end up with version numbers like 1.0.0~20161009150015+git1233451abcd which comes before version 1.0.0.
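The version-mangling logic can be sketched in a few lines of Python. This is a hypothetical reimplementation of what git-snapshot does, not the script itself; the function name and signature are mine:

```python
def snapshot_version(current: str, timestamp: str, git_hash: str,
                     released: bool = False) -> str:
    """Build a Debian snapshot version from the changelog version.

    `timestamp` is YYYYmmddHHMMSS and `git_hash` is an abbreviated commit
    id.  The ~ separator makes the snapshot sort *before* `current` in
    the Debian/Ubuntu versioning scheme.  For an already-released version
    we first append +0, so the snapshot sorts *after* the release it was
    cut from but before the next release.
    """
    base = current + "+0" if released else current
    return f"{base}~{timestamp}+git{git_hash}"


# A pre-release snapshot of 1.0.0 sorts before 1.0.0 itself:
print(snapshot_version("1.0.0", "20161009150015", "1233451abcd"))
# A snapshot cut after the 1.0.0 release sorts between 1.0.0 and 1.0.1:
print(snapshot_version("1.0.0", "20161009150015", "abc1234", released=True))
```

You can sanity-check the resulting ordering on any Debian-family box with `dpkg --compare-versions 1.0.0~20161009150015+git1233451abcd lt 1.0.0`.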
If the current version is a released version (i.e. for any distribution except UNRELEASED), it will append +0 first; jenkins-debian-glue found that this was the best way to generate a ‘next version’ while ensuring practically anything else is later than it, e.g. 1.0.0 < 1.0.0+0 < 1.0.1, and I saw no reason not to follow their lead on this!
On every commit to master, build-snapshot is run, which builds a snapshot package. GitLab CI also allows us to run a different set of commands when a tag is created, so for every tag build-release is run instead which skips the generation of a new version number – this assumes that we update the changelog before tagging, of course!
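The split between snapshot and release builds can be expressed in a .gitlab-ci.yml along these lines (the image name is a placeholder, and the job layout is an illustration rather than our exact file):

```yaml
image: registry.example.com/package-builder:latest

snapshot:
  script:
    - ./build-snapshot
  except:
    - tags

release:
  script:
    - ./build-release
  only:
    - tags
```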
So after all this we now have shiny packages being generated on every commit, but where do they go? We clearly need to put them somewhere so that our servers can access them. For this I decided to run a repository server with reprepro, with three distinct distributions: development, staging and production. The git-snapshot command sets the package distribution to development; this is where all my snapshot packages go and should only be used for cutting-edge boxes. staging is where all ‘released’ packages go, this is for packages that need testing before they’re released. Finally, production-ready packages go into the production repository, and must be manually moved from the staging repository to ensure that testing packages can’t accidentally be released by an automated process.
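On the reprepro side, the three distributions live in conf/distributions; a sketch (Origin and architectures are placeholders, and SignWith picks up the server’s default GPG key) might look like:

```
Origin: example
Codename: development
Architectures: amd64 source
Components: main
SignWith: yes

Origin: example
Codename: staging
Architectures: amd64 source
Components: main
SignWith: yes

Origin: example
Codename: production
Architectures: amd64 source
Components: main
SignWith: yes
```

The manual promotion step is then a one-liner on the repository server, e.g. `reprepro copy production staging myapp`.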
This repository server is hosted internally and the servers that require it will have the appropriate distribution added as an apt source (as part of our Ansible build, of course!). While packages uploaded to the repository server are not signed because the upload route is trusted, the package server itself has a GPG key for signing the releases and the public key is loaded onto all of our boxes as part of our Ansible deployment.
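The apt source that Ansible drops onto a staging box would be a single line of this shape (the hostname is a placeholder for our internal repository server):

```
# /etc/apt/sources.list.d/internal.list
deb http://repo.example.internal/ staging main
```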
To get from completed package to repository server, we use dput with the config file loaded into the Docker container along with the upload user’s RSA key and a known_hosts file with the public SSH key of the repository server to ensure that the whole process is automated and doesn’t require user input.
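The dput side of that is a single stanza in the dput.cf baked into the container; something like the following sketch, where the host, login and incoming path are placeholders:

```ini
[internal]
fqdn = repo.example.internal
method = scp
login = uploader
incoming = /srv/reprepro/incoming
allow_unsigned_uploads = 1
```

With that in place the upload is just `dput internal myapp_1.0.0_amd64.changes`, and the pre-seeded known_hosts entry means SSH never stops to ask about the host key.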
So, with all the pieces put together, within a few minutes of a tag being created and pushed, I have an install-ready package in the staging repository that can be installed with a simple apt-get install on a staging box and it’s ready to go. I have no doubt we’ll chop and change parts of our process as we go, but this is where we’ve got to after a few weeks of research and implementation, and I for one am very happy with the result!
If you’ve got any questions or comments, please feel free to put them in the comments below and I’ll do my best to answer them.