Building statically linked binaries in Haskell with Docker

Posted in category haskell on 2018-01-02
Tags: haskell, docker

One of the attractive features of Go is that, when possible, it builds your projects as statically linked binaries by default. This has a very convenient side effect: the result is simple to containerize. The obvious trade-off is that a static binary does not reuse libraries installed in the operating system, but instead bundles everything into a single binary.

Unfortunately, it has traditionally not been as simple in Haskell as in Go. Here I present a rather portable solution that should be reusable even without Docker infrastructure.

Statically linking blog engine

Here is the simplistic blog.cabal file, the real project file for my Hakyll-based blogging engine (its repository is given in the source-repository section of the file):

name:               blog
version:            0.1.0.0
synopsis:           Personal blog engine
description:        Please see README.md for more details
build-type:         Simple
category:           Web
author:             Roman Kuznetsov
maintainer:         roman@kuznero.com
license:            BSD3
license-file:       LICENSE
cabal-version:      >= 1.10

executable blog
  main-is: blog.hs
  build-depends:    base < 5
                  , hakyll == 4.9.8.0
                  , containers == 0.5.7.1
                  , pandoc == 1.19.2.4
                  , aeson == 1.1.2.0
                  , text == 1.2.2.2
                  , bytestring == 0.10.8.1
                  , ConfigFile == 1.1.4
                  , pretty-show == 1.6.13
                  , process == 1.4.3.0
                  , directory == 1.3.0.0
  ghc-options: -threaded
  default-language: Haskell2010

source-repository head
  type:     git
  location: https://github.com/kuznero/blog

Here is the Dockerfile for my blog project. It statically links the final binary, which I can then extract and use on any operating system with the same architecture:

Notice the cabal configure ... line. This is exactly what makes it possible to build a statically linked binary. During the build, the compiler might warn you about some corner cases of static linking, so make sure you understand these warnings and the risks associated with them.

There are a couple of things going on here.

First, this Dockerfile uses a multi-stage build, which comes in really handy when all you need is to extract the final statically linked binary without carrying the whole GHC/Cabal infrastructure along. It also keeps the final image small.

Second, it uses one build-time argument, proxy, that lets you pass proxy settings in case you are working behind a corporate proxy. Strictly speaking, this argument is optional.

Now, if you try to build the image, you will notice that it takes a really long time to complete:

docker build -t blog .
docker build -t blog --build-arg proxy=http://127.0.0.1:3128/ .

Gotchas

“No such protocol name: tcp” error

If your project uses the network package and you package it in an alpine image (currently considered one of the slimmest images out there), you might get the following error at run time:

Network.BSD.getProtocolByName: does not exist (no such protocol name: tcp)

This is pretty annoying, especially since you have already got pretty far in getting your statically linked binary out. In this case, you can solve it by using the ubuntu:17.04 base image with the missing dependencies ca-certificates, libgnutls30 and netbase installed (I have not yet managed to find them in the alpine package repository):

The resulting image is going to be bigger than the one built on alpine, but we are talking about a 90 MB increase, which can be considered negligible.
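A likely explanation, stated here as an assumption rather than a verified diagnosis, is that getProtocolByName relies on the system protocol database in /etc/protocols, which on Debian and Ubuntu is shipped by the netbase package. The following minimal sketch checks whether a given image contains the lookup files; the name lookupFiles is made up for this illustration, and the program needs only the directory package that ships with GHC.

```haskell
import System.Directory (doesFileExist)

-- Databases the C library reads for protocol and service name lookups.
-- Assumption: a missing /etc/protocols is what triggers the error above.
lookupFiles :: [FilePath]
lookupFiles = ["/etc/protocols", "/etc/services"]

-- Print, for every lookup file, whether it exists in this environment.
main :: IO ()
main = mapM_ report lookupFiles
  where
    report path = do
      present <- doesFileExist path
      putStrLn (path ++ ": " ++ (if present then "present" else "MISSING"))
```

Building this small program statically and running it inside the target image is a cheap way to see what the base image lacks before debugging the real binary.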

Data-files in dependencies

If you use libraries that have a data-files field in their Cabal files, you will most likely end up in a situation where your statically linked binary fails at run time. A good example at hand is the Hakyll library, which declares, among other things, some important data files:

Data-files:
  templates/atom-item.xml
  templates/atom.xml
  templates/rss-item.xml
  templates/rss.xml

These files are used to render RSS and ATOM feed files on my blog.

The way I solved it is not perfect: it cannot be applied to multiple libraries at once, and it does not generically change the logic behind the getDataFileName function from the Paths_* module (for that you would likely need to change either GHC or Cabal). So, to keep it simple, I copied the hakyll-4.9.8.0 source into my project structure, bumped the version to 4.9.8.1, and changed the data-file lookup so that it first checks whether a file with the same path exists in the current working directory, and only then falls back to the data-files logic from the Paths_* module.

Here is how the code looked before in hakyll-4.9.8.0:

And here is how it looks now in hakyll-4.9.8.1:

This is not a pull request yet, but I might end up submitting it at some point.

Apart from the fact that the logic is hidden behind compilerUnsafeIO, it should be clear that I first check whether there is a local version of a path and, only if it does not exist locally, fall back to getDataFileName from the Paths_* module.
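The local-first lookup can be sketched in isolation roughly as follows. This is not the actual Hakyll patch: resolveDataFile is a hypothetical helper, and because the Paths_* module is generated by Cabal per package, the fallback is passed in as a parameter and replaced by a stand-in in main.

```haskell
import System.Directory (doesFileExist)

-- | Resolve a data file path: prefer a file with the same relative path
-- under the current working directory, otherwise defer to the fallback.
-- In a real package the fallback would be getDataFileName from Paths_*.
resolveDataFile :: (FilePath -> IO FilePath) -> FilePath -> IO FilePath
resolveDataFile fallback path = do
  existsLocally <- doesFileExist path
  if existsLocally
    then return path
    else fallback path

main :: IO ()
main = do
  -- Hypothetical stand-in for getDataFileName; the prefix is made up.
  let fallback path = return ("/usr/share/example-package/" ++ path)
  resolved <- resolveDataFile fallback "templates/atom.xml"
  putStrLn resolved
```

Keeping the fallback as an argument means the same helper could be reused for any library, as long as each call site passes in that library's own getDataFileName.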

It should be possible to make similar changes to other libraries that use the data-files field. I suspect, though, that this approach is not one-size-fits-all and that there are cases where it is simply not appropriate.

Optimizing for convenient development process

The build above is slow primarily because cabal update and cabal install --dependencies-only take a really long time, and the more dependencies (direct and indirect) you have, the longer it takes. This can be solved by splitting the process into two parts:

Part 1 builds a base image with all the required dependencies.

Part 2 solely builds your project and packages it as a clean and slim Docker image.

Build new base image with all the dependencies

By convention, this needs to be built as the blog:dev image; that is the base image name used later.

This lets you build the base image once and get back to a hopefully smooth development process, where code changes in your project trigger a rebuild of the project only and not of the base layer.

Build & package final clean and slim image

This lets you concentrate on producing a final clean and slim image (based on alpine:3.7 in this case) without wasting too much of your time.