Three Bash Projects, Eight Distros, Ninety Seconds
A stray mapfile -d in a shared library. It parses on the developer's Debian 12 box, passes lint, passes review, and ships. Six hours later a customer on CentOS 6 is staring at a mapfile: -d: invalid option from bash 4.1. That is the class of bug we want to fail on commit, not in production. Building a matrix CI that catches it in under two minutes, on hardware we already own, took three components and one afternoon of TLS plumbing.
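For reference, the guarded shape of that fix looks roughly like the sketch below. The function name read_nul_list and the global files array are illustrative, not the actual library's API; the version cutoffs (mapfile -d needs bash 4.4, read -d '' works back to 4.1) are the real constraint.

```shell
# Hedged sketch: gate the bash-4.4-only `mapfile -d` behind a version check
# so the same library still loads on bash 4.1 (CentOS 6). Names illustrative.
read_nul_list() {
  files=()
  if (( BASH_VERSINFO[0] > 4 || (BASH_VERSINFO[0] == 4 && BASH_VERSINFO[1] >= 4) )); then
    mapfile -t -d '' files          # fast path, bash 4.4+
  else
    local f
    while IFS= read -r -d '' f; do  # portable back to bash 4.1
      files+=("$f")
    done
  fi
}
```

Usage would be read_nul_list < <(find "$dir" -print0), filling the files array identically on either bash.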
We ship three Bash projects: APF, BFD, and maldet. Each deploys across a matrix spanning CentOS 6 (2011-era coreutils in /bin), modern Rocky and Ubuntu, and whatever long-tail distros our users are still on. For years the answer was “run it in a VM and see what breaks,” which scales to one OS, poorly to three, and not at all to eight. Cloud CI charges per minute and our workload is dominated by cold builds. We wanted the full matrix to finish in the time it takes to make coffee, on hardware we already own.
Stack at a Glance
- BATS runner (TAP output, bash functions as tests)
- batsman shared harness, git submodule at tests/infra/
- Docker over TCP with mutual TLS on port 2376
- BuildKit enabled daemon-side so the layer cache works
- Parallel fanout across eight OS containers, TAP to stdout
- Fallback lane to local Docker when the build host is down
Why BATS
When the code under test is bash, the tempting answer is a higher-level runner in Python or Go. We tried it. The integration layer (spawn, capture, diff) grew to outweigh the tests, and it lied about the runtime: a subprocess launched from pytest has a different inherited environment than cron or systemd, and our regressions live exactly in that seam.
BATS (Bash Automated Testing System) is a shell-native test runner. Each test is a bash function. The run helper captures stdout, stderr, and exit code; assertions hang off $status and $output. Output conforms to the Test Anything Protocol (TAP), so it pipes straight into any CI dashboard that understands it.
```shell
#!/usr/bin/env bats

load '/usr/local/lib/bats/bats-support/load'
load '/usr/local/lib/bats/bats-assert/load'

source /opt/tests/helpers/assert-scan.bash

SAMPLES_DIR="/opt/tests/samples"
TEST_SCAN_DIR="/tmp/lmd-test-scan"

setup() {
  source /opt/tests/helpers/reset-lmd.sh
  mkdir -p "$TEST_SCAN_DIR"
}

teardown() {
  rm -rf "$TEST_SCAN_DIR"
}

@test "MD5 scan detects known test sample (EICAR)" {
  cp "$SAMPLES_DIR/eicar.com" "$TEST_SCAN_DIR/"
  run maldet -a "$TEST_SCAN_DIR"
  assert_scan_completed
  assert_output --partial "malware hits 1"
}
```

That is the whole test. It invokes the real maldet binary the way cron would, in the container's actual environment, and asserts on the output an operator would see. No mocks. No fixtures that drift. When a test fails, bash -x the failing bit and read the trace.
Each project keeps a helpers/ directory of domain-specific assert_* functions (scan completion, quarantine presence, firewall chain state). Every test sources bats-support and bats-assert from the container image. The tests read like prose and measure what production actually does.
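A domain helper in that style might look like the following sketch. The real assert-scan.bash may differ, and the "scan completed" banner is an assumed output string; in BATS, run populates the $status and $output variables the helper inspects.

```shell
# Hedged sketch of a helpers/-style domain assertion; the actual
# assert-scan.bash and maldet's exact banner text may differ.
assert_scan_completed() {
  if (( status != 0 )); then
    echo "scan exited non-zero: $status" >&2
    return 1
  fi
  case $output in
    *"scan completed"*) return 0 ;;
    *)
      echo "expected completion banner in output:" >&2
      printf '%s\n' "$output" >&2
      return 1 ;;
  esac
}
```

The payoff is in the failure message: instead of a bare exit-code mismatch, the trace shows which domain invariant broke and the output that broke it.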
Why Containers, Not VMs
APF, BFD, and maldet each need to work on CentOS 6, CentOS 7, Rocky 8, Rocky 9, Ubuntu 20.04, Ubuntu 24.04, Debian 12, and (project-dependent) a legacy ubuntu1204 or a Rocky 10 preview. Three projects × eight images, rebuilt on every library change, is a container workload, not a VM workload.
Containers give us three things a VM farm does not:
- Speed. A rebuild is a cached COPY layer; VMs provision from scratch.
- Config as code. Dockerfile.<os> is in version control. A new Rocky minor that breaks something is a diff on a twenty-line file, not a VM template rebuild.
- Frozen history. CentOS 6 is the interesting one. Long past EOL, but our user base still includes hosts that refuse to leave it. Frozen base images live on Docker Hub and in community mirrors; they run under modern Docker with vsyscall=emulate on the host kernel. Our image installs bash 4.1 and coreutils from the Vault repo. When a dev used mapfile -d (bash 4.4+) in a shared library, the CentOS 6 container caught it before a user did. That one container justifies the whole stack.
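A preflight for the vsyscall=emulate requirement can be sketched as below; the function name and warning text are illustrative, but the underlying fact is real: CentOS 6 userspace (glibc 2.12) needs the legacy vsyscall page, which modern 64-bit kernels only provide when booted with vsyscall=emulate.

```shell
# Illustrative host check for the centos6 lane; names are not from batsman.
vsyscall_ok() {
  # $1: kernel cmdline text (normally the contents of /proc/cmdline)
  case " $1 " in
    *" vsyscall=emulate "*) return 0 ;;
    *) return 1 ;;
  esac
}

if ! vsyscall_ok "$(cat /proc/cmdline 2>/dev/null)"; then
  echo 'centos6 lane needs vsyscall=emulate on the host kernel cmdline' >&2
  echo '(add it to GRUB_CMDLINE_LINUX, regenerate the grub config, reboot)' >&2
fi
```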
The batsman Submodule
Three projects, three harnesses, three copies of the same runner, drifting independently. First cut, painful. Every fix landed three times. Every new distro was added three times. Runner updates in one project took weeks to reach the others.
The fix is batsman, a shared harness repo. It owns the Docker runner, the Makefile.tests include fragment, and a base set of Dockerfiles per OS target. Consumers pull it in as a git submodule at tests/infra/.
```shell
# In each consumer repo
$ git submodule add https://github.com/rfxn/batsman.git tests/infra
$ ls tests/infra/
dockerfiles/  include/  lib/  scripts/

# Project-level Makefile (include the shared fragment)
$ cat tests/Makefile
BATSMAN_PROJECT := lmd
BATSMAN_OS_MODERN := debian12 rocky9 ubuntu2404
BATSMAN_OS_LEGACY := centos7 rocky8 ubuntu2004
BATSMAN_OS_DEEP := centos6
export BATSMAN_TEST_TIMEOUT ?= 180
include infra/include/Makefile.tests
```

Submodule semantics give us what we need. Each consumer pins a specific batsman commit. Upgrades are intentional: run git submodule update --remote tests/infra, verify the suite passes, commit the new pointer. No silent drift.
BATSMAN_PROJECT is the project's identity. The runner keys off it for the install script, the default container path (/opt/tests by convention), and the container name prefix. Everything else (OS list, per-test timeouts, extra Docker flags) is a Make variable consumers override only when needed.
Consumers own the Dockerfiles that install their own code. batsman ships base images with bash, coreutils, and bats preinstalled; each project's tests/dockerfiles/Dockerfile.<os> extends the base (maldet adds ClamAV and YARA; APF adds iptables and ipset). The shared layer caches once; the project layer rebuilds only when project files change.
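The parallel fanout the runner performs can be sketched as a plain bash loop; this is illustrative, not batsman's actual code, and run_matrix and lane are invented names:

```shell
# Hedged sketch of the per-OS fanout: one background lane per target,
# and the run fails if any lane fails. Names are illustrative.
run_matrix() {
  local runner=$1; shift
  local os pid fail=0
  local pids=()
  for os in "$@"; do
    "$runner" "$os" &       # one background lane per OS
    pids+=("$!")
  done
  for pid in "${pids[@]}"; do
    wait "$pid" || fail=1   # any failing lane fails the whole run
  done
  return "$fail"
}

# In batsman's position, each lane would be a docker run against the
# per-OS image, something like:
#   lane() { docker run --rm "lmd-test:$1" bats /opt/tests > "/tmp/lmd-$1.tap" 2>&1; }
#   run_matrix lane debian12 rocky9 ubuntu2404 centos7 rocky8 ubuntu2004 centos6
```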
Docker over TCP with Mutual TLS
Test host and dev host are different machines. We develop on freedom (the workstation) and test on anvil (a dedicated LAN build box with no other job). Running the matrix on freedom freezes the workstation for two minutes per push. Running it on anvil means shipping source over and collecting results somehow.
The common answer is SSH: pipe docker commands through ssh or rsync the source and trigger a remote make. Both work. Both add per-command connection overhead, serialize anything that could stream, and fight with the Docker client's assumption that it is talking to a daemon directly. BuildKit streaming in particular falls on its face over SSH pipes.
The better answer is the one Docker was designed for: expose the daemon over TCP with mutual TLS, and point the client at it.
| Property | SSH tunnel | Docker over TCP + mTLS |
|---|---|---|
| per-command handshake | yes (SSH connect each time) | no (persistent TLS) |
| BuildKit streaming | broken / degraded | native |
| parallel docker run | serialized by tunnel | actually parallel on daemon |
| source at rest on build host | yes (rsync copy) | no (bind-mount only) |
| auth model | SSH keys | client cert signed by CA |
anvil's daemon listens on tcp://0.0.0.0:2376. Server certs at /etc/docker/tls/, freedom's client cert and matching CA at ~/.docker/tls/. Both sides authenticate on every connection; the daemon rejects any client cert not signed by the CA. Running the matrix is a one-liner:
```shell
# Run the full matrix on anvil, stream results back to /tmp
DOCKER_HOST=tcp://192.168.2.189:2376 \
DOCKER_TLS_VERIFY=1 \
DOCKER_CERT_PATH=~/.docker/tls \
make -C tests test 2>&1 | tee /tmp/test-lmd-debian12.log | tail -30

# Or use a named context (one-time setup, no env vars after)
$ docker context create anvil --docker \
    "host=tcp://192.168.2.189:2376,ca=$HOME/.docker/tls/ca.pem,cert=$HOME/.docker/tls/cert.pem,key=$HOME/.docker/tls/key.pem"
$ docker --context anvil ps
$ DOCKER_CONTEXT=anvil make -C tests test
```

Named Docker contexts make the everyday case one word (--context anvil) instead of three env vars. The important properties are all practical. The client sees a local-feeling Docker daemon. BuildKit streams work natively. Parallel docker run calls for different OS containers actually run in parallel on the daemon side. There is no SSH connect handshake on every command, and no copy of the source code on anvil at rest. Source gets bind-mounted into each container for the duration of its run and disappears with the container.
The mTLS part is not fancy: a CA, a server cert for anvil, a client cert for freedom, ten-year validity, key directories at chmod 700. This is not a zero-trust stack. It is a pragmatic one-admin, two-host setup that keeps the daemon off the public internet and gates access on possession of a specific client cert.
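The whole ceremony fits in a dozen openssl invocations. The sketch below is illustrative, not the exact commands used: the CNs, filenames, and the SAN IP are assumptions, and the ten-year validity matches the setup described.

```shell
# Hedged sketch of the CA / server cert / client cert ceremony.
# CA
openssl genrsa -out ca-key.pem 2048
openssl req -new -x509 -days 3650 -key ca-key.pem -out ca.pem -subj "/CN=anvil-ca"

# Server cert for anvil; the SAN must cover the IP the client dials
openssl req -new -newkey rsa:2048 -nodes -keyout server-key.pem \
  -out server.csr -subj "/CN=anvil"
openssl x509 -req -days 3650 -in server.csr -CA ca.pem -CAkey ca-key.pem \
  -CAcreateserial -out server-cert.pem \
  -extfile <(printf 'subjectAltName=IP:192.168.2.189')

# Client cert for freedom, marked for client auth only
openssl req -new -newkey rsa:2048 -nodes -keyout key.pem \
  -out client.csr -subj "/CN=freedom"
openssl x509 -req -days 3650 -in client.csr -CA ca.pem -CAkey ca-key.pem \
  -CAcreateserial -out cert.pem \
  -extfile <(printf 'extendedKeyUsage=clientAuth')

# Private keys never leave their host, and never relax past owner-only
chmod 0400 ca-key.pem server-key.pem key.pem
```

Server material lands in /etc/docker/tls/ on anvil; ca.pem, cert.pem, and key.pem land in ~/.docker/tls/ on freedom.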
The BuildKit Gotcha
Without DOCKER_BUILDKIT=1 in the build host's shell environment, the daemon falls back to the legacy builder. BuildKit owns the content-addressable layer cache for TCP-delivered build contexts. The legacy builder does not. Every build rebuilds every layer. A warm matrix run that should take ninety seconds takes fifteen minutes instead. See the Docker Engine release notes for the history of BuildKit becoming the default.
We hit this three times before we believed it. The symptom is boring: anvil feels slow for no reason. The fix is one line in anvil's ~/.bashrc:
```shell
# On anvil (the build host), exported for all shells
# Docker's legacy builder ignores layer cache for TCP-delivered build
# contexts; BuildKit is required for the cache to work at all.
export DOCKER_BUILDKIT=1
```

The test runner also prints a reminder to check echo $DOCKER_BUILDKIT on the daemon host if a warm-cache build exceeds five minutes. Config drift on the daemon side is invisible from the client; latency is the only signature, and latency is easy to blame on the network.
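That reminder might be implemented along these lines; the function name and the 300-second threshold are illustrative, not batsman's actual code.

```shell
# Hedged sketch of a warm-build timing guard. Wraps any build command and
# warns when a supposedly warm run blows past the threshold.
warm_build_guard() {
  local start elapsed rc
  start=$(date +%s)
  "$@" && rc=0 || rc=$?
  elapsed=$(( $(date +%s) - start ))
  if (( elapsed > 300 )); then
    echo "WARN: warm build took ${elapsed}s;" \
         'check `echo $DOCKER_BUILDKIT` on the daemon host' >&2
  fi
  return "$rc"
}

# warm_build_guard docker build -t lmd-test:debian12 tests/dockerfiles/
```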
Timings
The numbers we actually hit, measured across recent runs:
| Configuration | Target | Warm | Cold |
|---|---|---|---|
| local (freedom) | single OS (Debian 12) | ~70s | +60s image build |
| anvil over TCP | single OS (Debian 12) | ~45s | +30s image build |
| anvil over TCP | full matrix (8 OSes) | ~110s | +4-6 min image builds |
| anvil, no BuildKit | full matrix (8 OSes) | ~15 min | ~15 min |
anvil beats local on single-OS runs because it has no other workload competing for CPU. The full-matrix number is the one that matters: eight distros, warm-cache run completes before you switch windows.
“Local iteration for single files. Anvil for the full matrix. Fallback for anvil outages. Three lanes, same command.”
Local runs still earn their keep. Iterating on a single .bats file against Debian 12 means no network hop; once it passes on the most forgiving OS, kick the full anvil run and go read Hacker News.
That matrix run is the payoff for everything else. When we rewrote maldet's scan engine and benchmarked a 43x parallel-scan speedup, proving it across every supported distro before merge took ninety seconds, not an afternoon of manual VM runs.
Fallback Discipline
anvil is a single point of failure. When the LAN is flaky or anvil is rebooting for kernel updates, the fallback is freedom's own daemon, listening with the same mTLS setup on 127.0.0.1:2376:
```shell
# Fallback: run on freedom's local Docker daemon
DOCKER_HOST=tcp://127.0.0.1:2376 \
DOCKER_TLS_VERIFY=1 \
DOCKER_CERT_PATH=~/.docker/tls \
make -C tests test 2>&1 | tee /tmp/test-lmd-debian12.log | tail -30

# Or via a named context
$ docker --context freedom-tcp ps
$ DOCKER_CONTEXT=freedom-tcp make -C tests test
```

Fallback is also the right lane for tests that need host-local bind-mounted data: a production signature snapshot, a captured pcap, a corpus too large to sync. Those get tagged "freedom-only" in the Makefile and skipped on anvil. The rule is simple: if the test needs data outside the git tree, it runs on freedom; committed fixtures under tests/samples/ run anywhere.
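Lane selection itself can be automated with a liveness probe; this sketch assumes the two context names above, and the choice of docker info as the probe is an assumption, not batsman's actual mechanism.

```shell
# Hedged sketch of automatic lane selection: prefer anvil, fall back
# to the workstation's own daemon when the build host is unreachable.
pick_context() {
  if docker --context anvil info >/dev/null 2>&1; then
    echo anvil        # build host reachable: full-matrix lane
  else
    echo freedom-tcp  # fallback lane on the local daemon
  fi
}

# DOCKER_CONTEXT=$(pick_context) make -C tests test
```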
What We Test, What We Skip
Container-based testing covers runtime behavior. It does not cover every install-time path. Here is the honest split.
Caught reliably
Runtime behavior, across every bash and coreutils in the matrix. Each regression gets an @test in the nearest BATS file. Fails loudly if the bug returns.

Still needs a live host
Install-time paths. Inside a container, cron is not running. We test the scripts cron.daily invokes, not cron itself firing them.

We are fine with that gap, and honest about its shape. The container suite catches every bug class that lint misses and runs fast enough to be a pre-commit hook. Install-path coverage (cron firing, init wrappers, package-manager upgrade paths) runs on a small set of live VMs on the release path, not on every commit. This is not a production CI replacement for every team: it is single-admin, two-host, and the mTLS setup presumes you are comfortable signing your own certs. If those assumptions hold, the rest is mechanical.
Conclusion
The bill for this CI stack: one LAN build host we already owned, one-time effort on ten Dockerfiles and a shared Makefile fragment, an afternoon on TLS certs. No monthly invoice, no per-minute billing, no queue depth, no provider outages.
The payoff is every commit getting exercised against the distro matrix our users actually run. A bash quirk that only bites on CentOS 6 fails the CentOS 6 test. A mapfile -d in a shared library fails the moment it hits the legacy container. That feedback loop is the difference between shipping confidently and shipping cautiously. For the companion piece on why the matrix matters in the first place (the coreutils-location split, command prefix discipline, and other portability landmines), see portable bash across the pre-usr-merge boundary.
APF, BFD, maldet, and batsman are all open source under GPLv2.
The Minimum Stack
If you are shipping bash (or any shell-adjacent code) to a distro matrix, the pattern reproduces from scratch in an afternoon:
- Pick bats-core as the runner; add bats-assert and bats-support for readable failures.
- Template a Dockerfile.<os> per target; keep the base layer (bash, coreutils, bats) separate from the project layer so the cache helps.
- On a spare LAN box, enable Docker over TCP with mutual TLS: CA, one server cert, one client cert, lock key perms, done.
- Export DOCKER_BUILDKIT=1 in the daemon host's shell so the layer cache works for TCP-delivered contexts.
- Wire a named Docker context on the client, point make test at it, and keep a local-Docker context as fallback.
No monthly bill. No queue depth. No provider outages. Just the matrix, running on hardware you already own.