guix/build-coordinator

1. Usage instructions
2. Architecture
3. Roadmap

This is a tool/service that aims to make it easier to perform large numbers of builds across potentially many machines, and to do something with the results and outputs of those builds.

In particular, it aims to make it easier to operate a build farm providing Guix substitutes, or to test builds of Guix packages, potentially across machines with different hardware and software setups.

1. Usage instructions

All of the following commands should be run from the root of the Git repository. Either use direnv to manage the environment, or run guix environment to set up the environment.

guix environment -l guix-dev.scm
export PATH="$PWD/scripts:$PATH"

If you haven't yet done so, run the following commands to set up the repository so that the software can be run.

./bootstrap.sh
./configure
make

Run guix-build-coordinator to start the coordinator process. By default, this will use sqitch to create the guix_build_coordinator.db SQLite database file, as well as sqitch.db, which contains metadata about the database state.

guix-build-coordinator

In another terminal, also at the root of the repository, run the following commands to set up an agent process.

guix-build-coordinator agent new

Note the UUID of the generated agent.

guix-build-coordinator agent <AGENT ID> password new

Note the generated password for the agent.

guix-build-coordinator-agent --uuid=<AGENT ID> --password=<AGENT PASSWORD>

At this point, both processes should be running and the guix-build-coordinator should be logging requests from the agent.

In a third terminal, also at the root of the repository, generate a derivation, and then instruct the coordinator to have it built.

guix build --no-grafts -d hello

Note the derivation that is output.

guix-build-coordinator build <DERIVATION FILE>

This will return a randomly generated UUID that represents the build. If everything works, the agent will set up and perform the build, and then the coordinator will output something like:

build <BUILD ID> succeeded (on agent <AGENT ID>)
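
The last two steps can be chained in a small shell function. This is only a sketch: submit_package and the GUIX and COORDINATOR override variables are hypothetical names introduced here for illustration (they are not part of the tool), and it assumes that guix build -d prints a single derivation file name for the package.

```shell
# Sketch only: generate the derivation for a package and ask the
# coordinator to build it. GUIX and COORDINATOR are hypothetical
# override variables; by default the commands shown above are used.
submit_package() {
    package="$1"

    # Same command as above; prints the derivation file name.
    drv="$("${GUIX:-guix}" build --no-grafts -d "$package")" || return 1

    # Same command as above; prints the UUID of the new build.
    "${COORDINATOR:-guix-build-coordinator}" build "$drv"
}
```

With the environment set up as described earlier, running submit_package hello should then have the same effect as running the two commands by hand.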

2. Architecture

One coordinator process manages one or more agent processes. The coordinator stores what to build, and allocates builds to agents as they request them. Agent processes perform the builds, and inform the coordinator when the build succeeds or fails. When the build succeeds, the agent sends the outputs produced to the coordinator.

Builds have a specified or randomly generated UUID. The action to perform is specified by a derivation as understood by GNU Guix. It's expected that the derivation is either available to the coordinator and all agents, or that they're able to download it from a substitute server.

Agents will only build the derivation they've been instructed to. It's expected that any inputs required are either available, or downloadable from a substitute server. If an input isn't available, the agent will report a setup failure to the coordinator.

Agents also require that the outputs of the derivation they're going to build are not already present. If they are, the agent will attempt to delete them, and report a setup failure to the coordinator if this doesn't work. The build may then be tried on another agent if one is available.

Some coordinator behaviour is configurable, but hooks are also provided to execute code on certain events. This code can access the coordinator datastore, and perform operations like submitting builds.

There are hooks that trigger when a build is successful, when a build fails, and when an agent reports missing inputs. The default missing inputs hook will submit builds for these missing inputs if no such builds are already present. This is the default behaviour so that derivations can be built automatically even when their inputs are not available; however, the hook can be replaced if desired.

The datastore for the coordinator, and the way the agent <-> coordinator communication happens, are designed to support different modes of operation. For the datastore, SQLite support is implemented and PostgreSQL support is planned. For the agent <-> coordinator communication, HTTP is currently used, but other methods like message passing over SSH could be supported in the future.

When the HTTP transport is used for coordinator <-> agent communication, it should run over TLS if the network isn't secure. Each agent uses basic authentication to connect to the coordinator.
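
To illustrate why TLS matters here, the following sketch builds the Authorization header used by HTTP basic authentication. It assumes that the agent's UUID is used as the username alongside its password, which this document does not confirm; basic_auth_header is a hypothetical helper, not part of the coordinator or the agent.

```shell
# Sketch only: construct an HTTP basic authentication header.
# Assumption: the agent UUID is the username (not confirmed here).
basic_auth_header() {
    agent_id="$1"
    agent_password="$2"

    # Basic authentication sends base64("username:password").
    # tr removes any line wrapping that base64 adds to long input.
    credentials="$(printf '%s:%s' "$agent_id" "$agent_password" | base64 | tr -d '\n')"

    printf 'Authorization: Basic %s\n' "$credentials"
}
```

Base64 is an encoding rather than encryption, so anyone who can observe the request can recover the password. This is why the transport should be protected with TLS on networks that aren't trusted.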

3. Roadmap

In no particular order.

3.1. TODO Add help information to the command line interface

3.2. TODO Continue working on web interfaces

I think these should be maintained separately, and I've got some code in JavaScript, but this could do with more attention.

I think having a web interface that allows for submitting builds would be useful.

3.3. TODO Extract and polish scripts for submitting builds

3.4. TODO Implement support for PostgreSQL

This was the intention from the start, and once the database schema has settled, it's time to actually implement and test this.

3.5. TODO Write some unit tests

3.6. TODO Detect absent agents

3.6.1. TODO De-allocate builds from them

3.6.2. TODO Only look at active agents when allocating builds

3.7. TODO Support archiving agents

3.8. TODO Consider some kind of archiving/deleting for builds

3.9. TODO Speed up checking for build input substitutes

3.10. TODO Investigate how to record hardware information

CPUs/RAM at the time builds take place.

3.11. TODO Investigate setting timeouts for builds

Both the timeout and the max silent time. Currently these are generally only set on the guix-daemon.
