-*- mode: org -*-

This is a tool/service that makes it easier to perform large numbers of
builds, potentially across many machines, and to do something with the results
and outputs of those builds.

It aims to make it easier to operate a build farm providing Guix substitutes,
or to test the builds of Guix packages, potentially across machines with
different hardware and software setups.

* Usage instructions

All of the following commands should be run from the root of the Git
repository. Either use direnv to manage the environment, or run guix
environment to set up the environment.

#+BEGIN_SRC sh
  guix environment -l guix-dev.scm
  export PATH="$PWD/scripts:$PATH"
#+END_SRC

If you haven't yet done so, run the following commands to set up the
repository so that the software can be run.

#+BEGIN_SRC sh
  ./bootstrap.sh
  ./configure
  make
#+END_SRC

Run guix-build-coordinator to start the coordinator process. By default, this
will use sqitch to create the guix_build_coordinator.db SQLite database file,
as well as sqitch.db which contains metadata about the database state.

#+BEGIN_SRC sh
  guix-build-coordinator
#+END_SRC
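
As a quick sanity check once the coordinator has started, something like the
following can confirm that both files exist. This is only a sketch:
check_coordinator_files is a hypothetical helper, not part of the tool, and it
assumes the default file names mentioned above and that the coordinator was
started in the directory being checked.

#+BEGIN_SRC sh
  # Hypothetical helper: report whether the default database files exist
  # in the given directory (defaults to the current directory).
  check_coordinator_files() {
    dir="${1:-.}"
    rc=0
    for file in guix_build_coordinator.db sqitch.db; do
      if [ -f "$dir/$file" ]; then
        echo "found: $file"
      else
        echo "missing: $file"
        rc=1
      fi
    done
    # Non-zero exit status if either file is missing.
    return "$rc"
  }
#+END_SRC

Running check_coordinator_files with no arguments from the root of the
repository checks the current directory.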

In another terminal, also at the root of the repository, run the following
commands to set up an agent process.

#+BEGIN_SRC sh
  guix-build-coordinator agent new
#+END_SRC

Note the UUID of the generated agent.

#+BEGIN_SRC sh
  guix-build-coordinator agent <AGENT ID> password new
#+END_SRC

Note the generated password for the agent.

#+BEGIN_SRC sh
  guix-build-coordinator-agent --uuid=<AGENT ID> --password=<AGENT PASSWORD>
#+END_SRC

At this point, both processes should be running and the guix-build-coordinator
should be logging requests from the agent.

In a third terminal, also at the root of the repository, generate a
derivation, and then instruct the coordinator to have it built.

#+BEGIN_SRC sh
  guix build --no-grafts -d hello
#+END_SRC

Note the derivation that is output.

#+BEGIN_SRC sh
  guix-build-coordinator build <DERIVATION FILE>
#+END_SRC

This will return a randomly generated UUID that represents the build. If
everything works, the agent will set up and perform the build, and then the
coordinator will output something like:

  build <BUILD ID> succeeded (on agent <AGENT ID>)
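
Generating the derivation and submitting it can be combined into one step. The
following is a sketch rather than a script shipped with the tool:
submit_package is a hypothetical helper name, and it assumes that guix build
-d prints one derivation file name per line on standard output.

#+BEGIN_SRC sh
  # Hypothetical helper: compute the derivation(s) for the given packages
  # and submit each one to the coordinator.
  submit_package() {
    # Stop early if the derivations cannot be computed.
    derivations="$(guix build --no-grafts -d "$@")" || return 1

    # Store file names contain no whitespace, so word splitting is safe.
    for derivation in $derivations; do
      guix-build-coordinator build "$derivation" || return 1
    done
  }
#+END_SRC

For example, submit_package hello submits a build for the hello package and
prints whatever guix-build-coordinator build prints for it.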

* Architecture

One coordinator process manages one or more agent processes. The coordinator
stores what to build, and allocates builds to agents as they request
them. Agent processes perform the builds, and inform the coordinator when the
build succeeds or fails. When the build succeeds, the agent sends the outputs
produced to the coordinator.

Builds have a specified or randomly generated UUID. The action to perform is
specified by a derivation as understood by GNU Guix. It's expected that the
derivation is either available to the coordinator and all agents, or that
they're able to download it from a substitute server.

Agents will only build the derivation they've been instructed to. It's
expected that any inputs required are either available, or downloadable from a
substitute server. If an input isn't available, the agent will report a setup
failure to the coordinator.

Agents also require that the outputs of the derivation they're going to build
are not present. They'll attempt to delete any that are, and report a setup
failure to the coordinator if this doesn't work. The build may then be tried
on another agent if one is available.
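
The setup checks described above can be modelled roughly as follows. This
illustrates the decision logic only and is not the agent's implementation:
check_build_setup is a hypothetical name, plain file tests stand in for
querying the store and substitute servers, and rm stands in for asking the
daemon to delete a store item.

#+BEGIN_SRC sh
  # Hypothetical model of the agent's pre-build checks. The arguments are
  # two whitespace-separated lists of file names: inputs, then outputs.
  check_build_setup() {
    inputs="$1"
    outputs="$2"

    for input in $inputs; do
      if [ ! -e "$input" ]; then
        # The real agent would first try a substitute server.
        echo "setup failure: missing input $input"
        return 1
      fi
    done

    for output in $outputs; do
      # Outputs must be absent before building, so try to delete them.
      if [ -e "$output" ] && ! rm -rf -- "$output"; then
        echo "setup failure: could not delete output $output"
        return 1
      fi
    done

    echo "setup ok"
  }
#+END_SRC

A non-zero exit status corresponds to the setup failure that would be reported
to the coordinator.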

Some coordinator behaviour is configurable, but hooks are also provided to
execute code on certain events. This code can access the coordinator
datastore, and perform operations like submitting builds.

There are hooks that trigger when a build is successful, when a build fails,
and when an agent reports missing inputs. The default missing inputs hook will
submit builds for these missing inputs if no such builds already exist. This
default allows derivations whose inputs are not available to be built
automatically; the hook can be replaced if desired.

The coordinator's datastore and the agent <-> coordinator communication are
both designed to support different modes of operation. For the datastore,
SQLite support is implemented and PostgreSQL support is planned. For the
agent <-> coordinator communication, HTTP is currently used, but other methods
like message passing over SSH could be supported in the future.

With the HTTP transport, agent <-> coordinator communication should happen
over TLS if the network isn't secure. Each agent uses basic authentication to
connect to the coordinator.
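
For reference, HTTP basic authentication sends the credentials base64-encoded
in an Authorization header, which is why TLS matters on an untrusted network:
base64 is an encoding, not encryption. Below is a minimal sketch of how such a
header is formed. basic_auth_header is a hypothetical helper, and using the
agent UUID as the user name is an assumption rather than something stated
here.

#+BEGIN_SRC sh
  # Hypothetical helper: print the Authorization header for basic auth.
  # $1: user name (assumed here to be the agent UUID), $2: password.
  basic_auth_header() {
    # tr strips the line breaks that some base64 implementations insert
    # into long output.
    printf 'Authorization: Basic %s\n' \
      "$(printf '%s:%s' "$1" "$2" | base64 | tr -d '\n')"
  }
#+END_SRC

Anyone who can observe the header can decode it with base64 -d, so the
password is only as private as the connection carrying it.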

* Roadmap

In no particular order.

** TODO Add help information to the command line interface
** TODO Continue working on web interfaces

I think these should be maintained separately, and I've got some code in
JavaScript, but this could do with more attention.

I think having a web interface that allows for submitting builds would be
useful.

** TODO Extract and polish scripts for submitting builds
** TODO Adjust the database schema to not use as many string foreign keys

I think this is taking up more space and slowing things down for larger
tables.

** TODO Implement support for PostgreSQL

This was the intention from the start, and once the database schema has
settled, it's time to actually implement and test this.

** TODO Write some unit tests
** TODO Detect absent agents
*** TODO De-allocate builds from them
*** TODO Only look at active agents when allocating builds
** TODO Support archiving agents
** TODO Consider some kind of archiving/deleting for builds
** TODO Speed up checking for build input substitutes
** TODO Investigate how to record hardware information

For example, the CPUs and RAM at the time builds take place.

** TODO Investigate setting timeouts for builds

Both the timeout and the max silent time. Currently these are generally set
only on the guix-daemons.
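
For reference, guix-daemon accepts both limits as command-line options, given
in seconds. The values below are placeholders chosen for illustration, and
guixbuild is just the conventional build users group name.

#+BEGIN_SRC sh
  # Illustrative values only: fail builds that run for more than a day, or
  # that produce no output for an hour.
  guix-daemon --build-users-group=guixbuild \
    --timeout=86400 --max-silent-time=3600
#+END_SRC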

** TODO Record whether failures are due to timeouts