-*- mode: org -*-

This is a tool/service that aims to make it easier to perform lots of builds across potentially many machines, and to do something with the results and outputs of those builds.

In particular, it aims to make it easier to operate a build farm providing Guix substitutes, or to test the builds of Guix packages, potentially across different machines with different hardware and software setups.

* Usage instructions

All the following commands should be run from the root of the Git repository.

Either use direnv to manage the environment, or run guix environment to set up the environment.

#+BEGIN_SRC sh
guix environment -l guix-dev.scm
export PATH="$PWD/scripts:$PATH"
#+END_SRC

If you haven't yet done so, run the following commands to set up the repository so that the software can be run.

#+BEGIN_SRC sh
./bootstrap.sh
./configure
make
#+END_SRC

Run guix-build-coordinator to start the coordinator process. By default, this will use sqitch to create the guix_build_coordinator.db SQLite database file, as well as sqitch.db, which contains metadata about the database state.

#+BEGIN_SRC sh
guix-build-coordinator
#+END_SRC

In another terminal, also at the root of the repository, run the following commands to set up an agent process.

#+BEGIN_SRC sh
guix-build-coordinator agent new
#+END_SRC

Note the UUID of the generated agent.

#+BEGIN_SRC sh
guix-build-coordinator agent password new
#+END_SRC

Note the generated password for the agent.

#+BEGIN_SRC sh
guix-build-coordinator-agent --uuid= --password=
#+END_SRC

At this point, both processes should be running, and the guix-build-coordinator should be logging requests from the agent.

In a third terminal, also at the root of the repository, generate a derivation, and then instruct the coordinator to have it built.

#+BEGIN_SRC sh
guix build --no-grafts -d hello
#+END_SRC

Note the derivation that is output.
#+BEGIN_SRC sh
guix-build-coordinator build
#+END_SRC

This will return a randomly generated UUID that represents the build.

If everything works, the agent will set up and perform the build, and then the coordinator will output something like:

  build succeeded (on agent )

* Architecture

One coordinator process manages one or more agent processes. The coordinator stores what to build, and allocates builds to agents as they request them. Agent processes perform the builds, and inform the coordinator when a build succeeds or fails. When a build succeeds, the agent sends the outputs produced to the coordinator.

Each build has a UUID, which is either specified or randomly generated. The action to perform is specified by a derivation as understood by GNU Guix. It's expected that the derivation is either available to the coordinator and all agents, or that they're able to download it from a substitute server.

Agents will only build the derivation they've been instructed to. It's expected that any inputs required are either available, or downloadable from a substitute server. If an input isn't available, the agent will report a setup failure to the coordinator.

Agents also require that the outputs of the derivation they're going to build are not already present. They'll attempt to delete them if they are, and report a setup failure to the coordinator if this doesn't work. The build may then be tried on another agent if one is available.

Some coordinator behaviour is configurable, but hooks are also provided to execute code on certain events. This code can access the coordinator datastore, and perform operations like submitting builds.

There are hooks that trigger when a build is successful, when a build fails, and when an agent reports missing inputs. The default missing inputs hook will submit builds for these missing inputs if none are present. This default allows derivations to be built automatically when their inputs are not available, but the hook can be replaced if desired.
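
The behaviour of the default missing inputs hook described above can be illustrated with a small shell sketch. This is only an illustration under stated assumptions, not the coordinator's own code. The function submit_missing_inputs and the BUILDS_FILE text file are hypothetical stand-ins for the real hook and the coordinator datastore. The sketch also reads "if none are present" as "if no build has been submitted for that input yet".

#+BEGIN_SRC sh
# Hypothetical stand-in for the coordinator datastore: a text file with
# one derivation per line for which a build has already been submitted.
BUILDS_FILE="${BUILDS_FILE:-builds.txt}"

# For each missing input given as an argument, record and announce a new
# build unless one is already recorded. The real hook would submit the
# build through the coordinator rather than append to a file.
submit_missing_inputs() {
    touch "$BUILDS_FILE"
    for input in "$@"; do
        # -x: match the whole line, -F: fixed string, -q: no output
        if ! grep -qxF -- "$input" "$BUILDS_FILE"; then
            printf '%s\n' "$input" >> "$BUILDS_FILE"
            printf 'submitting build for %s\n' "$input"
        fi
    done
}
#+END_SRC

Calling the function twice with the same derivation only announces a build the first time, which mirrors the idea that the hook should not submit duplicate builds for an input that is already being handled.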

The coordinator datastore and the agent <-> coordinator communication are both designed to support different modes of operation. For the datastore, SQLite support is implemented and PostgreSQL support is planned. For the agent <-> coordinator communication, HTTP is currently used, but other methods like message passing over SSH could be supported in the future.

With the HTTP transport, coordinator <-> agent communication should happen over TLS if the network isn't secure. Each agent uses basic authentication to connect to the coordinator.

* Roadmap

In no particular order.

** TODO Add help information to the command line interface

** TODO Continue working on web interfaces

I think these should be maintained separately, and I've got some code in JavaScript, but this could do with more attention. I think having a web interface that allows for submitting builds would be useful.

** TODO Extract and polish scripts for submitting builds

** TODO Adjust the database schema to not use as many string foreign keys

I think this is taking up more space and slowing things down for larger tables.

** TODO Implement support for PostgreSQL

This was the intention from the start, and once the database schema has settled, it's time to actually implement and test this.

** TODO Write some unit tests

** TODO Detect absent agents

*** TODO De-allocate builds from them

*** TODO Only look at active agents when allocating builds

** TODO Support archiving agents

** TODO Consider some kind of archiving/deleting for builds

** TODO Speed up checking for build input substitutes

** TODO Investigate how to record hardware information

CPUs/RAM at the time builds take place.

** TODO Investigate setting timeouts for builds

Both timeout and max silent time. Currently these are generally only set on the guix-daemon.

** TODO Record whether failures are due to timeouts