path: root/guix-build-coordinator/agent.scm
Commit message (Author, Age)
* Fix issue with the change of the load average period (Christopher Baines, 2020-12-23)
|
* Switch to using the 1 minute load average (Christopher Baines, 2020-12-23)
| So that it's more responsive.
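The load check described in these commits can be sketched roughly as follows. This is an illustration in Python, not the agent's Scheme code; the function names and the per-core threshold of 1.0 are assumptions, since the log does not say what counts as "high" load.

```python
import os


def should_start_build(load_1min, cpu_count, threshold=1.0):
    """Return True when the per-core 1 minute load is below the threshold.

    The threshold value is a hypothetical default, not taken from the agent.
    """
    per_core_load = load_1min / max(cpu_count, 1)
    return per_core_load < threshold


def current_load_1min():
    # os.getloadavg() returns the 1, 5 and 15 minute load averages
    # (available on Unix only); the first entry reacts fastest to changes.
    return os.getloadavg()[0]
```

An agent loop could call `should_start_build(current_load_1min(), os.cpu_count() or 1)` before starting each build and wait when it returns False.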
* Avoid starting builds if the system load is high (Christopher Baines, 2020-12-23)
|
* Improve the guix-daemon claims a substitute is unavailable messages (Christopher Baines, 2020-12-23)
| Include the substitute servers that should be providing the substitute.
* Implement build cancelation (Christopher Baines, 2020-12-16)
|
* Guard against a weird state for missing build inputs (Christopher Baines, 2020-12-15)
| Where there are missing files, but find-missing-substitutes-for-output doesn't return anything. I think this can happen when the substitutes should be available, but there was an error when fetching them.
* Add missing lgr argument (Christopher Baines, 2020-12-10)
|
* Start using Prometheus metrics with the agent (Christopher Baines, 2020-12-07)
| Rather than having the agent run a webserver, use the textfile collector from the node exporter.
* Avoid setting timeout options for the daemon for the build (Christopher Baines, 2020-12-06)
| The timeouts are useful when fetching substitutes, but I want to keep the previous behaviour of using the values set in the daemon for the build itself.
* Print out when builds fail due to timeouts (Christopher Baines, 2020-12-06)
|
* Add some randomisation to substitute delays (Christopher Baines, 2020-12-05)
|
* Add in timeouts around fetching substitutes (Christopher Baines, 2020-12-05)
| As I think this sometimes hangs.
* Fix agent confusion over how many builds are running (Christopher Baines, 2020-12-04)
| The previous code was less than ideal; this is simpler and has less messy state.
* Replace WARNING with WARN (Christopher Baines, 2020-11-30)
| As it's shorter, and this keeps the logging neat.
* Add some useful logging when the agent starts (Christopher Baines, 2020-11-30)
|
* Improve logging of new builds (Christopher Baines, 2020-11-30)
|
* Hide logging from garbage collection (Christopher Baines, 2020-11-30)
|
* Improve the logging from the agent -> coordinator communication (Christopher Baines, 2020-11-30)
|
* Improve handling of build failures (Christopher Baines, 2020-11-30)
|
* Improve agent logging (Christopher Baines, 2020-11-30)
| Use a logger, and set out different levels. Also try and neaten up the formatting.
* Avoid output from has-substitutes? (Christopher Baines, 2020-11-29)
|
* Avoid lots of output when fetching substitutes for inputs (Christopher Baines, 2020-11-29)
|
* Wait if no new builds are available (Christopher Baines, 2020-11-29)
|
* Fix the post-build-failure procedure to handle agent errors (Christopher Baines, 2020-11-29)
| Spot when the coordinator says the build is already processed, and don't raise an exception.
* Tune agent sleeping (Christopher Baines, 2020-11-29)
| I don't think there's a need for the agent to sleep much.
* Better handle fetching builds (Christopher Baines, 2020-11-27)
| Previously, an agent could end up fetching builds from the coordinator, but not receiving the response, say because of a network issue or timeout. When it retried, it would fetch even more builds, and some builds would be allocated to it that it didn't know about. These changes attempt to make fetching builds more idempotent: rather than returning the newly allocated builds, the coordinator returns all the builds, and rather than requesting a number of new builds, the agent specifies the total number of allocated builds.
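The idempotent fetching idea in the commit above can be illustrated with a small model. This is a sketch in Python under stated assumptions, not the coordinator's actual code: the `Coordinator` class, `fetch_builds`, and the in-memory lists are all hypothetical stand-ins for the real HTTP API and database.

```python
class Coordinator:
    """Toy coordinator holding pending builds and per-agent allocations."""

    def __init__(self, pending):
        self.pending = list(pending)
        self.allocated = {}  # agent id -> list of allocated build ids

    def fetch_builds(self, agent_id, target_count):
        """Top up the agent's allocation to target_count and return ALL of it.

        Because the request names a total rather than "N more", repeating
        the same request after a lost response allocates nothing extra.
        """
        builds = self.allocated.setdefault(agent_id, [])
        while len(builds) < target_count and self.pending:
            builds.append(self.pending.pop(0))
        return list(builds)
```

If the agent asks for a total of two builds, loses the response, and asks again, both calls return the same two builds and the remaining pending builds stay unallocated.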
* Avoid agent crashes when substitute urls aren't provided (Christopher Baines, 2020-11-16)
|
* Attempt to more gracefully handle the problem of missing derivations (Christopher Baines, 2020-11-02)
| In the agent and allocator.
* Improve missing inputs behaviour (Christopher Baines, 2020-10-24)
| When a substitute is found for a direct input, but it can't be fetched, this is probably because something it referenced isn't available. Therefore, look through the references recursively and collect up the store items that aren't available locally or via a substitute. Send this list to the coordinator so that it can schedule builds.
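The recursive search described in the commit above might look something like the following. It is a Python sketch, not the agent's implementation: the function name, the dictionary of references, and the two sets standing in for "valid in the local store" and "a substitute exists" are hypothetical, and the choice not to look past an item that has no substitute is an assumption.

```python
def find_missing_items(item, references, local, substitutable):
    """Collect store items that are neither local nor substitutable.

    references maps an item to the items it refers to; local and
    substitutable are sets of item names (all hypothetical inputs).
    """
    missing = []
    seen = set()
    stack = [item]
    while stack:
        current = stack.pop()
        if current in seen or current in local:
            continue  # already handled, or already available locally
        seen.add(current)
        if current in substitutable:
            # A substitute exists, so any fetch failure is probably caused
            # by one of its references: keep looking through them.
            stack.extend(references.get(current, []))
        else:
            # Not local and no substitute: this is what needs building.
            missing.append(current)
    return missing
```

The returned list corresponds to what the agent would send to the coordinator so that builds can be scheduled.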
* Add missing newline to failed to fetch substitutes message (Christopher Baines, 2020-10-24)
|
* Use valid-path? rather than file exists for testing store items (Christopher Baines, 2020-10-24)
| As the file might exist, but be ignored because the daemon is treating it as invalid.
* Have the agent handle errors from the coordinator (Christopher Baines, 2020-10-24)
| When submitting builds. The agent will now retry the relevant thing, like uploading the log file if the coordinator says that still needs doing.
* Extract out agents submitting log files (Christopher Baines, 2020-10-24)
| So that this code can be retried if submitting the build result fails.
* Change how agents handle store connections (Christopher Baines, 2020-08-26)
| Keep a connection open for longer, to allow for doing things like registering gc roots.
* Use valid-path? rather than file-exists? (Christopher Baines, 2020-08-15)
| Because items can be in the store but not be valid. This should help with issues where the build can't start, but all the items show up in the store.
* Support tracking the end time of builds (Christopher Baines, 2020-07-01)
|
* Support storing when builds start (Christopher Baines, 2020-07-01)
| This isn't particularly accurate; what's actually being stored is the current time when the record is inserted into the coordinator database, but that should happen just before the agent starts the build, so hopefully that's good enough.
* Handle the system more explicitly when fetching builds (Christopher Baines, 2020-06-19)
| Also support fetching builds for specific systems from the Guix Data Service.
* Add a timeout when fetching build inputs (Christopher Baines, 2020-06-19)
| As this seems like it can hang.
* Improve the job processing output (Christopher Baines, 2020-05-24)
| Prefix more output with the build id, which helps when multiple builds are happening in parallel.
* Increase waiting time for log files (Christopher Baines, 2020-05-24)
|
* Handle log files not being immediately available (Christopher Baines, 2020-05-23)
| I think the daemon might take some time to produce them, so retry finding the log file.
* Fix some variable naming (Christopher Baines, 2020-05-22)
|
* Make sure to count all the jobs (Christopher Baines, 2020-05-21)
| Otherwise jobs could be fetched needlessly.
* Stop using futures for running builds (Christopher Baines, 2020-05-21)
| When you have 1 core, futures doesn't fit this use case, as it only creates one thread.
* Guard against garbage collecting in multiple threads (Christopher Baines, 2020-05-20)
| At the same time, I think I've seen issues with deleting links.
* Switch to using threads for running builds in parallel (Christopher Baines, 2020-05-20)
| primitive-fork in Guile seems more trouble than it's worth; the parent process seemed to lock up frequently. I think using threads could be causing problems with TLS, but at least it doesn't lock up completely.
* Don't check for child processes in the agent if there are none (Christopher Baines, 2020-05-18)
| Or at least there shouldn't be.
* Move sleeping when there are no new builds available (Christopher Baines, 2020-05-18)
|
* Sleep for 30 seconds if no new builds are available (Christopher Baines, 2020-05-18)
|