Commit message | Author | Age | |
---|---|---|---|
* | Fix issue with the change of the load average period | Christopher Baines | 2020-12-23 |
| | |||
* | Switch to using the 1 minute load average | Christopher Baines | 2020-12-23 |
| | | | | So that it's more responsive. | ||
* | Avoid starting builds if the system load is high | Christopher Baines | 2020-12-23 |
| | |||
* | Improve the "guix-daemon claims a substitute is unavailable" messages | Christopher Baines | 2020-12-23 |
| | | | | Include the substitute servers that should be providing the substitute. | ||
* | Implement build cancelation | Christopher Baines | 2020-12-16 |
| | |||
* | Guard against a weird state for missing build inputs | Christopher Baines | 2020-12-15 |
| | | | | | | Where there are missing files, but find-missing-substitutes-for-output doesn't return anything. I think this can happen when the substitutes should be available, but there was an error when fetching them. | ||
* | Add missing lgr argument | Christopher Baines | 2020-12-10 |
| | |||
* | Start using Prometheus metrics with the agent | Christopher Baines | 2020-12-07 |
| | | | | | Rather than having the agent run a webserver, use the textfile collector from the node exporter. | ||
* | Avoid setting timeout options for the daemon for the build | Christopher Baines | 2020-12-06 |
| | | | | | The timeouts are useful when fetching substitutes, but I want to keep the previous behaviour of using the values set in the daemon for the build itself. | ||
* | Print out when builds fail due to timeouts | Christopher Baines | 2020-12-06 |
| | |||
* | Add some randomisation to substitute delays | Christopher Baines | 2020-12-05 |
| | |||
* | Add in timeouts around fetching substitutes | Christopher Baines | 2020-12-05 |
| | | | | As I think this sometimes hangs. | ||
* | Fix agent confusion over how many builds are running | Christopher Baines | 2020-12-04 |
| | | | | | The previous code was less than ideal; this is simpler and avoids messy state. | ||
* | Replace WARNING with WARN | Christopher Baines | 2020-11-30 |
| | | | | As it's shorter, and this keeps the logging neat. | ||
* | Add some useful logging when the agent starts | Christopher Baines | 2020-11-30 |
| | |||
* | Improve logging of new builds | Christopher Baines | 2020-11-30 |
| | |||
* | Hide logging from garbage collection | Christopher Baines | 2020-11-30 |
| | |||
* | Improve the logging from the agent -> coordinator communication | Christopher Baines | 2020-11-30 |
| | |||
* | Improve handling of build failures | Christopher Baines | 2020-11-30 |
| | |||
* | Improve agent logging | Christopher Baines | 2020-11-30 |
| | | | | | Use a logger, and set out different levels. Also try and neaten up the formatting. | ||
* | Avoid output from has-substitutes? | Christopher Baines | 2020-11-29 |
| | |||
* | Avoid lots of output when fetching substitutes for inputs | Christopher Baines | 2020-11-29 |
| | |||
* | Wait if no new builds are available | Christopher Baines | 2020-11-29 |
| | |||
* | Fix the post-build-failure procedure to handle agent errors | Christopher Baines | 2020-11-29 |
| | | | | | Spot when the coordinator says the build is already processed, and don't raise an exception. | ||
* | Tune agent sleeping | Christopher Baines | 2020-11-29 |
| | | | | I don't think there's a need for the agent to sleep much. | ||
* | Better handle fetching builds | Christopher Baines | 2020-11-27 |
| | | | | | | | | | | | | Previously, an agent could end up fetching builds from the coordinator, but not receiving the response, say because of a network issue or timeout. When it retried, it would fetch even more builds, leaving some builds allocated to it that it didn't know about. These changes attempt to make fetching builds more idempotent: rather than returning only the newly allocated builds, the coordinator returns all the builds allocated to the agent, and rather than requesting a number of new builds, the agent specifies the total number of allocated builds it wants. | ||
* | Avoid agent crashes when substitute urls aren't provided | Christopher Baines | 2020-11-16 |
| | |||
* | Attempt to more gracefully handle the problem of missing derivations | Christopher Baines | 2020-11-02 |
| | | | | In the agent and allocator. | ||
* | Improve missing inputs behaviour | Christopher Baines | 2020-10-24 |
| | | | | | | | | When a substitute is found for a direct input, but it can't be fetched, this is probably because something it referenced isn't available. Therefore, look through the references recursively and collect up the store items that aren't available locally or via a substitute. Send this list to the coordinator so that it can schedule builds. | ||
* | Add missing newline to failed to fetch substitutes message | Christopher Baines | 2020-10-24 |
| | |||
* | Use valid-path? rather than file exists for testing store items | Christopher Baines | 2020-10-24 |
| | | | | | As the file might exist, but be ignored because the daemon is treating it as invalid. | ||
* | Have the agent handle errors from the coordinator | Christopher Baines | 2020-10-24 |
| | | | | | When submitting builds. The agent will now retry the relevant thing, like uploading the log file if the coordinator says that still needs doing. | ||
* | Extract out agents submitting log files | Christopher Baines | 2020-10-24 |
| | | | | So that this code can be retried if submitting the build result fails. | ||
* | Change how agents handle store connections | Christopher Baines | 2020-08-26 |
| | | | | | Keep a connection open for longer, to allow for doing things like registering gc roots. | ||
* | Use valid-path? rather than file-exists? | Christopher Baines | 2020-08-15 |
| | | | | | Because items can be in the store but not be valid. This should help with issues where the build can't start, but all the items show up in the store. | ||
* | Support tracking the end time of builds | Christopher Baines | 2020-07-01 |
| | |||
* | Support storing when builds start | Christopher Baines | 2020-07-01 |
| | | | | | | | This isn't particularly accurate: what's actually being stored is the current time when the record is inserted into the coordinator database. That should happen just before the agent starts the build, so hopefully that's good enough. | ||
* | Handle the system more explicitly when fetching builds | Christopher Baines | 2020-06-19 |
| | | | | Also support fetching builds for specific systems from the Guix Data Service. | ||
* | Add a timeout when fetching build inputs | Christopher Baines | 2020-06-19 |
| | | | | As this seems like it can hang. | ||
* | Improve the job processing output | Christopher Baines | 2020-05-24 |
| | | | | | Prefix more output with the build id, which helps when multiple builds are happening in parallel. | ||
* | Increase waiting time for log files | Christopher Baines | 2020-05-24 |
| | |||
* | Handle log files not being immediately available | Christopher Baines | 2020-05-23 |
| | | | | | I think the daemon might take some time to produce them, so retry finding the log file. | ||
* | Fix some variable naming | Christopher Baines | 2020-05-22 |
| | |||
* | Make sure to count all the jobs | Christopher Baines | 2020-05-21 |
| | | | | Otherwise jobs could be fetched needlessly. | ||
* | Stop using futures for running builds | Christopher Baines | 2020-05-21 |
| | | | | | When you have 1 core, futures don't fit this use case, as only one thread is created. | ||
* | Guard against garbage collecting in multiple threads | Christopher Baines | 2020-05-20 |
| | | | | At the same time, as I think I've seen issues with deleting links. | ||
* | Switch to using threads for running builds in parallel | Christopher Baines | 2020-05-20 |
| | | | | | | primitive-fork in Guile seems more trouble than it's worth; the parent process seemed to lock up frequently. I think using threads could be causing problems with TLS, but at least it doesn't lock up completely. | ||
* | Don't check for child processes in the agent if there are none | Christopher Baines | 2020-05-18 |
| | | | | Or at least there shouldn't be. | ||
* | Move sleeping when there are no new builds available | Christopher Baines | 2020-05-18 |
| | |||
* | Sleep for 30 seconds if no new builds are available | Christopher Baines | 2020-05-18 |
| |
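
The "Better handle fetching builds" commit (2020-11-27) describes making build fetching more idempotent: the agent states the total number of allocated builds it wants, and the coordinator replies with every build allocated to that agent. The following is a minimal Python sketch of that idea only, not the project's code. The `Coordinator` class, the `fetch_builds` method, and the build and agent names are all hypothetical, and the in-memory lists stand in for the real coordinator database and network protocol.

```python
from collections import defaultdict


class Coordinator:
    """Toy in-memory stand-in for the coordinator (hypothetical, not the real API)."""

    def __init__(self, pending_builds):
        self.pending = list(pending_builds)   # builds not yet allocated to any agent
        self.allocated = defaultdict(list)    # agent id -> builds allocated to it

    def fetch_builds(self, agent_id, target_count):
        """Top up the agent's allocation to target_count and return all of it."""
        mine = self.allocated[agent_id]
        # Only allocate the shortfall, so repeating the request adds nothing new.
        while len(mine) < target_count and self.pending:
            mine.append(self.pending.pop(0))
        # Return every allocated build, not just the ones allocated by this call.
        return list(mine)


coordinator = Coordinator(["build-a", "build-b", "build-c", "build-d"])

first = coordinator.fetch_builds("agent-1", 2)  # imagine this response is lost
retry = coordinator.fetch_builds("agent-1", 2)  # the agent asks again
```

Because the request names a total rather than an increment, the retry returns the same two builds and leaves the remaining builds unallocated, so a lost response no longer strands builds the agent doesn't know about.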