path: root/guix-build-coordinator/agent-messaging
Commit message (author, date)
* Move some metrics out of base-datastore-metrics-updater (Christopher Baines, 2020-12-04)
  Some parts of this were quite slow with anything other than a small database, so instead of doing slow queries on every request, do some slow queries to set up the metrics, and then change them as part of the regular changes to the database.
* Add metrics for the database and WAL size (Christopher Baines, 2020-12-01)
  I particularly want to monitor the WAL growth, as I don't think SQLite's usual approach to keeping the size down is sufficient.
* Replace WARNING with WARN (Christopher Baines, 2020-11-30)
  As it's shorter, and this keeps the logging neat.
* Revert erroneous logging change (Christopher Baines, 2020-11-30)
* Improve the logging from the agent -> coordinator communication (Christopher Baines, 2020-11-30)
* Better handle fetching builds (Christopher Baines, 2020-11-27)
  Previously, an agent could fetch builds from the coordinator but not receive the response, say because of a network issue or timeout. When it retried, it would fetch even more builds, leaving some builds allocated to it that it didn't know about. These changes attempt to make fetching builds more idempotent: rather than returning only the newly allocated builds, the coordinator returns all the builds, and rather than requesting a number of new builds, the request specifies the total number of allocated builds.
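  The idempotent fetch described in this commit can be illustrated with a minimal sketch. It is written in Python rather than the project's Guile, and the names `Coordinator`, `fetch_builds` and `target_count` are hypothetical, not taken from the codebase. The point is only that repeating a request whose response was lost returns the same builds instead of allocating more.

```python
class Coordinator:
    """Toy in-memory coordinator; all names here are hypothetical."""

    def __init__(self, pending):
        self.pending = list(pending)  # builds not yet allocated
        self.allocated = {}           # agent id -> builds allocated to it

    def fetch_builds(self, agent, target_count):
        """Top the agent up to target_count and return ALL of its builds."""
        builds = self.allocated.setdefault(agent, [])
        while len(builds) < target_count and self.pending:
            builds.append(self.pending.pop(0))
        return list(builds)


coordinator = Coordinator(["a", "b", "c", "d"])
first = coordinator.fetch_builds("agent-1", 2)
# Pretend the response above was lost: the agent repeats the same request.
retry = coordinator.fetch_builds("agent-1", 2)
```

  Because the request states a total rather than a number of new builds, the retry leaves the remaining builds unallocated.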
* Make hook processing a bit more efficient (Christopher Baines, 2020-11-09)
  Rather than polling the database every second, use some condition variables to wake threads when there's probably an event.
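  As a rough illustration of waking a worker with a condition variable instead of polling, here is a minimal Python sketch. It assumes an in-memory queue in place of the database, and the class `HookQueue` and the event names are hypothetical. The wait timeout is a safety net in case a notification is missed, which fits the "probably an event" wording.

```python
import threading
from collections import deque


class HookQueue:
    """Hypothetical stand-in for the hook event store."""

    def __init__(self):
        self._events = deque()
        self._condition = threading.Condition()
        self._closed = False
        self.processed = []

    def submit(self, event):
        with self._condition:
            self._events.append(event)
            self._condition.notify()  # wake one waiting worker

    def close(self):
        with self._condition:
            self._closed = True
            self._condition.notify_all()

    def run_worker(self):
        while True:
            with self._condition:
                # Sleep until notified, re-checking after the timeout.
                while not self._events and not self._closed:
                    self._condition.wait(timeout=5)
                if not self._events:
                    return  # closed and fully drained
                event = self._events.popleft()
            self.processed.append(event)  # stand-in for running the hook


queue = HookQueue()
worker = threading.Thread(target=queue.run_worker)
worker.start()
for name in ["build-submitted", "build-started", "build-success"]:
    queue.submit(name)
queue.close()
worker.join()
```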
* Use the build coordinator logger in the agent messaging server (Christopher Baines, 2020-11-07)
* Include the Guile internal real and run times as metrics (Christopher Baines, 2020-11-02)
  This will help track CPU time, as well as restarts/crashes.
* Have the agent handle errors from the coordinator (Christopher Baines, 2020-10-24)
  When submitting builds. The agent will now retry the relevant thing, like uploading the log file if the coordinator says that still needs doing.
* Better handle agent errors on the coordinator side (Christopher Baines, 2020-10-24)
  Things like the agent not having the log file, or an output. This will allow the agent to actually retry the relevant thing.
* Improve the line length for the receiving outputs code (Christopher Baines, 2020-10-24)
* Move around the code for build log file locations (Christopher Baines, 2020-10-11)
  build-log-file-location replaces build-log-file-exists?, as it doesn't always return a boolean. It also changes to return an absolute filepath for the log file if it exists, as this will be easier to use.
* Guard against receiving parts of build log files (Christopher Baines, 2020-10-10)
* Fix missing bad-request procedure (Christopher Baines, 2020-10-07)
* Separate the agent messaging server and client code (Christopher Baines, 2020-10-07)
  So that the client part doesn't depend on fibers.
* Split the fibers utils from the main utils module (Christopher Baines, 2020-10-07)
  To start making it possible to use the agent, without having to load anything related to fibers (as it doesn't work on the hurd yet).
* Don't patch fibers, just use the different procedure directly (Christopher Baines, 2020-09-16)
* Use the #:namespace argument for metric registries (Christopher Baines, 2020-08-31)
* Use the guile-prometheus library for the metrics (Christopher Baines, 2020-08-31)
  Which was extracted from the Guix Build Coordinator.
* Switch to using guile-lzlib (Christopher Baines, 2020-08-31)
  Rather than the lzlib module within Guix.
* Support storing when builds start (Christopher Baines, 2020-07-01)
  This isn't particularly accurate: what's actually being stored is the current time when the record is inserted into the coordinator database, but that should happen just before the agent starts the build, so hopefully that's good enough.
* Report builds by derivation system (Christopher Baines, 2020-06-19)
* Handle the system more explicitly when fetching builds (Christopher Baines, 2020-06-19)
  Also support fetching builds for specific systems from the Guix Data Service.
* Add some additional logging around the uploading of build logs (Christopher Baines, 2020-05-23)
  As there seem to be some failures in this area.
* Track unprocessed hook events by event (Christopher Baines, 2020-05-23)
* Report the number of unprocessed hook events (Christopher Baines, 2020-05-21)
* Try to better handle/avoid http related failures (Christopher Baines, 2020-05-20)
  I'm seeing "Resource temporarily unavailable, try again" errors from GnuTLS, mostly around the file uploads I think. I'm not sure what's going on here, but it seems to happen when using multiple threads in parallel.

  Anyway, this commit uses some mutexes to avoid uploading files in parallel, and also improves error handling generally. I'm pretty sure this isn't sufficient to fix the issue, but I could be looking in completely the wrong place for the problem.
* Fix zeroing the right metric (Christopher Baines, 2020-05-19)
* Zero the allocated build counts as well (Christopher Baines, 2020-05-19)
* Zero the allocated build counts for each agent (Christopher Baines, 2020-05-19)
  Otherwise old values persist if an agent has no allocated builds.
* Support agents processing builds in parallel (Christopher Baines, 2020-05-17)
* Change how triggering build allocations works (Christopher Baines, 2020-05-17)
  Associate this with the coordinator, rather than having the logic in the agent communication code.
* Open up more fibers possibilities in the coordinator (Christopher Baines, 2020-05-17)
  I'm looking to listen for client instructions ("build this", ...) maybe on a UNIX socket, which looks to be possible with fibers, but doing this at the same time as using a network socket for agent messaging requires more access than run-server from the fibers web server module currently allows.

  To get around this, patch the fibers web server run-server procedure to do less, and do that instead in the guix-build-coordinator. This is somewhat similar to what I think Cuirass does to allow it to do more with fibers.

  This required messing with the current-fiber parameter in a couple more places around threads; I'm not really sure why that problem has occurred now. This current-fiber parameter issue should be resolved in the next fibers release.

  One good thing with these changes is that some behaviours not related to agent communication, like triggering build allocation on startup, have been moved out of the agent communication code.
* Guard against failed file uploads (Christopher Baines, 2020-05-13)
  Only move the file into the destination location when the upload completes.
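  The idea of only moving a file into place once the upload completes can be sketched as follows. This is an illustrative Python example, not the coordinator's Guile code, and `receive_upload` and the `.part` suffix are hypothetical. It assumes the temporary file lives in the same directory as the destination so that the final rename stays on one filesystem.

```python
import os
import tempfile


def receive_upload(chunks, destination):
    """Write chunks to a temporary file, then move it into place."""
    directory = os.path.dirname(os.path.abspath(destination))
    fd, temp_path = tempfile.mkstemp(dir=directory, suffix=".part")
    try:
        with os.fdopen(fd, "wb") as temp_file:
            for chunk in chunks:
                temp_file.write(chunk)
        # Only reached when every chunk was written successfully.
        os.replace(temp_path, destination)
    except BaseException:
        os.unlink(temp_path)  # discard the partial file
        raise


def failing_chunks():
    yield b"partial"
    raise ConnectionError("upload interrupted")


workdir = tempfile.mkdtemp()
target = os.path.join(workdir, "build.log")
try:
    receive_upload(failing_chunks(), target)
except ConnectionError:
    pass
exists_after_failure = os.path.exists(target)
receive_upload([b"full ", b"log"], target)
```

  A reader of the destination path therefore sees either no file or a complete one, never a partial upload.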
* Stop base64 encoding chunked requests (Christopher Baines, 2020-05-09)
  I'm not sure why I did this... but it's slower and more complex than just not base64 encoding the data.
* Improve the output receiving informational messages (Christopher Baines, 2020-05-09)
* Tweak how outputs are sent (Christopher Baines, 2020-05-09)
  Add time logging, increase the buffer size for dump-file, and increase the retry times.
* Make it possible to see how fast outputs are transferred (Christopher Baines, 2020-05-09)
* Add some informative messages when sending outputs (Christopher Baines, 2020-05-09)
* Decrease data limit for sending without upfront compression (Christopher Baines, 2020-05-09)
  This should reduce the request durations, and makes retrying slightly easier.
* Display some context around coordinator errors in the output (Christopher Baines, 2020-05-09)
* Increase the number of threads for reading chunked responses (Christopher Baines, 2020-05-09)
  4 might result in contention.
* Have the proc when reading logs from agents return #t (Christopher Baines, 2020-05-09)
  This is probably good practice.
* Remove custom error handling for build results (Christopher Baines, 2020-05-09)
  Just use the standard error handling in the controller.
* Make a record type for the build coordinator (Christopher Baines, 2020-05-08)
  This is already useful to pass around the datastore, hooks and metrics registry, and will become more useful to pass around the allocator to use.
* Switch to a database table rather than a channel for hook events (Christopher Baines, 2020-05-08)
  The channel caused problems, as it would potentially block processing requests if the hook event couldn't be processed immediately. This approach should avoid that, as well as providing more reliability because the events are stored in the database.
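  A table-backed event queue of the kind this commit describes might look roughly like the Python sketch below. The table name `unprocessed_hook_events`, its columns, and the helper functions are assumptions made for illustration, not the coordinator's actual schema. Recording an event is a quick insert, and the row is deleted only after the handler returns, so an event survives a crash between the two steps.

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.execute(
    "CREATE TABLE unprocessed_hook_events ("
    " id INTEGER PRIMARY KEY, event TEXT NOT NULL, arguments TEXT NOT NULL)"
)


def record_event(event, arguments):
    """Called from the request path: a quick insert, no waiting on hooks."""
    with db:
        db.execute(
            "INSERT INTO unprocessed_hook_events (event, arguments) VALUES (?, ?)",
            (event, arguments),
        )


def process_next(handler):
    """Run the oldest event, deleting it only once the handler succeeded."""
    row = db.execute(
        "SELECT id, event, arguments FROM unprocessed_hook_events"
        " ORDER BY id LIMIT 1"
    ).fetchone()
    if row is None:
        return False
    event_id, event, arguments = row
    handler(event, arguments)
    with db:
        db.execute("DELETE FROM unprocessed_hook_events WHERE id = ?", (event_id,))
    return True


handled = []
record_event("build-submitted", "build-1")
record_event("build-success", "build-1")
while process_next(lambda event, arguments: handled.append((event, arguments))):
    pass
remaining = db.execute("SELECT COUNT(*) FROM unprocessed_hook_events").fetchone()[0]
```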
* Treat access denied retry responses as successful (Christopher Baines, 2020-05-08)
  For reporting setup failures.
* Support handling access denied responses as successes (Christopher Baines, 2020-05-08)
  This will be useful for submitting setup failures. Once the message is processed successfully, the agent should no longer have access to update that build. If the request is processed successfully, but the response isn't received, when the agent retries, it'll get an access denied response. In this scenario, that means success, so this will allow it to be treated as such.
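  The retry scenario in this commit can be walked through with a small Python sketch. The function `submit_with_retry`, the `FakeCoordinator` class, and the use of status 403 for "access denied" are all assumptions for illustration. The source does not say which status code or response shape the coordinator uses.

```python
def submit_with_retry(send, attempts=3, access_denied_is_success=False):
    """send() returns a status code or raises ConnectionError (hypothetical)."""
    for _ in range(attempts):
        try:
            status = send()
        except ConnectionError:
            continue  # response lost, try again
        if status == 200:
            return True
        if status == 403 and access_denied_is_success:
            # An earlier attempt was processed and access was then revoked.
            return True
        return False
    return False


class FakeCoordinator:
    """Processes the first request but loses the response to it."""

    def __init__(self):
        self.agent_has_access = True
        self.calls = 0

    def report_setup_failure(self):
        self.calls += 1
        if not self.agent_has_access:
            return 403
        self.agent_has_access = False  # message processed, access revoked
        if self.calls == 1:
            raise ConnectionError("response lost")
        return 200


coordinator = FakeCoordinator()
result = submit_with_retry(
    coordinator.report_setup_failure, access_denied_is_success=True
)
```

  Without the `access_denied_is_success` flag, the same sequence would be reported as a failure even though the coordinator processed the message.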
* Make the access denied responses actual JSON objects (Christopher Baines, 2020-05-08)
  This is clearer.