guix/build-coordinator

	Commit message	Author	Age
*	Retry more when sending outputs	Christopher Baines	2021-05-30
	Time has been spent building them, so wait longer before giving up on submitting the outputs.
*	Further tweak sending chunked HTTP requests	Christopher Baines	2021-05-29
	Don't compress and then send: I think compression can be slower than sending, so doing both at the same time is probably faster. Add make-chunked-output-port*, which might be more efficient than the Guile chunked output port, disables garbage collection to avoid issues with GnuTLS, and tries to force the garbage collector to run if garbage is building up.
*	Add a space in coordinator-handle-failed-request	Christopher Baines	2021-05-28
*	Use GC protection for normal requests to the coordinator as well	Christopher Baines	2021-05-28
	The problem of garbage collection breaking GnuTLS can probably occur for these requests as well.
*	Increase the buffer size for sending outputs and log files	Christopher Baines	2021-05-28
	I think this works better.
*	Get rid of the request mutex	Christopher Baines	2021-05-28
	This was put in to try to prevent the crashes inside GnuTLS, but it was ineffective, since the actual trigger for the issues is garbage collection rather than parallel requests. There might be some benefit from limiting request parallelism in the future, but that can be thought through then.
*	Tune sending files over HTTP	Christopher Baines	2021-05-28
	Guile's garbage collector interferes with Guile+GnuTLS, which means that sending files while the garbage collector is active is difficult. These changes try to work around this by disabling the garbage collector just as the data is being written, then enabling it again. I think this helps to work around the issue.
*	Reduce the threshold for compressing nars on the fly	Christopher Baines	2021-05-26
	Prefer upfront compression, as this might reduce GC activity while sending the data.
*	Drop the request mutex for most requests	Christopher Baines	2021-05-21
	Just use it when uploading files.
*	Use a bigger buffer when uploading logs	Christopher Baines	2021-05-13
	As I think this might make it faster.
*	Add a new dynamic authentication approach	Christopher Baines	2021-02-28
	This avoids the need to create agents upfront, which could be useful when creating many childhurd VMs or using scheduling tools to dynamically run agents.
*	Avoid some threads and locks when running on the hurd	Christopher Baines	2021-02-15
	I've seen the process hang on the hurd, and I think this might help.
*	Remove unused coordinator module from the http agent messaging module	Christopher Baines	2021-02-13
*	Remove (guix-build-coordinator datastore) import from agent module	Christopher Baines	2021-02-13
	I'm seeing this pull in sqlite3 unnecessarily on the hurd.
*	Rework the agent messaging modules	Christopher Baines	2021-01-15
*	Use methods for the agent messaging	Christopher Baines	2021-01-15
	This will allow adding more agent messaging approaches.
*	Tune agent retrying	Christopher Baines	2021-01-01
	So that the agent spends less time waiting.
*	Replace WARNING with WARN	Christopher Baines	2020-11-30
	As it's shorter, and this keeps the logging neat.
*	Revert erroneous logging change	Christopher Baines	2020-11-30
*	Improve the logging from the agent -> coordinator communication	Christopher Baines	2020-11-30
*	Better handle fetching builds	Christopher Baines	2020-11-27
	Previously, an agent could end up fetching builds from the coordinator but not receiving the response, say because of a network issue or timeout. When it retried, it would fetch even more builds, leaving some builds allocated to it that it didn't know about. These changes attempt to make fetching builds more idempotent: rather than returning only the newly allocated builds, the coordinator returns all the builds allocated to the agent, and rather than requesting a number of new builds, the agent specifies the total number of allocated builds it wants.
*	Have the agent handle errors from the coordinator	Christopher Baines	2020-10-24
	When submitting builds, the agent will now retry the relevant step, such as uploading the log file, if the coordinator says that still needs doing.
*	Separate the agent messaging server and client code	Christopher Baines	2020-10-07
	So that the client part doesn't depend on fibers.
*	Split the fibers utils from the main utils module	Christopher Baines	2020-10-07
	To start making it possible to use the agent without having to load anything related to fibers (as it doesn't work on the hurd yet).
*	Don't patch fibers, just use the different procedure directly	Christopher Baines	2020-09-16
*	Use the #:namespace argument for metric registries	Christopher Baines	2020-08-31
*	Use the guile-prometheus library for the metrics	Christopher Baines	2020-08-31
	Which was extracted from the Guix Build Coordinator.
*	Switch to using guile-lzlib	Christopher Baines	2020-08-31
	Rather than the lzlib module within Guix.
*	Support storing when builds start	Christopher Baines	2020-07-01
	This isn't particularly accurate: what's actually stored is the current time when the record is inserted into the coordinator database, but that should happen just before the agent starts the build, so hopefully that's good enough.
*	Report builds by derivation system	Christopher Baines	2020-06-19
*	Handle the system more explicitly when fetching builds	Christopher Baines	2020-06-19
	Also support fetching builds for specific systems from the Guix Data Service.
*	Add some additional logging around the uploading of build logs	Christopher Baines	2020-05-23
	As there seem to be some failures in this area.
*	Track unprocessed hook events by event	Christopher Baines	2020-05-23
*	Report the number of unprocessed hook events	Christopher Baines	2020-05-21
*	Try to better handle/avoid http related failures	Christopher Baines	2020-05-20
	I'm seeing "Resource temporarily unavailable, try again" errors from GnuTLS, mostly around the file uploads I think. I'm not sure what's going on here, but it seems to happen when using multiple threads in parallel. Anyway, this commit uses some mutexes to avoid uploading files in parallel, and also improves error handling generally. I'm pretty sure this isn't sufficient to fix the issue, but I could be looking in completely the wrong place for the problem.
*	Fix zeroing the right metric	Christopher Baines	2020-05-19
*	Zero the allocated build counts as well	Christopher Baines	2020-05-19
*	Zero the allocated build counts for each agent	Christopher Baines	2020-05-19
	Otherwise old values persist if an agent has no allocated builds.
*	Support agents processing builds in parallel	Christopher Baines	2020-05-17
*	Change how triggering build allocations works	Christopher Baines	2020-05-17
	Associate this with the coordinator, rather than having the logic in the agent communication code.
*	Open up more fibers possibilities in the coordinator	Christopher Baines	2020-05-17
	I'm looking to listen for client instructions ("build this", ...) maybe on a UNIX socket, which looks to be possible with fibers, but doing this at the same time as using a network socket for agent messaging requires more access than run-server from the fibers web server module currently allows. To get around this, patch the fibers web server run-server procedure to do less, and do that work in the guix-build-coordinator instead. This is somewhat similar to what I think Cuirass does to allow it to do more with fibers. This required messing with the current-fiber parameter in a couple more places around threads; I'm not really sure why that problem has occurred now. This current-fiber parameter issue should be resolved in the next fibers release. One good thing with these changes is that some behaviours not related to agent communication, like triggering build allocation on startup, have been moved out of the agent communication code.
*	Guard against failed file uploads	Christopher Baines	2020-05-13
	Only move the file into the destination location when the upload completes.
*	Stop base64 encoding chunked requests	Christopher Baines	2020-05-09
	I'm not sure why I did this... but it's slower and more complex than just not base64 encoding the data.
*	Improve the output receiving informational messages	Christopher Baines	2020-05-09
*	Tweak how outputs are sent	Christopher Baines	2020-05-09
	Add time logging, increase the buffer size for dump-file, and increase the retry times.
*	Make it possible to see how fast outputs are transferred	Christopher Baines	2020-05-09
*	Add some informative messages when sending outputs	Christopher Baines	2020-05-09
*	Decrease data limit for sending without upfront compression	Christopher Baines	2020-05-09
	This should reduce the request durations, and make retrying slightly easier.
*	Display some context around coordinator errors in the output	Christopher Baines	2020-05-09
*	Increase the number of threads for reading chunked responses	Christopher Baines	2020-05-09
	4 might result in contention.
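
The "Better handle fetching builds" entry (2020-11-27) describes making the fetch request idempotent: the agent asks for a total number of allocated builds, and the reply lists every build allocated to it. The project itself is written in Guile Scheme; the Python below is only a rough sketch of that idea, and the class, method and build names are all hypothetical, not taken from the coordinator's code.

```python
from dataclasses import dataclass, field


@dataclass
class Coordinator:
    """Toy in-memory coordinator; every name here is hypothetical."""
    pending: list = field(default_factory=list)
    allocated: dict = field(default_factory=dict)  # agent id -> list of builds

    def fetch_builds(self, agent, target_count):
        """Return all builds allocated to `agent`, topping up to `target_count`.

        The agent asks for a total, not for "N more", so repeating the
        request after a lost response allocates nothing extra.
        """
        builds = self.allocated.setdefault(agent, [])
        while len(builds) < target_count and self.pending:
            builds.append(self.pending.pop(0))
        return list(builds)


coordinator = Coordinator(pending=["build-a", "build-b", "build-c", "build-d"])

first = coordinator.fetch_builds("agent-1", 2)  # suppose this response is lost
retry = coordinator.fetch_builds("agent-1", 2)  # the same request, sent again
```

In this sketch the retry returns the same two builds as the first call and leaves the other two pending, whereas a "give me two more" request would have left the agent with four allocated builds, two of them unknown to it.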