guix/build-coordinator

	Commit message	Author	Age
*	Move retrying uploads out of the with-upload-slot region	Christopher Baines	2021-08-07
	So that the retry happens with a fresh slot (and the associated tracking information).
*	Support reporting bytes sent when submitting outputs	Christopher Baines	2021-06-08
*	Retry more when sending outputs	Christopher Baines	2021-05-30
	Since time has been spent building them, wait longer before giving up on submitting the outputs.
*	Improve the received output message	Christopher Baines	2021-05-30
*	Further tweak sending chunked HTTP requests	Christopher Baines	2021-05-29
	Don't compress then send, since I think compression can be slower than sending, so doing both at the same time is probably faster. Add make-chunked-output-port*, which might be more efficient than the Guile chunked output port, disables garbage collection to avoid issues with GnuTLS, and tries to force the garbage collector to run if there's garbage building up.
*	Add a space in coordinator-handle-failed-request	Christopher Baines	2021-05-28
*	Use GC protection for normal requests to the coordinator as well	Christopher Baines	2021-05-28
	Since the problem of the GC breaking GnuTLS can probably occur for these requests as well.
*	Increase the buffer size for sending outputs and log files	Christopher Baines	2021-05-28
	I think this works better.
*	Get rid of the request mutex	Christopher Baines	2021-05-28
	This was put in to try and prevent the crashes inside gnutls, but was ineffective since the actual trigger for the issues is garbage collection, rather than parallel requests. There might be some benefit from limiting request parallelism in the future, but that can be thought through then.
*	Tune sending files over HTTP	Christopher Baines	2021-05-28
	Guile's garbage collector interferes with Guile+gnutls, which means that sending files while the garbage collector is active is difficult. These changes try to work around this by disabling the garbage collector just as the data is being written, then enabling it again. I think this helps to work around the issue.
*	Reduce the threshold for compressing nars on the fly	Christopher Baines	2021-05-26
	Prefer upfront compression, as this might reduce GC activity while sending the data.
*	Remove stale log files	Christopher Baines	2021-05-26
*	Drop the request mutex for most requests	Christopher Baines	2021-05-21
	Just use it when uploading files.
*	Use a bigger buffer when uploading logs	Christopher Baines	2021-05-13
	As I think this might make it faster.
*	Handle receiving outputs as a bytevector	Christopher Baines	2021-04-23
	This can happen if the request doesn't arrive in chunks.
*	Handle receiving logs as bytevectors	Christopher Baines	2021-04-09
	I think this can happen if the log doesn't arrive as a chunked HTTP request.
*	Add Guile GC related metrics	Christopher Baines	2021-03-25
	I'm seeing mmap(PROT_NONE) failed crashes, and maybe these metrics will help in understanding what's going on.
*	Add a new dynamic authentication approach	Christopher Baines	2021-02-28
	This avoids the need to create agents upfront, which could be useful when creating many childhurd VMs or using scheduling tools to dynamically run agents.
*	Avoid some threads and locks when running on the hurd	Christopher Baines	2021-02-15
	I've seen the process hang on the hurd, and I think this might help.
*	Remove unused coordinator module from the http agent messaging module	Christopher Baines	2021-02-13
*	Remove (guix-build-coordinator datastore) import from agent module	Christopher Baines	2021-02-13
	I'm seeing this pull in sqlite3 unnecessarily on the hurd.
*	Don't use with-exception-handler with (backtrace)	Christopher Baines	2021-01-22
	With with-exception-handler being called with #:unwind? #f (implicitly). This breaks Guile internals used by (backtrace) [1], meaning you get a different exception/backtrace when Guile itself breaks. This should avoid the "string->number: Wrong type argument in position 1 (expecting string): #f" exception I've been haunted by for the last year.
	1: https://debbugs.gnu.org/cgi/bugreport.cgi?bug=46009
*	Add missing thunk	Christopher Baines	2021-01-16
*	Tweak agent messaging error handling	Christopher Baines	2021-01-16
*	Add local agent messaging	Christopher Baines	2021-01-16
	This is untested, but might be quite cool for running a single agent instance of the build coordinator, all in one process.
*	Move triggering allocations out of the http server	Christopher Baines	2021-01-15
	As this code should be in the coordinator.
*	Rework the agent messaging modules	Christopher Baines	2021-01-15
*	Use methods for the agent messaging	Christopher Baines	2021-01-15
	This will allow adding more agent messaging approaches.
*	Tune agent retrying	Christopher Baines	2021-01-01
	So that the agent spends less time waiting.
*	Move some metrics out of base-datastore-metrics-updater	Christopher Baines	2020-12-04
	Some parts of this were quite slow with anything other than a small database, so instead of doing slow queries on every request, do some slow queries to set up the metrics, and then change them as part of the regular changes to the database.
*	Add metrics for the database and WAL size	Christopher Baines	2020-12-01
	I particularly want to monitor the WAL growth, as I don't think SQLite's usual approach to keeping the size down is sufficient.
*	Replace WARNING with WARN	Christopher Baines	2020-11-30
	As it's shorter, and this keeps the logging neat.
*	Revert erroneous logging change	Christopher Baines	2020-11-30
*	Improve the logging from the agent -> coordinator communication	Christopher Baines	2020-11-30
*	Better handle fetching builds	Christopher Baines	2020-11-27
	Previously, an agent could end up fetching builds from the coordinator, but not receiving the response, say because of a network issue or timeout. When it retried, it would fetch even more builds, and some builds would be allocated to it that it didn't know about. These changes attempt to make fetching builds more idempotent: rather than returning the newly allocated builds, it returns all the builds, and rather than requesting a number of new builds, it's the total number of allocated builds that is specified.
*	Make hook processing a bit more efficient	Christopher Baines	2020-11-09
	Rather than polling the database every second, use some condition variables to wake threads when there's probably an event.
*	Use the build coordinator logger in the agent messaging server	Christopher Baines	2020-11-07
*	Include the Guile internal real and run times as metrics	Christopher Baines	2020-11-02
	This will help track CPU time, as well as restarts/crashes.
*	Have the agent handle errors from the coordinator	Christopher Baines	2020-10-24
	When submitting builds. The agent will now retry the relevant thing, like uploading the log file if the coordinator says that still needs doing.
*	Better handle agent errors on the coordinator side	Christopher Baines	2020-10-24
	Things like the agent not having the log file, or an output. This will allow the agent to actually retry the relevant thing.
*	Improve the line length for the receiving outputs code	Christopher Baines	2020-10-24
*	Move around the code for build log file locations	Christopher Baines	2020-10-11
	build-log-file-location replaces build-log-file-exists? as it doesn't always return a boolean. It also changes to return an absolute file path for the log file if it exists, as this will be easier to use.
*	Guard against receiving parts of build log files	Christopher Baines	2020-10-10
*	Fix missing bad-request procedure	Christopher Baines	2020-10-07
*	Separate the agent messaging server and client code	Christopher Baines	2020-10-07
	So that the client part doesn't depend on fibers.
*	Split the fibers utils from the main utils module	Christopher Baines	2020-10-07
	To start making it possible to use the agent without having to load anything related to fibers (as it doesn't work on the hurd yet).
*	Don't patch fibers, just use the different procedure directly	Christopher Baines	2020-09-16
*	Use the #:namespace argument for metric registries	Christopher Baines	2020-08-31
*	Use the guile-prometheus library for the metrics	Christopher Baines	2020-08-31
	Which was extracted from the Guix Build Coordinator.
*	Switch to using guile-lzlib	Christopher Baines	2020-08-31
	Rather than the lzlib module within Guix.