Just instrument the update-managed-metrics! function, and move some code
around so this is clearer in the logs.
As I've seen exceptions here.
I think this will help when handling new requests after failed ones.
The error handling here should be handled by unwinding.
So that the only thing taking place in the upload slot is the actual upload,
which should improve throughput.
This can help if the output has been uploaded but the hash isn't present yet,
since trying to submit the build result will prompt for the output to be sent
again; it doesn't need to be sent, the agent just needs to wait.
This is a little inelegant; maybe there needs to be some way for the agent to
explicitly check that the hash has been computed, but I'm hoping these changes
will help with uploading large outputs.
As I've seen decompression errors.
When submitting an output. This also fixes a regression in not passing
report-bytes-sent on to call-with-streaming-http-request.
I think the case where the agent tries to send 0 bytes to the coordinator can
happen when the last request to the coordinator times out, probably because
computing the hash takes so long.
This means agents reattempting uploads don't have to start from scratch, and
can instead pick up from what's already been uploaded to the coordinator.
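
The resume flow this describes could be sketched as follows; a minimal Python illustration (the project itself is Guile Scheme), where the coordinator-reported byte count and the send_chunk callback are hypothetical stand-ins:

```python
# Sketch of resuming a partial upload: the coordinator reports how many
# bytes it already holds, and the agent streams only the remainder.

def resume_offset(output_size, bytes_on_coordinator):
    # Never seek past the end of the output, even if the coordinator
    # somehow reports more bytes than exist.
    return min(bytes_on_coordinator, output_size)

def upload(output, bytes_on_coordinator, send_chunk, chunk_size=4):
    # Stream the remainder chunk by chunk via the send_chunk callback.
    offset = resume_offset(len(output), bytes_on_coordinator)
    while offset < len(output):
        chunk = output[offset:offset + chunk_size]
        send_chunk(offset, chunk)
        offset += len(chunk)
    return offset

sent = []
upload(b"0123456789", 6, lambda off, chunk: sent.append((off, chunk)))
# Only bytes 6..9 are re-sent: sent == [(6, b"6789")]
```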
This provides some extra safety on top of the guarantees from TCP around the
integrity of the data received.
I'm introducing this now in preparation for supporting resuming partial
uploads. Because this will add some extra complexity around receiving uploads,
this extra check should ensure that issues with the implementation cannot lead
to corrupt uploads.
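
As a rough illustration of the check being described (in Python rather than the project's Guile Scheme; SHA-256 and the helper names are assumptions, since the message doesn't say which hash is used):

```python
import hashlib

def receive_with_check(chunks, expected_sha256):
    # Hash the data as it arrives, then compare with the hash the
    # sender claimed; a mismatch means the upload is corrupt (e.g. a
    # bug in the resume/offset bookkeeping), so reject it rather than
    # storing bad data.
    digest = hashlib.sha256()
    received = bytearray()
    for chunk in chunks:
        digest.update(chunk)
        received.extend(chunk)
    if digest.hexdigest() != expected_sha256:
        raise ValueError("uploaded data does not match expected hash")
    return bytes(received)

data = b"example output"
ok = receive_with_check([data[:7], data[7:]],
                        hashlib.sha256(data).hexdigest())
```

TCP's checksums only protect individual segments in transit; an end-to-end hash also catches mistakes in how the receiver reassembles resumed uploads.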
Trying to avoid the GnuTLS bindings breaking when the garbage collector runs
is quite difficult, and the current approach isn't very effective.
I want to try instead to support resuming partial uploads, as that should help
both with the GnuTLS GC issue and with network interruptions in general.
I think that approach is going to be easier if the files are compressed
up-front, so revert to doing that.
This partially reverts commit 8258e9c8d9f729b2670a602c523c59847b676b1a.
Such that the retry happens with a fresh slot (and the associated tracking
information).
Since time has been spent building them, wait longer before giving up on
submitting the outputs.
Don't compress then send, since I think compression can be slower than
sending, so doing both at the same time is probably faster. Add
make-chunked-output-port*, which might be more efficient than the Guile
chunked output port; it disables garbage collection to avoid issues with
GnuTLS, and tries to force the garbage collector to run if garbage is
building up.
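
For reference, a chunked output port ultimately has to produce HTTP/1.1 chunked transfer-encoding framing; a minimal sketch of that framing (a Python illustration, not the actual make-chunked-output-port* implementation):

```python
def encode_chunk(data):
    # HTTP/1.1 chunked framing: hex length, CRLF, payload, CRLF.
    return format(len(data), "x").encode() + b"\r\n" + data + b"\r\n"

def last_chunk():
    # A zero-length chunk terminates the body.
    return b"0\r\n\r\n"

body = encode_chunk(b"hello") + encode_chunk(b", world") + last_chunk()
# body == b"5\r\nhello\r\n7\r\n, world\r\n0\r\n\r\n"
```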
Since the problem of the GC breaking GnuTLS can probably occur for these
requests as well.
I think this works better.
This was put in to try and prevent the crashes inside gnutls, but was
ineffective since the actual trigger for the issues is garbage collection,
rather than parallel requests.
There might be some benefit from limiting request parallelism in the future,
but that can be thought through then.
Guile's garbage collector interferes with Guile+gnutls, which means that
sending files while the garbage collector is active is difficult.
These changes try to work around this by disabling the garbage collector just
as the data is being written, then enabling it again afterwards; I think this
helps.
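
The pattern described, pausing collection only around the write, looks roughly like this (a Python analogy; Python's gc module only controls the cyclic collector, and the real code uses Guile's GC controls, so this is purely illustrative):

```python
import gc
from contextlib import contextmanager

@contextmanager
def gc_paused():
    # Disable automatic garbage collection for the duration of the
    # write, then restore the previous state afterwards.
    was_enabled = gc.isenabled()
    gc.disable()
    try:
        yield
    finally:
        if was_enabled:
            gc.enable()

with gc_paused():
    inside = gc.isenabled()   # collection is off while writing
after = gc.isenabled()        # collection is restored afterwards
```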
Prefer upfront compression, as this might reduce GC activity while sending the
data.
Just use it when uploading files.
As I think this might make it faster.
This can happen if the request doesn't arrive in chunks.
I think this can happen if the log doesn't arrive as a chunked HTTP request.
I'm seeing mmap(PROT_NONE) failed crashes, and maybe these metrics will help
in understanding what's going on.
This avoids the need to create agents upfront, which could be useful when
creating many childhurd VMs or using scheduling tools to dynamically run
agents.
I've seen the process hang on the Hurd, and I think this might help.
I'm seeing this pull in sqlite3 unnecessarily on the Hurd.