path: root/guix-build-coordinator/agent-messaging
Commit message    Author    Age
* Don't use a chunked response for the metrics    Christopher Baines    2024-06-30
|
* Remove support for chunked requests    Christopher Baines    2024-06-23
|   This was a hack to work around reading the entire request/response body in to memory, and is no longer needed.
* Use the new process metrics exporter    Christopher Baines    2024-04-17
|
* Actually use non-blocking ports for network requests    Christopher Baines    2024-03-02
|   In most places at least.
* Change retry-on-error to take #:ignore and #:no-retry    Christopher Baines    2024-01-12
|   And change #:ignore to better reflect ignoring the exception.
* Track the metrics endpoint duration    Christopher Baines    2023-11-17
|
* Try to avoid the metrics endpoint timing out    Christopher Baines    2023-11-10
|   As this makes it harder to debug issues.
* Exit when the server fails to start    Christopher Baines    2023-08-29
|   To avoid the process half working.
* Name more threads    Christopher Baines    2023-08-09
|   To help with debugging.
* Try and get backtraces when current output port seems broken    Christopher Baines    2023-06-04
|   With the "conversion to port encoding failed" (#62590) error, I'm seeing the "error: when processing" logs, but the backtrace doesn't get logged, maybe because it's going to the current output port, which might be broken?
|   Anyway, try sending the backtrace to the current error port, in the hope that this port is still working.
* Don't call report-bytes-hashed with #f    Christopher Baines    2023-05-24
|   Just log the line instead.
* Show backtraces for perform-upload errors    Christopher Baines    2023-05-23
|
* Better handle the output hashing being completed for upload requests    Christopher Baines    2023-05-23
|   This currently causes an error on the server side and a timeout on the client side.
* Tweak logging around agent output submission    Christopher Baines    2023-05-22
|
* Better handle upload requests with no content    Christopher Baines    2023-05-22
|   The idea with these is to allow the agent to resume waiting for the coordinator to finish computing the output hash.
* Guard against the stat call failing    Christopher Baines    2023-05-17
|   If the file doesn't exist.
* Add extra log line when hashing outputs    Christopher Baines    2023-05-14
|
* Not being able to log reliably is frustrating :(    Christopher Baines    2023-05-14
|   I'm seeing things like this, which I'm guessing relate to logging failing:
|
|     2023-05-14 13:40:44 (ERROR): exception in output hash thread: #<&compound-exception components: (#<&error> #<&origin origin: "put-char"> #<&message message: "conversion to port encoding failed"> #<&irritants irritants: 84> #<&exception-with-kind-and-args kind: encoding-error args: ("put-char" "conversion to port encoding failed" 84 #<output: file 1> #\2)>)>
* Guard against logging failures in the output hash thread    Christopher Baines    2023-05-14
|
* Set the name of the hash management thread    Christopher Baines    2023-05-14
|   For debugging purposes.
* Move some logging    Christopher Baines    2023-05-14
|   So if it fails, it doesn't leave things in an inconsistent state.
* Add more logging around computing hashes    Christopher Baines    2023-05-14
|
* Fix missing logger argument    Christopher Baines    2023-05-14
|
* Try to make starting hashing outputs more reliable    Christopher Baines    2023-05-12
|   Even if the connection to the agent has dropped when the upload has completed.
* Don't look at the content-length header for chunked transfers    Christopher Baines    2023-05-12
|   Since this isn't supposed to be set.
* Fix retry-times not always being set    Christopher Baines    2023-05-11
|
* Clean up some handling of uploads for agents    Christopher Baines    2023-05-11
|   This commit should correct the progress reporting on partial uploads.
* Add more exception handling in the hash computing thread    Christopher Baines    2023-05-11
|   As I've seen log-msg here raise exceptions.
* Have the coordinator report on the outputs that are being hashed    Christopher Baines    2023-05-11
|   As this is useful to observe since it can take a long time for large outputs.
* Have agents report on the progress of the coordinator hashing outputs    Christopher Baines    2023-05-11
|   Otherwise it looks like the upload should finish, but hasn't.
* Don't log so much when the database is busy    Christopher Baines    2023-05-10
|
* Tweak retrying for status update requests    Christopher Baines    2023-05-10
|   Don't retry status updates many times, since the information will be more out of date each time.
* Move output hash related operations in to the dedicated thread    Christopher Baines    2023-05-10
|   So in case of connection loss, this still happens and the work to compute the hash isn't wasted.
* Move computing output hashes to dedicated threads    Christopher Baines    2023-05-10
|   This should help the coordinator and agents ensure hashes are computed and the agent finds out when this has happened, even in a situation where the coordinator is restarted/crashes and the connection between the agents and coordinator are lost.
* More port forcing    Christopher Baines    2023-05-09
|   I think NGinx might time out reading the response headers for streaming requests, since the headers probably don't fill the buffer. So force output at that point.
|   Also, address the issue with forcing output to the chunked port.
* Don't specify transfer encoding header in responses    Christopher Baines    2023-05-08
|   As the fibers web server takes care of this. It's currently adding the header twice, so this should be fixed.
* Change submit-output to not spend so much time waiting    Christopher Baines    2023-05-08
|   Make use of the coordinator trying to avoid the connection timing out. This should improve things when the coordinator is restarted or crashes.
* Specifically handle empty partial uploads    Christopher Baines    2023-05-08
|   This will happen when the upload process is interrupted after the data has been received, but before the hash has been computed, perhaps by the coordinator restarting.
* Make logging conditional on the request content length    Christopher Baines    2023-05-08
|
* Stop using chunked transfers for file uploads    Christopher Baines    2023-05-08
|   As the amount of data to upload is known, this is unnecessary complexity and overhead.
* Provide progress reporting for computing the hashes of outputs    Christopher Baines    2023-05-08
|   This happens on the server and can be very slow for large outputs, exceeding the time that NGinx will keep waiting for a response. To address this, stream progress information back to the client, this should keep the connection from timing out.
* Expose the number threads as a metric    Christopher Baines    2023-05-08
|   As I think this might want reducing at some point.
* Add some comments about streaming http requests    Christopher Baines    2023-05-05
|
* Include system uptime in the agent status information    Christopher Baines    2023-05-05
|   As I've found this useful in spotting systems which have problems.
* Remove now redundant logging around gc protection    Christopher Baines    2023-05-03
|
* Stop monitoring uploads through the chunked output port    Christopher Baines    2023-04-30
|   Use the dump-port* progress reporter instead.
* Try to instrument ports/file descriptors    Christopher Baines    2023-04-27
|   I'm seeing "too many open file" errors, and while I'm not sure if port-for-each is directly connected, it sounds like it might be worth instrumenting.
* Revert "Remove read-request-body workaround"    Christopher Baines    2023-04-25
|   I was mistaken, Guile doesn't handle chunked request bodies.
|   This reverts commit 07e42953f257b846d44376d998cc7d654214ca17.
* Remove read-request-body workaround    Christopher Baines    2023-04-25
|   The Guile chunked input port now raises an exception when the input is incomplete.
* Deallocate canceled builds from agents when they startup    Christopher Baines    2023-04-21
|