| | Commit message | Author | Age |
---|---|---|---|
* | Prioritise post build actions [post-build-prioritisation] | Christopher Baines | 2023-04-11 |
| | By the priority of the build, and then by the bytes that need uploading. This should help ensure that priority builds get handled first when there's congestion getting data back to the coordinator. Prioritising builds with less data to upload should also keep things moving when uploads are slow. | | |
* | Add priority support to create-work-queue | Christopher Baines | 2023-04-11 |
| | This isn't ideal as the process-job interface changes when you enable prioritisation, but that's not a big issue. This should enable prioritising post build operations. | | |
* | Add missing join | Christopher Baines | 2023-04-11 |
* | Use underscores for derivation_name | Christopher Baines | 2023-04-11 |
| | As this is more consistent in the JSON responses. Signed-off-by: Christopher Baines <mail@cbaines.net> | | |
* | Expose the derived priority to agents | Christopher Baines | 2023-04-11 |
| | Rather than the priority, as it's the derived priority that they should be using for decision making. | | |
* | Remove datastore-select-allocated-builds | Christopher Baines | 2023-04-11 |
| | As it's a less well named copy of datastore-list-agent-builds. | | |
* | Drop the delay for retrying uploads on failure | Christopher Baines | 2023-04-11 |
* | Remove the crude alarm based timeout for submitting outputs | Christopher Baines | 2023-04-11 |
| | This should be unnecessary now that there's progress on getting the I/O operations to timeout. | | |
* | Reduce logging on build failures | Christopher Baines | 2023-04-11 |
* | Include build priority when selecting allocated builds | Christopher Baines | 2023-04-11 |
* | Strip down the guix-dev.scm file | Christopher Baines | 2023-04-10 |
| | Assume that a recent version of guix will be used. | | |
* | Include the build priority when agents fetch builds | Christopher Baines | 2023-04-10 |
| | This means the agent can use it to prioritise various things. | | |
* | Change allocate-builds to update-build-allocation-plan | Christopher Baines | 2023-04-10 |
| | As this is a better name. | | |
* | Use a timeout when substituting derivations in the publish hook | Christopher Baines | 2023-04-10 |
| | As this can block if the store GC is running. | | |
* | Improve event/state id support for events | Christopher Baines | 2023-04-03 |
| | Support the Last-Event-ID header in the events endpoint, and include the event IDs in the responses. | | |
* | Try to improve hook exception handling | Christopher Baines | 2023-04-02 |
| | This should lead to more concise backtraces at least, although it may reintroduce the problem where backtraces lead to excessive memory usage. | | |
* | Don't call (backtrace) in the build allocator | Christopher Baines | 2023-04-01 |
| | It seems to cause the same memory issues as calling (backtrace) with the hooks. | | |
* | Give up printing backtraces for exceptions in hooks | Christopher Baines | 2023-03-29 |
| | I think it's causing problems that I'm struggling to reproduce and debug. | | |
* | Try and ensure that the non-fibers sleep is used in places | Christopher Baines | 2023-03-29 |
| | When not using fibers. I don't know if a different sleep is being used, and I don't think I've read anything about having to avoid this, but I'm running out of ideas. | | |
* | Provide more information in process-event error handling | Christopher Baines | 2023-03-29 |
| | There are still problems here, but it's unclear where. | | |
* | Remove backtrace printing from create-thread-pool | Christopher Baines | 2023-03-29 |
| | Just in case this is causing a problem with the exception handling within proc. | | |
* | Always keep one thread running to process hooks | Christopher Baines | 2023-03-29 |
| | This should reduce the need to keep stopping and starting threads. | | |
* | Guard against exceptions in the thread pool monitor thread | Christopher Baines | 2023-03-29 |
| | As I've seen the dreaded encoding-error here now that there's some logging. | | |
* | Switch the timeout approach for guix-data-service requests | Christopher Baines | 2023-03-29 |
| | In case this helps with avoiding hooks hanging. | | |
* | Decrease the planned builds for agents | Christopher Baines | 2023-03-29 |
| | Since the allocator is fast now. | | |
* | Add more logging around parallel hooks | Christopher Baines | 2023-03-28 |
| | I still can't reproduce any problems locally, so this might help work out what the state of the different threads is. | | |
* | Change how parallel hook processing works | Christopher Baines | 2023-03-28 |
| | The previous approach was inefficient, since there was a thread that just repeatedly tried to queue every unprocessed hook event for processing. Instead, use a thread pool that pulls events from the database. This still involves some work to not process the same event in different threads, but it should hopefully scale better. | | |
* | Add create-thread-pool | Christopher Baines | 2023-03-28 |
| | This is like create-work-queue, but pulls the jobs, rather than the jobs being pushed to it. This should work more efficiently for the hooks, where there are often lots of events to process. | | |
* | Remove %random-state | Christopher Baines | 2023-03-28 |
| | As it's unused. | | |
* | Move waiting after hook errors in to process-event | Christopher Baines | 2023-03-27 |
| | So that this happens for parallel hooks as well. | | |
* | Stop using the guix-memory-metrics-updater | Christopher Baines | 2023-03-27 |
| | The data doesn't look particularly useful, and I think the memory problem I was chasing was down to a broken hook (and poor handling of that). | | |
* | Handle the tags being a vector in submit-build | Christopher Baines | 2023-03-27 |
| | As the retry hook uses this functionality. | | |
* | Tweak how the exception handling and logging works for hooks | Christopher Baines | 2023-03-27 |
| | Since the errors don't seem to be getting logged properly: the backtrace output stops part way. | | |
* | Instrument the size of some Guix managed hash tables | Christopher Baines | 2023-03-27 |
| | In case any of these are a factor in the occasional high memory use. | | |
* | Fix sending tags as part of the build-submitted event | Christopher Baines | 2023-03-26 |
* | Include build tags in the agent-builds-allocated event | Christopher Baines | 2023-03-26 |
* | Send an event when builds are allocated to an agent | Christopher Baines | 2023-03-25 |
* | Send event on allocation plan update | Christopher Baines | 2023-03-25 |
* | Send event on agent status update | Christopher Baines | 2023-03-25 |
* | Add timestamp to the events | Christopher Baines | 2023-03-25 |
* | Add processor count to the agent status | Christopher Baines | 2023-03-24 |
| | This is useful when interpreting the load information. | | |
* | Include the timestamp when fetching the agent status from the db | Christopher Baines | 2023-03-23 |
* | Fix status load average handling | Christopher Baines | 2023-03-22 |
| | As the keys in JSON are strings. | | |
* | Include agent tags in the coordinator state | Christopher Baines | 2023-03-22 |
* | Fix sharing build tags through the state | Christopher Baines | 2023-03-22 |
* | Include the allocation plan size in the coordinator state | Christopher Baines | 2023-03-22 |
* | Include build tags in the coordinator state | Christopher Baines | 2023-03-22 |
* | Include agent requested systems in the coordinator state | Christopher Baines | 2023-03-22 |
* | Include the last agent statuses in the overall status | Christopher Baines | 2023-03-22 |
* | Have agents send their status every 30 seconds | Christopher Baines | 2023-03-22 |
| |
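
The ordering described in the "Prioritise post build actions" commit (by build priority, then by the bytes that need uploading) can be illustrated with a small sketch. This is not the project's code, which is written in Guile Scheme; the `PostBuildJob` type, its field names, and the assumption that a larger priority number means a more urgent build are all hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PostBuildJob:
    """Hypothetical record for a finished build waiting on post-build work."""
    build_id: str
    derived_priority: int  # assumption: larger value means more urgent
    bytes_to_upload: int


def order_post_build_jobs(jobs):
    """Sort by priority (highest first), then by upload size (smallest first)."""
    return sorted(jobs, key=lambda j: (-j.derived_priority, j.bytes_to_upload))


jobs = [
    PostBuildJob("a", derived_priority=0, bytes_to_upload=100),
    PostBuildJob("b", derived_priority=5, bytes_to_upload=9000),
    PostBuildJob("c", derived_priority=5, bytes_to_upload=200),
]

for job in order_post_build_jobs(jobs):
    print(job.build_id, job.derived_priority, job.bytes_to_upload)
```

With a composite sort key, a low-priority build never jumps ahead of a high-priority one, however small its upload; the upload size only breaks ties between builds of equal priority.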