| Commit message | Author | Age |
|
Some parts of this were quite slow with anything other than a small database,
so instead of doing slow queries on every request, do some slow queries once to
set up the metrics, and then update them as part of the regular changes to the
database.
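A rough sketch of this idea, in Python with an in-memory SQLite database rather than the coordinator's own Guile code. The `builds` table, the `BuildMetrics` class and its method names are all hypothetical, made up for illustration.

```python
import sqlite3


class BuildMetrics:
    """Hypothetical in-memory gauge, keyed by build status."""

    def __init__(self, conn):
        self.conn = conn
        self.counts = {}
        # One potentially slow query at startup to establish the values.
        for status, count in conn.execute(
            "SELECT status, COUNT(*) FROM builds GROUP BY status"
        ):
            self.counts[status] = count

    def insert_build(self, status):
        # Update the metric alongside the regular database change,
        # rather than re-counting rows on every metrics request.
        with self.conn:
            self.conn.execute("INSERT INTO builds (status) VALUES (?)", (status,))
        self.counts[status] = self.counts.get(status, 0) + 1


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE builds (id INTEGER PRIMARY KEY, status TEXT)")
conn.executemany("INSERT INTO builds (status) VALUES (?)", [("queued",), ("queued",)])
conn.commit()

metrics = BuildMetrics(conn)
metrics.insert_build("queued")
print(metrics.counts)
```

Serving the metrics then only needs to read `metrics.counts`, with no query against the database.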
|
I particularly want to monitor the WAL growth, as I don't think SQLite's usual
approach to keeping the size down is sufficient.
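One simple way to observe WAL growth is to look at the size of the `-wal` file that SQLite keeps next to the database. The following is a Python sketch of that measurement only; the `wal_size_bytes` helper is hypothetical and this is not necessarily how the coordinator collects the metric.

```python
import os
import sqlite3
import tempfile


def wal_size_bytes(db_path):
    """Return the size of the write-ahead log, or 0 if there is none."""
    wal_path = db_path + "-wal"  # SQLite names the WAL file this way
    return os.path.getsize(wal_path) if os.path.exists(wal_path) else 0


db_path = os.path.join(tempfile.mkdtemp(), "example.db")
conn = sqlite3.connect(db_path)
conn.execute("PRAGMA journal_mode=WAL")
conn.execute("CREATE TABLE t (x INTEGER)")
conn.execute("INSERT INTO t VALUES (1)")
conn.commit()

# Writes that haven't been checkpointed yet live in the WAL file.
size_with_pending_writes = wal_size_bytes(db_path)
```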
|
As it's shorter, and this keeps the logging neat.
|
Previously, an agent could end up fetching builds from the coordinator, but
not receiving the response, say because of a network issue or timeout. When it
retried, it would fetch even more builds, leaving some builds allocated to it
that it didn't know about.
These changes attempt to make fetching builds more idempotent. Rather than
returning just the newly allocated builds, the coordinator returns all the
builds allocated to the agent, and rather than requesting a number of new
builds, the agent specifies the total number of allocated builds it wants.
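The behaviour described above can be sketched like this, in Python rather than the coordinator's Guile. The `Coordinator` class, `fetch_builds` and the build identifiers are hypothetical stand-ins, not the actual implementation.

```python
class Coordinator:
    """Hypothetical stand-in for the coordinator's allocation logic."""

    def __init__(self, unallocated):
        self.unallocated = list(unallocated)
        self.allocated = {}  # agent id -> list of build ids

    def fetch_builds(self, agent, target_count):
        builds = self.allocated.setdefault(agent, [])
        # Only top up to the requested total; never allocate beyond it.
        while len(builds) < target_count and self.unallocated:
            builds.append(self.unallocated.pop(0))
        # Return everything allocated to the agent, not just the new builds.
        return list(builds)


coordinator = Coordinator(["a", "b", "c", "d"])
first = coordinator.fetch_builds("agent-1", 2)  # imagine this response is lost
retry = coordinator.fetch_builds("agent-1", 2)  # same request, same answer
```

Because the request names a total rather than an increment, repeating it after a lost response doesn't allocate anything extra.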
|
Rather than polling the database every second, use some condition variables to
wake threads when there's probably an event.
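As an illustration of the pattern (in Python, not the Guile code this commit touches), a worker can wait on a condition variable and be woken when something is queued. The `pending` list and the event name are made up for the example.

```python
import threading

pending = []
processed = []
condition = threading.Condition()


def worker():
    with condition:
        # The timeout is only a fallback in case a notification is missed.
        while not pending:
            condition.wait(timeout=5)
        processed.append(pending.pop(0))


thread = threading.Thread(target=worker)
thread.start()

with condition:
    pending.append("build-submitted")
    condition.notify()  # wake the waiting thread straight away

thread.join()
```

The worker re-checks `pending` after every wake-up, which matches the "probably an event" wording: a wake-up is a hint to look, not a guarantee that there's work.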
|
This will help track CPU time, as well as restarts/crashes.
|
When submitting builds. The agent will now retry the relevant step, like
uploading the log file, if the coordinator says that still needs doing.
|
Things like the agent not having the log file, or an output. This will allow
the agent to actually retry the relevant thing.
|
build-log-file-location replaces build-log-file-exists?, as the procedure
doesn't always return a boolean. It also now returns an absolute filepath for
the log file if it exists, as this will be easier to use.
|
So that the client part doesn't depend on fibers.
|
To start making it possible to use the agent without having to load anything
related to fibers (as fibers doesn't work on the Hurd yet).
|
Which was extracted from the Guix Build Coordinator.
|
Rather than the lzlib module within Guix.
|
This isn't particularly accurate: what's actually being stored is the current
time when the record is inserted into the coordinator database, but that
should happen just before the agent starts the build, so hopefully that's good
enough.
|
Also support fetching builds for specific systems from the Guix Data Service.
|
As there seem to be some failures in this area.
|
I'm seeing "Resource temporarily unavailable, try again" errors from GnuTLS,
mostly around the file uploads I think.
I'm not sure what's going on here, but it seems to happen when using multiple
threads in parallel. Anyway, this commit uses some mutexes to avoid uploading
files in parallel, and also improves error handling generally. I'm pretty sure
this isn't sufficient to fix the issue, but I could be looking in completely
the wrong place for the problem.
|
Otherwise old values persist if an agent has no allocated builds.
|
Associate this with the coordinator, rather than having the logic in the agent
communication code.
|
I'm looking to listen for client instructions ("build this", ...) maybe on a
UNIX socket, which looks to be possible with fibers, but doing this at the
same time as using a network socket for agent messaging requires more access
than run-server from the fibers web server module currently allows.
To get around this, patch the fibers web server run-server procedure to do
less, and do that instead in the guix-build-coordinator. This is somewhat
similar to what I think Cuirass does to allow it to do more with fibers.
This required messing with the current-fiber parameter in a couple more places
around threads, I'm not really sure why that problem has occurred now. This
current-fiber parameter issue should be resolved in the next fibers release.
One good thing with these changes is that some behaviours not related to agent
communication, like triggering build allocation on startup, have been moved out
of the agent communication code.
|
Only move the file into the destination location when the upload completes.
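A minimal sketch of this pattern in Python (the coordinator itself is written in Guile, and `receive_upload` and the file names here are hypothetical): write to a temporary file first, and rename it into place only once all the data has arrived.

```python
import os
import tempfile


def receive_upload(chunks, destination):
    directory = os.path.dirname(destination)
    # Write to a temporary file in the same directory, so that the final
    # rename stays on one filesystem.
    fd, tmp_path = tempfile.mkstemp(dir=directory, suffix=".tmp")
    try:
        with os.fdopen(fd, "wb") as tmp:
            for chunk in chunks:
                tmp.write(chunk)
        os.replace(tmp_path, destination)  # only now does the file appear
    except BaseException:
        os.unlink(tmp_path)  # drop the partial upload
        raise


target_dir = tempfile.mkdtemp()
destination = os.path.join(target_dir, "output.nar")
receive_upload([b"hello ", b"world"], destination)
```

An interrupted upload then never leaves a truncated file at the destination, which makes "does the file exist?" a reliable check for whether the upload completed.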
|
I'm not sure why I did this... but it's slower and more complex than just not
base64 encoding the data.
|
Add time logging, increase the buffer size for dump-file, and increase the
retry times.
|
This should reduce the request durations, and makes retrying slightly easier.
|
4 might result in contention.
|
This is probably good practice.
|
Just use the standard error handling in the controller.
|
This is already useful to pass around the datastore, hooks and metrics
registry, and will become more useful to pass around the allocator to use.
|
The channel caused problems, as it would potentially block processing requests
if the hook event couldn't be processed immediately. This approach should
avoid that, as well as providing more reliability because the events are
stored in the database.
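The general shape of this approach might look something like the following Python sketch. The `hook_events` table layout and both function names are assumptions for illustration, not the coordinator's actual schema, and an in-memory database is used only to keep the example self-contained.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE hook_events (id INTEGER PRIMARY KEY, event TEXT, arguments TEXT)"
)


def record_hook_event(event, arguments):
    # The request handler only inserts a row, so it never blocks waiting
    # for the hook itself to run.
    with conn:
        conn.execute(
            "INSERT INTO hook_events (event, arguments) VALUES (?, ?)",
            (event, arguments),
        )


def process_hook_events(handler):
    # Run separately from request handling. A row is deleted only after
    # its handler returns, so a failed event stays queued.
    rows = conn.execute(
        "SELECT id, event, arguments FROM hook_events ORDER BY id"
    ).fetchall()
    for event_id, event, arguments in rows:
        handler(event, arguments)
        with conn:
            conn.execute("DELETE FROM hook_events WHERE id = ?", (event_id,))


handled = []
record_hook_event("build-success", "build-1")
record_hook_event("build-failure", "build-2")
process_hook_events(lambda event, arguments: handled.append((event, arguments)))
```

With an on-disk database, events recorded but not yet processed would still be there after a restart, which is where the extra reliability comes from.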
|
For reporting setup failures.
|
This will be useful for submitting setup failures. Once the message is
processed successfully, the agent should no longer have access to update that
build. If the request is processed successfully, but the response isn't
received, when the agent retries, it'll get an access denied response. In this
scenario, that means success, so this will allow it to be treated as such.
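To illustrate the scenario, here's a small Python sketch of a retry loop with such an option. The `AccessDenied` exception, `submit_with_retry` and the `access_denied_means_success` parameter are hypothetical names, and the real agent code (in Guile) may well be structured differently.

```python
class AccessDenied(Exception):
    """Hypothetical error for an access denied response."""


def submit_with_retry(send, attempts=3, access_denied_means_success=False):
    for _ in range(attempts):
        try:
            return send()
        except TimeoutError:
            continue  # response not received; try again
        except AccessDenied:
            if access_denied_means_success:
                # An earlier attempt may have been processed even though
                # its response was lost, revoking access to the build.
                return "ok"
            raise
    raise TimeoutError("no response after retries")


calls = []


def flaky_send():
    calls.append(1)
    if len(calls) == 1:
        raise TimeoutError("response lost")  # request was processed anyway
    raise AccessDenied("agent no longer has access to this build")


result = submit_with_retry(flaky_send, access_denied_means_success=True)
```

The option is off by default in this sketch, since for most requests an access denied response really is an error.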
|
This is clearer.
|