Commit message (Author, Age)
...
* Add some more debugging output to retry-on-error (Christopher Baines, 2020-05-20)
* Try to better handle/avoid HTTP-related failures (Christopher Baines, 2020-05-20)
  I'm seeing "Resource temporarily unavailable, try again" errors from GnuTLS, mostly around the file uploads, I think. I'm not sure what's going on here, but it seems to happen when multiple threads are used in parallel. Anyway, this commit uses some mutexes to avoid uploading files in parallel, and also improves error handling generally. I'm pretty sure this isn't sufficient to fix the issue, but I could be looking in completely the wrong place for the problem.
* Add to .dir-locals.el (Christopher Baines, 2020-05-20)
* Guard against garbage collecting in multiple threads (Christopher Baines, 2020-05-20)
  I think I've seen issues with deleting links when garbage collection runs in multiple threads at the same time.
* Print out a message when retrying succeeds (Christopher Baines, 2020-05-20)
  That is, when an attempt succeeds after an earlier failure.
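The retry behaviour described by these entries (retrying after errors, reporting each failure, and printing a message when a later attempt succeeds) can be sketched in Python as follows. The project's `retry-on-error` is Guile code whose parameters are not shown in this log, so the `times` and `delay` arguments here are assumptions.

```python
import time


def retry_on_error(thunk, times: int = 3, delay: float = 0.0):
    """Call thunk, retrying after exceptions, for at most `times` attempts.

    Prints a message for each failed attempt, and another when an attempt
    succeeds after an earlier failure. The last exception is re-raised.
    """
    if times < 1:
        raise ValueError("times must be at least 1")
    for attempt in range(1, times + 1):
        try:
            result = thunk()
        except Exception as exc:
            if attempt == times:
                raise
            print(f"attempt {attempt} failed: {exc!r}, retrying")
            time.sleep(delay)
        else:
            if attempt > 1:
                print(f"retrying succeeded on attempt {attempt}")
            return result


if __name__ == "__main__":
    attempts = {"count": 0}

    def flaky_fetch():
        attempts["count"] += 1
        if attempts["count"] < 3:
            raise OSError("Resource temporarily unavailable, try again")
        return "substitute fetched"

    print(retry_on_error(flaky_fetch, times=5))
```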
* Include retrying when submitting builds (Christopher Baines, 2020-05-20)
* Switch to using threads for running builds in parallel (Christopher Baines, 2020-05-20)
  primitive-fork in Guile seems more trouble than it's worth; the parent process seemed to lock up frequently. I think using threads could be causing problems with TLS, but at least it doesn't lock up completely.
* Add more error handling into call-with-streaming-http-request (Christopher Baines, 2020-05-20)
  As I think there are "Resource temporarily unavailable, try again." errors coming from here...
* Start counting allocation successes and failures (Christopher Baines, 2020-05-20)
* Add support for counter metrics (Christopher Baines, 2020-05-20)
* Send requests directly to the coordinator for submitting builds (Christopher Baines, 2020-05-20)
* Fix zeroing the right metric (Christopher Baines, 2020-05-19)
* Zero the allocated build counts as well (Christopher Baines, 2020-05-19)
* Zero the allocated build counts for each agent (Christopher Baines, 2020-05-19)
  Otherwise old values persist if an agent has no allocated builds.
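The problem is easy to reproduce with a per-agent gauge, sketched here as a plain Python dict (a hypothetical stand-in for the coordinator's metrics): if the counts query only returns agents that currently have allocated builds, an agent whose builds have all gone keeps its last value unless every known agent is reset to zero first.

```python
def update_allocated_build_counts(gauge: dict, counts_by_agent: dict) -> None:
    """Set per-agent allocated build counts on a gauge (agent id -> value).

    counts_by_agent only contains agents that currently have allocated
    builds, so every agent already in the gauge is zeroed first; otherwise
    an agent with no allocated builds would keep its old value.
    """
    for agent_id in gauge:
        gauge[agent_id] = 0
    gauge.update(counts_by_agent)


if __name__ == "__main__":
    gauge = {}
    update_allocated_build_counts(gauge, {"agent-1": 3, "agent-2": 1})
    update_allocated_build_counts(gauge, {"agent-1": 2})
    print(gauge)
```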
* Fetch substitutes in a separate channel (Christopher Baines, 2020-05-19)
  As I'm guessing this could block the thread for fibers.
* Improve handling of submitting builds (Christopher Baines, 2020-05-19)
  Don't always substitute the derivation; just fetch it if it doesn't exist in the database. Also, just use the name of the derivation, and only read it from the disk when it needs storing in the database.
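A sketch of the new flow using Python's `sqlite3`, with a hypothetical `derivations` table and a hypothetical `fetch_and_read` callback standing in for substituting and reading the `.drv` file: the database is checked by name first, and the file is only fetched and read when the row is missing.

```python
import sqlite3
from typing import Callable


def create_schema(conn: sqlite3.Connection) -> None:
    conn.execute("CREATE TABLE IF NOT EXISTS derivations "
                 "(name TEXT PRIMARY KEY, contents TEXT NOT NULL)")


def ensure_derivation_stored(conn: sqlite3.Connection, name: str,
                             fetch_and_read: Callable[[str], str]) -> bool:
    """Store the derivation called `name` unless the database already has it.

    fetch_and_read (hypothetical) substitutes the .drv file and returns its
    contents; it is only called when the row is missing. Returns True if
    the derivation was fetched and stored by this call.
    """
    already_stored = conn.execute(
        "SELECT 1 FROM derivations WHERE name = ?", (name,)
    ).fetchone()
    if already_stored:
        return False
    conn.execute("INSERT INTO derivations (name, contents) VALUES (?, ?)",
                 (name, fetch_and_read(name)))
    conn.commit()
    return True
```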
* Fix the build show missing inputs functionality (Christopher Baines, 2020-05-19)
* Make it possible to show builds for an output (Christopher Baines, 2020-05-19)
* Don't check for child processes in the agent if there are none (Christopher Baines, 2020-05-18)
  Or at least there shouldn't be.
* Move sleeping when there are no new builds available (Christopher Baines, 2020-05-18)
* Sleep for 30 seconds if no new builds are available (Christopher Baines, 2020-05-18)
* Fix gc being disabled in the agent processes post fork (Christopher Baines, 2020-05-18)
* Reduce the wait before checking for exited processes (Christopher Baines, 2020-05-17)
* Improve the agent job process exit message (Christopher Baines, 2020-05-17)
* Disable Guile's GC while forking (Christopher Baines, 2020-05-17)
  Sometimes primitive-fork seems to hang, maybe related to a futex system call. I think disabling the garbage collector helps avoid this.
* Improve the agent parallel job processing (Christopher Baines, 2020-05-17)
* Support agents processing builds in parallel (Christopher Baines, 2020-05-17)
* Retry fetching substitutes for builds (Christopher Baines, 2020-05-17)
  In case of failures.
* Convert the client actions to happen over HTTP (Christopher Baines, 2020-05-17)
  There were a few issues with the previous approach. I was concerned about writing to the SQLite database from two processes, as it already segfaults occasionally when accessed from just one. Additionally, the client actions were already doing things that should happen in the coordinator process, like allocating builds. I'm trying not to turn this into a web app, but not doing very well. That said, having this information and these actions available over the network does make it possible to build a web app frontend, which I've had in mind.
* Use a variable (Christopher Baines, 2020-05-17)
* Remove some unused code (Christopher Baines, 2020-05-17)
* Change how triggering build allocations works (Christopher Baines, 2020-05-17)
  Associate this with the coordinator, rather than having the logic in the agent communication code.
* Open up more fibers possibilities in the coordinator (Christopher Baines, 2020-05-17)
  I'm looking to listen for client instructions ("build this", ...), maybe on a UNIX socket, which looks to be possible with fibers. However, doing this at the same time as using a network socket for agent messaging requires more access than run-server from the fibers web server module currently allows. To get around this, patch the fibers web server run-server procedure to do less, and do that work in the guix-build-coordinator instead. This is somewhat similar to what I think Cuirass does to allow it to do more with fibers. This required messing with the current-fiber parameter in a couple more places around threads; I'm not really sure why that problem has only occurred now. This current-fiber parameter issue should be resolved in the next fibers release. One good thing about these changes is that some behaviours not related to agent communication, like triggering build allocation on startup, have been moved out of the agent communication code.
* Switch the command line options for the agent communication config (Christopher Baines, 2020-05-16)
  To make it clear this is what it's for. This makes it easier to allow other ways of communicating with agent processes in the future, as well as making it easier to set out how to also listen for client commands, which I'm thinking about now.
* Guard against failed file uploads (Christopher Baines, 2020-05-13)
  Only move the file into the destination location when the upload completes.
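The usual way to get this guarantee, sketched in Python (the `receive_upload` helper and its chunk iterator are hypothetical): write into a temporary file in the destination directory, and only rename it into place once the last chunk has arrived. A rename within one filesystem is atomic on POSIX systems, so readers see either no file or the complete one.

```python
import os
import tempfile


def receive_upload(chunks, destination: str) -> None:
    """Write uploaded chunks to `destination`, all or nothing.

    Data goes to a temporary file in the same directory first, and is only
    renamed into place once every chunk has been written. If reading the
    chunks fails part way, the partial file is deleted and `destination`
    is left untouched.
    """
    directory = os.path.dirname(os.path.abspath(destination))
    fd, tmp_path = tempfile.mkstemp(dir=directory, prefix=".upload-")
    try:
        with os.fdopen(fd, "wb") as tmp:
            for chunk in chunks:
                tmp.write(chunk)
        # Atomic on POSIX when both paths are on the same filesystem.
        os.replace(tmp_path, destination)
    except BaseException:
        os.unlink(tmp_path)
        raise
```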
* Make sure to include all unprocessed builds in the graph (Christopher Baines, 2020-05-13)
  When only starting from builds whose derivation isn't used in other derivations, you risk missing parts of the derivation graph that aren't covered by builds.
* Fix an issue with the propagated priorities query (Christopher Baines, 2020-05-13)
  It was returning multiple records for a build if that build could be reached through multiple paths in the graph, resulting in different priorities. Only the max priority matters, so have the query find that.
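The fix can be illustrated with a recursive SQL query against a simplified, hypothetical schema (the coordinator's real tables are not shown in this log), run through Python's `sqlite3`. Each build's priority is pushed down to all of its transitive inputs, giving one row per path; `GROUP BY` with `MAX` then collapses those rows into a single priority per build.

```python
import sqlite3

SCHEMA = """
CREATE TABLE builds (id INTEGER PRIMARY KEY, priority INTEGER NOT NULL);
-- build_id cannot start until input_build_id has been built
CREATE TABLE build_inputs (build_id INTEGER, input_build_id INTEGER);
"""

# Walk from every build down through its inputs, carrying the priority of
# the build the walk started from; one row per (build, priority) reached.
REACHABLE = """
WITH RECURSIVE reachable(build_id, priority) AS (
  SELECT id, priority FROM builds
  UNION
  SELECT build_inputs.input_build_id, reachable.priority
  FROM reachable
  JOIN build_inputs ON build_inputs.build_id = reachable.build_id
)
"""

# Buggy shape: a build reachable through several paths appears several times.
ALL_PATHS = REACHABLE + "SELECT build_id, priority FROM reachable"

# Fixed shape: keep only the highest priority found for each build.
PROPAGATED_PRIORITIES = REACHABLE + """
SELECT build_id, MAX(priority) FROM reachable
GROUP BY build_id ORDER BY build_id
"""


def example_priority_db() -> sqlite3.Connection:
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    conn.executemany("INSERT INTO builds VALUES (?, ?)",
                     [(1, 1), (2, 5), (3, 2), (4, 0)])
    # builds 1 and 2 both need build 3, which needs build 4
    conn.executemany("INSERT INTO build_inputs VALUES (?, ?)",
                     [(1, 3), (2, 3), (3, 4)])
    return conn


if __name__ == "__main__":
    conn = example_priority_db()
    print(len(conn.execute(ALL_PATHS).fetchall()), "rows before grouping")
    print(conn.execute(PROPAGATED_PRIORITIES).fetchall())
```

In this example build 3 is reached with priorities 2, 1 and 5, and build 4 with four different priorities; after grouping, both end up with 5, the priority of build 2 that (indirectly) needs them.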
* Avoid using the deprecated guile-next package (Christopher Baines, 2020-05-12)
  On newer versions of Guix with guile-3.0.
* Fix missing builds from the derived priorities query (Christopher Baines, 2020-05-11)
  Just eliminate processed builds at the end; otherwise parts of the graph could be left unexplored.
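Using the same kind of hypothetical schema (plus a `processed` flag), this sketch compares filtering out processed builds during the recursive walk with filtering them at the end. When an unprocessed build is only reachable through a processed one, the first query stops the walk at the processed build, so the unprocessed build beyond it never receives the higher propagated priority; the second query still finds it.

```python
import sqlite3

SCHEMA = """
CREATE TABLE builds (id INTEGER PRIMARY KEY, priority INTEGER NOT NULL,
                     processed INTEGER NOT NULL);
CREATE TABLE build_inputs (build_id INTEGER, input_build_id INTEGER);
"""

# Filtering inside the recursion: the walk stops at processed builds.
FILTER_DURING_WALK = """
WITH RECURSIVE reachable(build_id, priority) AS (
  SELECT id, priority FROM builds WHERE processed = 0
  UNION
  SELECT build_inputs.input_build_id, reachable.priority
  FROM reachable
  JOIN build_inputs ON build_inputs.build_id = reachable.build_id
  JOIN builds ON builds.id = build_inputs.input_build_id
  WHERE builds.processed = 0
)
SELECT build_id, MAX(priority) FROM reachable
GROUP BY build_id ORDER BY build_id
"""

# Filtering at the end: walk the whole graph, then drop processed builds.
FILTER_AT_END = """
WITH RECURSIVE reachable(build_id, priority) AS (
  SELECT id, priority FROM builds WHERE processed = 0
  UNION
  SELECT build_inputs.input_build_id, reachable.priority
  FROM reachable
  JOIN build_inputs ON build_inputs.build_id = reachable.build_id
)
SELECT reachable.build_id, MAX(reachable.priority)
FROM reachable JOIN builds ON builds.id = reachable.build_id
WHERE builds.processed = 0
GROUP BY reachable.build_id ORDER BY reachable.build_id
"""


def example_filter_db() -> sqlite3.Connection:
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    # build 1 (unprocessed, priority 10) needs build 2 (processed),
    # which needs build 3 (unprocessed, priority 1)
    conn.executemany("INSERT INTO builds VALUES (?, ?, ?)",
                     [(1, 10, 0), (2, 0, 1), (3, 1, 0)])
    conn.executemany("INSERT INTO build_inputs VALUES (?, ?)",
                     [(1, 2), (2, 3)])
    return conn


if __name__ == "__main__":
    conn = example_filter_db()
    print("during walk:", conn.execute(FILTER_DURING_WALK).fetchall())
    print("at the end: ", conn.execute(FILTER_AT_END).fetchall())
```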
* Make guix-dev.scm more forwards compatible (Christopher Baines, 2020-05-11)
  Recent versions of Guix lack guile3.0-readline, so define it.
* Add more debugging for printing backtraces for allocator exceptions (Christopher Baines, 2020-05-11)
  As it doesn't seem to be working: no backtrace is actually printed.
* Make sure to try again with allocation exceptions (Christopher Baines, 2020-05-11)
* Optimise the database and truncate the WAL on startup (Christopher Baines, 2020-05-11)
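A sketch of this startup housekeeping with Python's `sqlite3`, assuming the database is SQLite in WAL mode (both mentioned elsewhere in this log) and reading "optimise" as SQLite's `PRAGMA optimize`, which is an assumption; the `open_database` helper is hypothetical. `PRAGMA wal_checkpoint(TRUNCATE)` copies the WAL contents back into the main database file and truncates the WAL file to zero bytes.

```python
import sqlite3


def open_database(path: str) -> sqlite3.Connection:
    """Open the SQLite database in WAL mode and tidy it up on startup."""
    conn = sqlite3.connect(path)
    conn.execute("PRAGMA journal_mode=WAL")
    # Let SQLite run ANALYZE on any tables it judges would benefit.
    conn.execute("PRAGMA optimize")
    # Copy the WAL back into the database file and truncate it to 0 bytes.
    busy, _, _ = conn.execute("PRAGMA wal_checkpoint(TRUNCATE)").fetchone()
    if busy:
        print("WAL checkpoint was blocked by another connection")
    return conn
```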
* Add some more indexes to speed up derivation ordered allocation (Christopher Baines, 2020-05-11)
* Speed up the derivation ordered allocator a little bit (Christopher Baines, 2020-05-11)
  Use EXCEPT rather than NOT IN to make the SQL query faster. Also, just return and use the build id, rather than an alist.
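The two query shapes, on a hypothetical pair of `builds`/`allocated_builds` tables, run through Python's `sqlite3`. They return the same ids here because `allocated_builds.build_id` cannot be NULL (with a NULL in the subquery, `NOT IN` would return no rows at all). The commit reports that EXCEPT was faster for its query; whether that holds elsewhere depends on the indexes and SQLite's query planner. As in the commit, only build ids are returned.

```python
import sqlite3

SCHEMA = """
CREATE TABLE builds (id INTEGER PRIMARY KEY, derivation TEXT NOT NULL);
CREATE TABLE allocated_builds (build_id INTEGER PRIMARY KEY);
"""

NOT_IN_QUERY = """
SELECT id FROM builds
WHERE id NOT IN (SELECT build_id FROM allocated_builds)
ORDER BY id
"""

EXCEPT_QUERY = """
SELECT id FROM builds
EXCEPT
SELECT build_id FROM allocated_builds
ORDER BY 1
"""


def unallocated_build_ids(conn: sqlite3.Connection, query: str) -> list:
    """Return just the build ids, rather than richer per-build records."""
    return [build_id for (build_id,) in conn.execute(query)]


def example_allocation_db() -> sqlite3.Connection:
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    conn.executemany("INSERT INTO builds VALUES (?, ?)",
                     [(i, f"example-{i}.drv") for i in range(1, 6)])
    conn.executemany("INSERT INTO allocated_builds VALUES (?)", [(2,), (4,)])
    return conn
```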
* Try more to not forget about the need to allocate builds (Christopher Baines, 2020-05-11)
  If an allocation is triggered while one is in progress, store the need to allocate again in an atomic box.
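A Python sketch of the idea. Python has no direct equivalent of Guile's atomic boxes, so a lock-protected flag plays that role here, and the `AllocationTrigger` class is hypothetical: a trigger that arrives while an allocation is running just sets the flag, and the running allocator goes round again instead of the request being dropped.

```python
import threading


class AllocationTrigger:
    """Run `allocate` on request without losing requests made mid-run.

    Hypothetical stand-in for an atomic box holding a "needs allocation"
    flag: a trigger that arrives while an allocation is running only sets
    the flag, and the running loop allocates again until the flag is clear.
    """

    def __init__(self, allocate):
        self._allocate = allocate
        self._lock = threading.Lock()
        self._running = False
        self._needs_allocation = False

    def trigger(self) -> None:
        with self._lock:
            self._needs_allocation = True
            if self._running:
                return  # the running loop will notice the flag
            self._running = True
        try:
            while True:
                with self._lock:
                    if not self._needs_allocation:
                        self._running = False
                        return
                    self._needs_allocation = False
                self._allocate()
        except BaseException:
            with self._lock:
                self._running = False
            raise


if __name__ == "__main__":
    runs = []

    def allocate():
        runs.append("allocation")
        if len(runs) == 1:
            trigger.trigger()  # a new request arrives mid-allocation

    trigger = AllocationTrigger(allocate)
    trigger.trigger()
    print(runs)  # two allocations: the second serves the mid-run request
```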
* Replace datastore-fetch-input-builds-for-unprocessed-builds (Christopher Baines, 2020-05-10)
  It worked under some database conditions, but was very slow under others. Move more of the logic into SQL in an attempt to make the allocator faster. This sort of works, although the previous approach, the one being replaced in this commit, had some advantages.
* Use some more SQL to speed up the derivation ordered allocator (Christopher Baines, 2020-05-10)
* Add datastore-fetch-unprocessed-builds-with-propagated-priorities (Christopher Baines, 2020-05-10)
  To use with the derivation ordered allocator.
* Fix datastore-fetch-input-builds-for-unprocessed-builds (Christopher Baines, 2020-05-10)
  Previously it was ignoring outputs without builds or build results. This fixes that.
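This kind of bug is typically the difference between inner and outer joins. It is sketched here with Python's `sqlite3` and hypothetical tables, since the real query is not shown in this log. With `JOIN`, an output with no build, or a build with no result, drops out of the results entirely; `LEFT JOIN` keeps the row and fills the missing columns with NULL.

```python
import sqlite3

SCHEMA = """
CREATE TABLE derivation_outputs (derivation TEXT, output_path TEXT);
CREATE TABLE builds (id INTEGER PRIMARY KEY, derivation TEXT);
CREATE TABLE build_results (build_id INTEGER, result TEXT);
"""

SELECT_PART = """
SELECT derivation_outputs.output_path, builds.id, build_results.result
FROM derivation_outputs
"""

# Inner joins: outputs without a build, or builds without a result, vanish.
INNER_QUERY = SELECT_PART + """
JOIN builds ON builds.derivation = derivation_outputs.derivation
JOIN build_results ON build_results.build_id = builds.id
ORDER BY derivation_outputs.output_path
"""

# Outer joins: every output is kept, with NULLs for missing builds/results.
LEFT_QUERY = SELECT_PART + """
LEFT JOIN builds ON builds.derivation = derivation_outputs.derivation
LEFT JOIN build_results ON build_results.build_id = builds.id
ORDER BY derivation_outputs.output_path
"""


def example_outputs_db() -> sqlite3.Connection:
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    conn.executemany("INSERT INTO derivation_outputs VALUES (?, ?)",
                     [("a.drv", "a-out"), ("b.drv", "b-out"), ("c.drv", "c-out")])
    # a.drv has a build with a result, b.drv a build without one, c.drv no build
    conn.executemany("INSERT INTO builds (id, derivation) VALUES (?, ?)",
                     [(1, "a.drv"), (2, "b.drv")])
    conn.execute("INSERT INTO build_results VALUES (1, 'success')")
    return conn


if __name__ == "__main__":
    conn = example_outputs_db()
    print(conn.execute(INNER_QUERY).fetchall())
    print(conn.execute(LEFT_QUERY).fetchall())
```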