path: root/scripts
Commit message (author, date)
* Use with-exception-handler in place of with-throw-handler (Christopher Baines, 2025-02-25)
* Make the job timeout configurable (Christopher Baines, 2025-02-10)
* Add a slightly crude method to ignore systems and targets (Christopher Baines, 2025-02-03)
  While processing a revision. It would be good to also record what systems and targets are in the platforms so it's clear what data is missing, but that can be added later.
* Ensure COLUMNS is set (Christopher Baines, 2025-02-03)
* Fix starting with an empty database (Christopher Baines, 2024-11-08)
* Make the free space requirement configurable (Christopher Baines, 2024-08-20)
* Support setting environment variables in the inferior (Christopher Baines, 2024-06-24)
  When processing jobs, this is mostly to allow setting GUIX_DOWNLOAD_METHODS.
* Add error handling for startup failures (Christopher Baines, 2024-04-02)
* Move backfilling into the server module and use the connection pool (Christopher Baines, 2024-04-01)
  To avoid using the old PostgreSQL connection-per-thread code.
* Add exception handling to the process-jobs script (Christopher Baines, 2024-03-05)
  As I'm seeing this exit on beid, but I'm not sure why.
* Remove drain? #t from process job (Christopher Baines, 2024-01-18)
  As it now uses more fibers.
* Add meaningful parallelism to processing jobs (Christopher Baines, 2024-01-18)
  Make parallel use of inferiors when computing channel instance derivations, and when extracting information about a revision. This should allow for some horizontal scalability, reducing the impact of additional systems for which derivations need computing.
  This commit also fixes an apparent issue with package replacements, as previously the wrong id was used, and this hid some issues around deduplication.
* Set %file-port-name-canonicalization when processing jobs (Christopher Baines, 2023-12-04)
  Just in case this helps with performance.
* Use fibers when processing new revisions (Christopher Baines, 2023-11-05)
  Just have one fiber at the moment, but this will enable using fibers for parallelism in the future.
  Fibers seemed to cause problems with the logging setup, which was a bit odd in the first place. So move logging to the parent process which is better anyway.
* Support polling git repositories for new branches/revisions (Christopher Baines, 2023-10-09)
  This is mostly a workaround for the occasional problems with the guix-commits mailing list, as it can break and then the data service doesn't learn about new revisions until the problem is fixed.
  I think it's still a generally good feature though, and allows deploying the data service without it consuming emails to learn about new revisions, and is a step towards integrating some kind of way of notifying the data service to poll.
* Stop using a pool of threads for database operations (Christopher Baines, 2023-07-10)
  Now that squee cooperates with suspendable ports, this is unnecessary. Use a connection pool to still support running queries in parallel using multiple connections.
* Detach the database setup from the main guix-data-service process (Christopher Baines, 2023-06-09)
  This will allow restarting them independently, leaving it up to the operator to ensure that all processes are compatible.
* Query for outputs when build events arrive (Christopher Baines, 2023-06-09)
  This will keep the substitute information more up to date.
* Set request timeouts for the thread pools (Christopher Baines, 2023-04-27)
  The request timeout should ensure that the operations don't back up if the thread pool is overloaded.
* Split the thread pool used for database connections (Christopher Baines, 2023-04-27)
  Into two thread pools, a default one, and one reserved for essential functionality. There are some pages that use slow queries, so this should help stop those pages blocking other operations.
* Defer backfilling derivation distribution counts until later (Christopher Baines, 2023-03-09)
  After the migrations have run.
* Store the distribution of derivations related to packages (Christopher Baines, 2023-03-09)
  This might be generally useful, but I've been looking at it as it offers a way to try and improve query performance when you want to select all the derivations related to the packages for a revision.
  The data looks like this (for a specified system and target):
  ┌───────┬───────┐
  │ level │ count │
  ├───────┼───────┤
  │    15 │     2 │
  │    14 │     3 │
  │    13 │     3 │
  │    12 │     3 │
  │    11 │    14 │
  │    10 │    25 │
  │     9 │    44 │
  │     8 │    91 │
  │     7 │  1084 │
  │     6 │   311 │
  │     5 │   432 │
  │     4 │   515 │
  │     3 │   548 │
  │     2 │  2201 │
  │     1 │ 21162 │
  │     0 │ 22310 │
  └───────┴───────┘
  Level 0 reflects the number of packages. Level 1 is similar as you have all the derivations for the package origins. The remaining levels contain fewer packages since it's mostly just derivations involved in bootstrapping.
  When using a recursive CTE to collect all the derivations, PostgreSQL assumes that each derivation has the same number of inputs, and this leads to a large overestimation of the number of derivations per revision. This in turn can lead to PostgreSQL picking a slower way of running the query.
  When it's known how many new derivations you should see at each level, it's possible to inform PostgreSQL of this by using LIMITs at various points in the query. This reassures the query planner that it's not going to be handling lots of rows and helps it make better decisions about how to execute the query.
* Allow skipping processing system tests (Christopher Baines, 2023-02-08)
  Generating system test derivations is difficult, since you generally need to do potentially expensive builds for the system you're generating the system tests for. You might not want to disable grafts for instance because you might be trying to test whatever the test is testing in the context of grafts being enabled.
  I'm looking at skipping the system tests on data.guix.gnu.org, because they're not used and quite expensive to compute.
* Drop the thread pool idle seconds (Christopher Baines, 2022-11-24)
  To hopefully bring down the memory usage from idle connections.
* Close postgresql connections when the thread pool thread is idle (Christopher Baines, 2022-10-23)
  I think the idle connections associated with idle threads are still taking up memory, so especially now that you can configure an arbitrary number of threads (and thus connections), I think it's good to close them regularly.
* Make it possible to increase the number of thread pool threads (Christopher Baines, 2022-10-02)
  And double the default to 16.
* Handle migrations and server startup better (Christopher Baines, 2022-06-17)
  The server part of the guix-data-service doesn't work great as a guix service, since it often fails to start if the migrations take any time at all.
  To address this, start the server before running the migrations, and serve the pages that work without the database, plus a general 503 response. Once the migrations have completed, switch to the normal behaviour.
* Fix more issues with the git_commits introduction (Christopher Baines, 2022-05-23)
* Query substitutes for latest processed revisions periodically (Christopher Baines, 2021-11-16)
  This is a step towards having up to date substitute availability data.
* Fix a regression with running sqitch (Christopher Baines, 2021-07-11)
  Introduced in 0dc05982cde052c985bb440dc026cbe3334ee50b.
* Run sqitch in the change mode (Christopher Baines, 2021-07-04)
  Since this rolls back migrations less, which is good when the rollback bit isn't always implemented.
* Try to adapt the PostgreSQL paramstring to use with sqitch (Christopher Baines, 2021-06-16)
* Allow customising the pg_dump command used (Christopher Baines, 2021-01-03)
  As this
* Support not querying pending builds (Christopher Baines, 2020-11-01)
  As this can take some time.
* Allow only fetching builds for a specific system (Christopher Baines, 2020-11-01)
* Fix create small backup issue with latest_build_status (Christopher Baines, 2020-10-23)
* Make it easier to get to a repl (Christopher Baines, 2020-10-10)
* Stop opening a PostgreSQL connection per request (Christopher Baines, 2020-10-03)
  This was good in that it avoided having to deal with long running connections, but it probably takes some time to open the connection, and these changes are a step towards offloading the PostgreSQL queries to other threads, so they don't block the threads for fibers.
* Remove development code from the process job script (Christopher Baines, 2020-09-28)
* Add a JSON page for repository branches (Christopher Baines, 2020-09-27)
* Replace debug-set! with setenv COLUMNS (Christopher Baines, 2020-09-26)
  As that actually seems to work.
* Change the locale codeset representation (Christopher Baines, 2020-09-26)
  From the normalized one, to the one actually contained within glibc. Recent versions of glibc also contain symlinks linking the normalized codeset to the locales with the .UTF-8 ending, but older ones do not.
  Maybe handling codeset normalisation for queries would be good, but the locale values ending in .UTF-8 are more compatible and allow the code to be simplified. For querying, maybe there should be a locales table which handles different representations.
* Set the locale at the start of the process jobs script (Christopher Baines, 2020-09-20)
  This might help with the odd [1] errors regarding PostgreSQL queries.
  1: invalid byte sequence for encoding "UTF8":
* Increase the stack trace width when processing jobs (Christopher Baines, 2020-09-20)
  As this might result in more useful error messages.
* Add a lookup_builds field to the build_servers table (Christopher Baines, 2020-05-24)
  This is to allow for build servers where only the substitutes should be queried, and it shouldn't be assumed that they're running Cuirass.
* Move around --no-tablespaces (Christopher Baines, 2020-05-16)
  Turns out, at the moment, this is ineffective when combined with the archive formats, like the custom format in use. Therefore, move it to the pg_restore command, where hopefully it'll work.
* Don't include tablespace assignments in the backup dump (Christopher Baines, 2020-05-14)
  This is a compromise, as this won't help restoring the backup in situations where you want tablespaces, but I'm currently viewing tablespaces as a deployment concern, so maybe the right thing to do is exclude them. This approach will at least keep the same behaviour in terms of restoring the backups locally.
  This will fix the small dump creation process on data.guix.gnu.org, which is currently broken because of the tablespace assignments when trying to restore the backups.
* Split out querying of build servers and substitute servers (Christopher Baines, 2020-05-03)
  These are related things, but somewhat separate. This change should make it easier to deal with changes regarding querying build servers, and querying substitute servers.
* Set a statement timeout of 60 seconds for web requests (Christopher Baines, 2020-04-24)
  This will help stop queries running for an unnecessarily long time, longer than NGinx will wait for example.
* Rebuild the package derivation ranges table for the small backup (Christopher Baines, 2020-03-31)
  This is better than just deleting the entries that don't match up with the remaining revisions, but also not very useful for local development (due to the lack of data).
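
Note on the 2023-03-09 entry "Store the distribution of derivations related to packages": the level/count distribution it describes can be illustrated with a small recursive CTE. The following is only a sketch, not the data service's actual query or schema. It uses Python's built-in sqlite3 module rather than PostgreSQL, the table names `derivation_inputs` and `roots` and the function `level_counts` are hypothetical, and it assumes a derivation's level is its shortest distance from any package derivation in an acyclic input graph. It does not demonstrate the LIMIT-based planner hint, only how per-level counts could be obtained.

```python
import sqlite3


def level_counts(edges, roots):
    """Return {level: count} for every derivation reachable from the roots.

    edges: (derivation, input) pairs; roots: the level 0 derivations.
    Assumes the input graph is acyclic, as derivation graphs are.
    """
    conn = sqlite3.connect(":memory:")
    # Hypothetical tables, named for this illustration only.
    conn.execute("CREATE TABLE derivation_inputs (derivation TEXT, input TEXT)")
    conn.executemany("INSERT INTO derivation_inputs VALUES (?, ?)", edges)
    conn.execute("CREATE TABLE roots (derivation TEXT)")
    conn.executemany("INSERT INTO roots VALUES (?)", [(r,) for r in roots])
    rows = conn.execute(
        """
        WITH RECURSIVE closure(derivation, level) AS (
            SELECT derivation, 0 FROM roots
            UNION
            SELECT di.input, closure.level + 1
            FROM closure
            JOIN derivation_inputs AS di ON di.derivation = closure.derivation
        ),
        first_seen AS (
            -- A derivation can be reached at several depths; keep the shallowest.
            SELECT derivation, MIN(level) AS level
            FROM closure
            GROUP BY derivation
        )
        SELECT level, COUNT(*) FROM first_seen GROUP BY level ORDER BY level DESC
        """
    ).fetchall()
    conn.close()
    return dict(rows)


# Toy graph: two packages sharing a compiler, all resting on one bootstrap input.
example_edges = [
    ("pkg-a", "src-a"), ("pkg-a", "gcc"),
    ("pkg-b", "src-b"), ("pkg-b", "gcc"),
    ("gcc", "bootstrap"), ("src-a", "bootstrap"),
]
print(level_counts(example_edges, ["pkg-a", "pkg-b"]))
```

With counts like these stored per revision, a query could in principle cap each recursion step with a LIMIT matching the expected number of new derivations, which is the idea the commit message describes for PostgreSQL.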