| Commit message | Author | Age |
* | Use with-exception-handler in place of with-throw-handler | Christopher Baines | 2025-02-25 |
* | Make the job timeout configurable | Christopher Baines | 2025-02-10 |
* | Add a slightly crude method to ignore systems and targets•••While processing a revision. It would be good to also record what systems and
targets are in the platforms so it's clear what data is missing, but that can
be added later.
| Christopher Baines | 2025-02-03 |
* | Ensure COLUMNS is set | Christopher Baines | 2025-02-03 |
* | Fix starting with an empty database | Christopher Baines | 2024-11-08 |
* | Make the free space requirement configurable | Christopher Baines | 2024-08-20 |
* | Support setting environment variables in the inferior•••When processing jobs, this is mostly to allow setting GUIX_DOWNLOAD_METHODS.
| Christopher Baines | 2024-06-24 |
* | Add error handling for startup failures | Christopher Baines | 2024-04-02 |
* | Move backfilling into the server module and use the connection pool•••To avoid using the old PostgreSQL connection-per-thread code.
| Christopher Baines | 2024-04-01 |
* | Add exception handling to the process-jobs script•••As I'm seeing this exit on beid, but I'm not sure why.
| Christopher Baines | 2024-03-05 |
* | Remove drain? #t from process job•••As it now uses more fibers.
| Christopher Baines | 2024-01-18 |
* | Add meaningful parallelism to processing jobs•••Make parallel use of inferiors when computing channel instance derivations,
and when extracting information about a revision. This should allow for some
horizontal scalability, reducing the impact of additional systems for which
derivations need computing.
This commit also fixes an apparent issue with package replacements, as
previously the wrong id was used, and this hid some issues around
deduplication.
| Christopher Baines | 2024-01-18 |
* | Set %file-port-name-canonicalization when processing jobs•••Just in case this helps with performance.
| Christopher Baines | 2023-12-04 |
* | Use fibers when processing new revisions•••Just have one fiber at the moment, but this will enable using fibers for
parallelism in the future.
Fibers seemed to cause problems with the logging setup, which was a bit odd in
the first place. So move logging to the parent process which is better anyway.
| Christopher Baines | 2023-11-05 |
* | Support polling git repositories for new branches/revisions•••This is mostly a workaround for the occasional problems with the guix-commits
mailing list, as it can break and then the data service doesn't learn about
new revisions until the problem is fixed.
I think it's still a generally good feature though, and allows deploying the
data service without it consuming emails to learn about new revisions, and is
a step towards integrating some kind of way of notifying the data service to
poll.
| Christopher Baines | 2023-10-09 |
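A rough illustration of what polling a repository for new branches and revisions can look like. This is a sketch only, in Python rather than the service's own Guile, and the function names `parse_ls_remote`, `changed_branches` and `poll` are hypothetical. It assumes the `git` command is available and relies on `git ls-remote --heads` printing one `<commit>\trefs/heads/<branch>` line per branch.

```python
import subprocess


def parse_ls_remote(output):
    """Map branch name -> commit hash from `git ls-remote --heads` output."""
    prefix = "refs/heads/"
    branches = {}
    for line in output.splitlines():
        if not line.strip():
            continue
        commit, ref = line.split("\t", 1)
        if ref.startswith(prefix):
            branches[ref[len(prefix):]] = commit
    return branches


def changed_branches(known, current):
    """Return branches that are new, or whose tip differs from what is known."""
    return {b: c for b, c in current.items() if known.get(b) != c}


def poll(url, known):
    """Ask the remote for its branch tips and report what changed (needs git)."""
    result = subprocess.run(
        ["git", "ls-remote", "--heads", url],
        capture_output=True, text=True, check=True,
    )
    return changed_branches(known, parse_ls_remote(result.stdout))
```

Calling `poll` on a timer, and recording the returned tips as known afterwards, would replace the dependency on emails for learning about new revisions.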
* | Stop using a pool of threads for database operations•••Now that squee cooperates with suspendable ports, this is unnecessary. Use a
connection pool to still support running queries in parallel using multiple
connections.
| Christopher Baines | 2023-07-10 |
* | Detach the database setup from the main guix-data-service process•••This will allow restarting them independently, leaving it up to the operator
to ensure that all processes are compatible.
| Christopher Baines | 2023-06-09 |
* | Query for outputs when build events arrive•••This will keep the substitute information more up to date.
| Christopher Baines | 2023-06-09 |
* | Set request timeouts for the thread pools•••The request timeout should ensure that the operations don't back up if the
thread pool is overloaded.
| Christopher Baines | 2023-04-27 |
* | Split the thread pool used for database connections•••Into two thread pools: a default one, and one reserved for essential
functionality.
There are some pages that use slow queries, so this should help stop those
pages blocking other operations.
| Christopher Baines | 2023-04-27 |
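The idea of keeping a reserved pool alongside the default one can be sketched as follows. This is a minimal Python illustration, not the service's Guile code; the pool sizes and the `run_query` helper are assumptions made up for the example.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical sizes: most work goes to the default pool, while a small
# reserved pool stays available for essential functionality.
default_pool = ThreadPoolExecutor(max_workers=8, thread_name_prefix="default")
reserved_pool = ThreadPoolExecutor(max_workers=2, thread_name_prefix="reserved")


def run_query(query, essential=False):
    """Run a zero-argument callable on the appropriate pool and wait for it."""
    pool = reserved_pool if essential else default_pool
    return pool.submit(query).result()
```

With this split, slow page queries can fill the default pool without delaying callers that pass `essential=True`.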
* | Defer backfilling derivation distribution counts until later•••After the migrations have run.
| Christopher Baines | 2023-03-09 |
* | Store the distribution of derivations related to packages•••This might be generally useful, but I've been looking at it as it offers a way
to try and improve query performance when you want to select all the
derivations related to the packages for a revision.
The data looks like this (for a specified system and target):
┌───────┬───────┐
│ level │ count │
├───────┼───────┤
│    15 │     2 │
│    14 │     3 │
│    13 │     3 │
│    12 │     3 │
│    11 │    14 │
│    10 │    25 │
│     9 │    44 │
│     8 │    91 │
│     7 │  1084 │
│     6 │   311 │
│     5 │   432 │
│     4 │   515 │
│     3 │   548 │
│     2 │  2201 │
│     1 │ 21162 │
│     0 │ 22310 │
└───────┴───────┘
Level 0 reflects the number of packages. Level 1 is similar, as you have all
the derivations for the package origins. The remaining levels contain fewer
packages since it's mostly just derivations involved in bootstrapping.
When using a recursive CTE to collect all the derivations, PostgreSQL assumes
that each derivation has the same number of inputs, and this leads to a
large overestimation of the number of derivations per revision. This in turn
can lead to PostgreSQL picking a slower way of running the query.
When it's known how many new derivations you should see at each level, it's
possible to inform PostgreSQL of this by using LIMITs at various points in the
query. This reassures the query planner that it's not going to be handling
lots of rows and helps it make better decisions about how to execute the
query.
| Christopher Baines | 2023-03-09 |
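The shape of such a query can be sketched as below. This is an illustration only: it uses Python's built-in SQLite so that it runs anywhere, a made-up two-table schema (`package_derivations`, `derivation_inputs`) rather than the real guix-data-service one, and it assumes a derivation's level is the shortest distance from a package derivation. Where the real query places its LIMITs is not stated in the commit message; here a single LIMIT sits on the outer select. SQLite's planner does not behave like PostgreSQL's, so this shows the query shape, not the performance effect.

```python
import sqlite3

# Hypothetical, simplified schema for illustration.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE package_derivations (derivation_id INTEGER);
CREATE TABLE derivation_inputs (derivation_id INTEGER, input_id INTEGER);
""")
# Packages 1 and 2 (level 0) share input 3 (level 1), which needs 4 (level 2).
conn.executemany("INSERT INTO package_derivations VALUES (?)", [(1,), (2,)])
conn.executemany("INSERT INTO derivation_inputs VALUES (?, ?)",
                 [(1, 3), (2, 3), (3, 4)])


def level_distribution(conn):
    """Count the derivations first reached at each level, deepest first."""
    return conn.execute("""
        WITH RECURSIVE walk(derivation_id, level) AS (
            SELECT derivation_id, 0 FROM package_derivations
          UNION
            SELECT di.input_id, walk.level + 1
            FROM walk
            JOIN derivation_inputs di ON di.derivation_id = walk.derivation_id
        )
        SELECT level, COUNT(*) FROM (
            SELECT derivation_id, MIN(level) AS level
            FROM walk GROUP BY derivation_id
        ) GROUP BY level ORDER BY level DESC
    """).fetchall()


def all_derivations(conn, distribution):
    """Collect every derivation, capping the row count with a LIMIT
    derived from the stored per-level counts."""
    limit = sum(count for _, count in distribution)
    return conn.execute("""
        WITH RECURSIVE walk(derivation_id) AS (
            SELECT derivation_id FROM package_derivations
          UNION
            SELECT di.input_id
            FROM walk
            JOIN derivation_inputs di ON di.derivation_id = walk.derivation_id
        )
        SELECT derivation_id FROM walk LIMIT ?
    """, (limit,)).fetchall()
```

Storing the distribution once per revision means the limit is known before the expensive query runs, which is the point of recording it.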
* | Allow skipping processing system tests•••Generating system test derivations is difficult, since you generally need to
do potentially expensive builds for the system you're generating the system
tests for. You might not want to disable grafts, for instance, because you might
be trying to test whatever the test is testing in the context of grafts being
enabled.
I'm looking at skipping the system tests on data.guix.gnu.org, because they're
not used and quite expensive to compute.
| Christopher Baines | 2023-02-08 |
* | Drop the thread pool idle seconds•••To hopefully bring down the memory usage from idle connections.
| Christopher Baines | 2022-11-24 |
* | Close postgresql connections when the thread pool thread is idle•••I think the idle connections associated with idle threads are still taking up
memory, so especially now that you can configure an arbitrary number of
threads (and thus connections), I think it's good to close them regularly.
| Christopher Baines | 2022-10-23 |
* | Make it possible to increase the number of thread pool threads•••And double the default to 16.
| Christopher Baines | 2022-10-02 |
* | Handle migrations and server startup better•••The server part of the guix-data-service doesn't work great as a guix service,
since it often fails to start if the migrations take any time at all.
To address this, start the server before running the migrations, and serve the
pages that work without the database, plus a general 503 response. Once the
migrations have completed, switch to the normal behaviour.
| Christopher Baines | 2022-06-17 |
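The startup behaviour described above can be sketched roughly like this. It is a Python illustration with hypothetical names (`migrations_done`, `STATIC_PATHS`, `handle_request`, `run_migrations`), not the service's Guile implementation, and the set of pages that work without the database is invented for the example.

```python
import threading

# Set once the migrations have finished (hypothetical flag).
migrations_done = threading.Event()

# Pages assumed, for illustration, to work without the database.
STATIC_PATHS = {"/README", "/healthcheck"}


def handle_request(path):
    """Return (status, body) for a request path."""
    if path in STATIC_PATHS:
        return 200, "static page"
    if not migrations_done.is_set():
        # Everything else gets a general 503 until migrations complete.
        return 503, "Service Unavailable: database migrations are running"
    return 200, "page backed by the database"


def run_migrations(migrate):
    """Run the migrations, then switch the server to normal behaviour."""
    migrate()
    migrations_done.set()
```

Because the server is already listening while `run_migrations` works, a service manager sees it start promptly even when the migrations are slow.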
* | Fix more issues with the git_commits introduction | Christopher Baines | 2022-05-23 |
* | Query substitutes for latest processed revisions periodically•••This is a step towards having up to date substitute availability data.
| Christopher Baines | 2021-11-16 |
* | Fix a regression with running sqitch•••Introduced in 0dc05982cde052c985bb440dc026cbe3334ee50b.
| Christopher Baines | 2021-07-11 |
* | Run sqitch in the change mode•••Since this rolls back migrations less, which is good when the rollback bit
isn't always implemented.
| Christopher Baines | 2021-07-04 |
* | Try to adapt the PostgreSQL paramstring to use with sqitch | Christopher Baines | 2021-06-16 |
* | Allow customising the pg_dump command used•••As this
| Christopher Baines | 2021-01-03 |
* | Support not querying pending builds•••As this can take some time.
| Christopher Baines | 2020-11-01 |
* | Allow only fetching builds for a specific system | Christopher Baines | 2020-11-01 |
* | Fix create small backup issue with latest_build_status | Christopher Baines | 2020-10-23 |
* | Make it easier to get to a repl | Christopher Baines | 2020-10-10 |
* | Stop opening a PostgreSQL connection per request•••This was good in that it avoided having to deal with long running connections,
but it probably takes some time to open the connection, and these changes are
a step towards offloading the PostgreSQL queries to other threads, so they
don't block the threads for fibers.
| Christopher Baines | 2020-10-03 |
* | Remove development code from the process job script | Christopher Baines | 2020-09-28 |
* | Add a JSON page for repository branches | Christopher Baines | 2020-09-27 |
* | Replace debug-set! with setenv COLUMNS•••As that actually seems to work.
| Christopher Baines | 2020-09-26 |
* | Change the locale codeset representation•••From the normalized one, to the one actually contained within glibc. Recent
versions of glibc also contain symlinks linking the normalized codeset to the
locales with the .UTF-8 ending, but older ones do not.
Maybe handling codeset normalisation for queries would be good, but the locale
values ending in .UTF-8 are more compatible and allow the code to be
simplified. For querying, maybe there should be a locales table which handles
different representations.
| Christopher Baines | 2020-09-26 |
* | Set the locale at the start of the process jobs script•••This might help with the odd [1] errors regarding PostgreSQL queries.
1: invalid byte sequence for encoding "UTF8":
| Christopher Baines | 2020-09-20 |
* | Increase the stack trace width when processing jobs•••As this might result in more useful error messages.
| Christopher Baines | 2020-09-20 |
* | Add a lookup_builds field to the build_servers table•••This is to allow for build servers where only the substitutes should be
queried, and it shouldn't be assumed that they're running Cuirass.
| Christopher Baines | 2020-05-24 |
* | Move around --no-tablespaces•••Turns out, at the moment, this is ineffective when combined with the archive
formats, like the custom format in use. Therefore, move it to the pg_restore
command, where hopefully it'll work.
| Christopher Baines | 2020-05-16 |
* | Don't include tablespace assignments in the backup dump•••This is a compromise, as this won't help restoring the backup in situations
where you want tablespaces, but I'm currently viewing tablespaces as a deployment
concern, so maybe the right thing to do is exclude them. This approach will at
least keep the same behaviour in terms of restoring the backups locally.
This will fix the small dump creation process on data.guix.gnu.org, which is
currently broken because of the tablespace assignments when trying to restore
the backups.
| Christopher Baines | 2020-05-14 |
* | Split out querying of build servers and substitute servers•••These are related things, but somewhat separate. This change should make it
easier to deal with changes regarding querying build servers, and querying
substitute servers.
| Christopher Baines | 2020-05-03 |
* | Set a statement timeout of 60 seconds for web requests•••This will help stop queries running for an unnecessarily long time, longer
than NGINX will wait, for example.
| Christopher Baines | 2020-04-24 |
* | Rebuild the package derivation ranges table for the small backup•••This is better than just deleting the entries that don't match up with the
remaining revisions, but also not very useful for local development (due to
the lack of data).
| Christopher Baines | 2020-03-31 |