guix/data-service

	Commit message	Author	Age
*	Remove drain? #t from process job	Christopher Baines	2024-01-18
\| \| \| \|	As it now uses more fibers.
*	Add meaningful parallelism to processing jobs	Christopher Baines	2024-01-18
\| \| \| \| \| \| \| \| \| \| \|	Make parallel use of inferiors when computing channel instance derivations, and when extracting information about a revision. This should allow for some horizontal scalability, reducing the impact of additional systems for which derivations need computing. This commit also fixes an apparent issue with package replacements, as previously the wrong id was used, and this hid some issues around deduplication.
*	Set %file-port-name-canonicalization when processing jobs	Christopher Baines	2023-12-04
\| \| \| \|	Just in case this helps with performance.
*	Use fibers when processing new revisions	Christopher Baines	2023-11-05
\| \| \| \| \| \| \| \|	Just have one fiber at the moment, but this will enable using fibers for parallelism in the future. Fibers seemed to cause problems with the logging setup, which was a bit odd in the first place. So move logging to the parent process which is better anyway.
*	Support polling git repositories for new branches/revisions	Christopher Baines	2023-10-09
\| \| \| \| \| \| \| \| \| \| \|	This is mostly a workaround for the occasional problems with the guix-commits mailing list, as it can break and then the data service doesn't learn about new revisions until the problem is fixed. I think it's still a generally good feature though, and allows deploying the data service without it consuming emails to learn about new revisions, and is a step towards integrating some kind of way of notifying the data service to poll.
*	Stop using a pool of threads for database operations	Christopher Baines	2023-07-10
\| \| \| \| \| \|	Now that squee cooperates with suspendable ports, this is unnecessary. Use a connection pool to still support running queries in parallel using multiple connections.
*	Detach the database setup from the main guix-data-service process	Christopher Baines	2023-06-09
\| \| \| \| \|	This will allow restarting them independently, leaving it up to the operator to ensure that all processes are compatible.
*	Query for outputs when build events arrive	Christopher Baines	2023-06-09
\| \| \| \|	This will keep the substitute information more up to date.
*	Set request timeouts for the thread pools	Christopher Baines	2023-04-27
\| \| \| \| \|	The request timeout should ensure that the operations don't back up if the thread pool is overloaded.
*	Split the thread pool used for database connections	Christopher Baines	2023-04-27
\| \| \| \| \| \| \| \|	Into two thread pools, a default one, and one reserved for essential functionality. There are some pages that use slow queries, so this should help stop those pages blocking other operations.
*	Defer backfilling derivation distribution counts until later	Christopher Baines	2023-03-09
\| \| \| \|	After the migrations have run.
*	Store the distribution of derivations related to packages	Christopher Baines	2023-03-09
	This might be generally useful, but I've been looking at it as it offers a way to try and improve query performance when you want to select all the derivations related to the packages for a revision.

	The data looks like this (for a specified system and target):

	┌───────┬───────┐
	│ level │ count │
	├───────┼───────┤
	│    15 │     2 │
	│    14 │     3 │
	│    13 │     3 │
	│    12 │     3 │
	│    11 │    14 │
	│    10 │    25 │
	│     9 │    44 │
	│     8 │    91 │
	│     7 │  1084 │
	│     6 │   311 │
	│     5 │   432 │
	│     4 │   515 │
	│     3 │   548 │
	│     2 │  2201 │
	│     1 │ 21162 │
	│     0 │ 22310 │
	└───────┴───────┘

	Level 0 reflects the number of packages. Level 1 is similar as you have all the derivations for the package origins. The remaining levels contain fewer packages since it's mostly just derivations involved in bootstrapping.

	When using a recursive CTE to collect all the derivations, PostgreSQL assumes that each derivation has the same number of inputs, and this leads to a large overestimation of the number of derivations per revision. This in turn can lead to PostgreSQL picking a slower way of running the query.

	When it's known how many new derivations you should see at each level, it's possible to inform PostgreSQL of this by using LIMITs at various points in the query. This reassures the query planner that it's not going to be handling lots of rows and helps it make better decisions about how to execute the query.
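	The level/count distribution described in this commit message can be illustrated with a small sketch. This is not the data service's code: it uses Python's sqlite3 module rather than PostgreSQL, and the table names (derivation_inputs, roots), the column names, and the level_distribution function are all hypothetical. It assumes each derivation is counted at the shallowest level at which it is reachable from the package derivations (level 0), and that the input graph is acyclic.

```python
import sqlite3


def level_distribution(edges, roots):
    """Return {level: count} for derivations reachable from `roots`.

    `edges` is a list of (derivation_id, input_id) pairs and `roots` is a
    list of level 0 derivation ids. Table and column names are
    hypothetical, chosen only for this illustration.
    """
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE derivation_inputs (derivation_id INTEGER, input_id INTEGER)"
    )
    conn.execute("CREATE TABLE roots (derivation_id INTEGER)")
    conn.executemany("INSERT INTO derivation_inputs VALUES (?, ?)", edges)
    conn.executemany("INSERT INTO roots VALUES (?)", [(r,) for r in roots])

    rows = conn.execute(
        """
        WITH RECURSIVE reachable(derivation_id, level) AS (
            SELECT derivation_id, 0 FROM roots
          UNION
            -- Follow inputs one step further from everything found so far.
            SELECT di.input_id, reachable.level + 1
            FROM reachable
            JOIN derivation_inputs AS di
              ON di.derivation_id = reachable.derivation_id
        ),
        first_seen AS (
            -- Count each derivation once, at its shallowest level.
            SELECT derivation_id, MIN(level) AS level
            FROM reachable
            GROUP BY derivation_id
        )
        SELECT level, COUNT(*) FROM first_seen
        GROUP BY level
        ORDER BY level DESC
        """
    ).fetchall()
    conn.close()
    return dict(rows)


# Two package derivations (1 and 2) sharing some inputs.
example_edges = [(1, 3), (1, 5), (2, 3), (2, 4), (3, 5), (4, 5), (5, 6)]
print(level_distribution(example_edges, [1, 2]))  # prints {2: 1, 1: 3, 0: 2}
```

	Per-level counts like these are what the commit message proposes passing to the query planner through LIMIT clauses. That step is not shown here.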
*	Allow skipping processing system tests	Christopher Baines	2023-02-08
\| \| \| \| \| \| \| \| \| \| \|	Generating system test derivations is difficult, since you generally need to do potentially expensive builds for the system you're generating the system tests for. You might not want to disable grafts for instance because you might be trying to test whatever the test is testing in the context of grafts being enabled. I'm looking at skipping the system tests on data.guix.gnu.org, because they're not used and quite expensive to compute.
*	Drop the thread pool idle seconds	Christopher Baines	2022-11-24
\| \| \| \|	To hopefully bring down the memory usage from idle connections.
*	Close postgresql connections when the thread pool thread is idle	Christopher Baines	2022-10-23
\| \| \| \| \| \|	I think the idle connections associated with idle threads are still taking up memory, so especially now that you can configure an arbitrary number of threads (and thus connections), I think it's good to close them regularly.
*	Make it possible to increase the number of thread pool threads	Christopher Baines	2022-10-02
\| \| \| \|	And double the default to 16.
*	Handle migrations and server startup better	Christopher Baines	2022-06-17
\| \| \| \| \| \| \| \| \|	The server part of the guix-data-service doesn't work great as a guix service, since it often fails to start if the migrations take any time at all. To address this, start the server before running the migrations, and serve the pages that work without the database, plus a general 503 response. Once the migrations have completed, switch to the normal behaviour.
*	Fix more issues with the git_commits introduction	Christopher Baines	2022-05-23
\|
*	Query substitutes for latest processed revisions periodically	Christopher Baines	2021-11-16
\| \| \| \|	This is a step towards having up to date substitute availability data.
*	Fix a regression with running sqitch	Christopher Baines	2021-07-11
\| \| \| \|	Introduced in 0dc05982cde052c985bb440dc026cbe3334ee50b.
*	Run sqitch in the change mode	Christopher Baines	2021-07-04
\| \| \| \| \|	Since this rolls back migrations less, which is good when the rollback bit isn't always implemented.
*	Try to adapt the PostgreSQL paramstring to use with sqitch	Christopher Baines	2021-06-16
\|
*	Allow customising the pg_dump command used	Christopher Baines	2021-01-03
\| \| \| \|	As this
*	Support not querying pending builds	Christopher Baines	2020-11-01
\| \| \| \|	As this can take some time.
*	Allow only fetching builds for a specific system	Christopher Baines	2020-11-01
\|
*	Fix create small backup issue with latest_build_status	Christopher Baines	2020-10-23
\|
*	Make it easier to get to a repl	Christopher Baines	2020-10-10
\|
*	Stop opening a PostgreSQL connection per request	Christopher Baines	2020-10-03
\| \| \| \| \| \| \|	This was good in that it avoided having to deal with long running connections, but it probably takes some time to open the connection, and these changes are a step towards offloading the PostgreSQL queries to other threads, so they don't block the threads for fibers.
*	Remove development code from the process job script	Christopher Baines	2020-09-28
\|
*	Add a JSON page for repository branches	Christopher Baines	2020-09-27
\|
*	Replace debug-set! with setenv COLUMNS	Christopher Baines	2020-09-26
\| \| \| \|	As that actually seems to work.
*	Change the locale codeset representation	Christopher Baines	2020-09-26
\| \| \| \| \| \| \| \| \| \| \|	From the normalized one, to the one actually contained within glibc. Recent versions of glibc also contain symlinks linking the normalized codeset to the locales with the .UTF-8 ending, but older ones do not. Maybe handling codeset normalisation for queries would be good, but the locale values ending in .UTF-8 are more compatible and allow the code to be simplified. For querying, maybe there should be a locales table which handles different representations.
*	Set the locale at the start of the process jobs script	Christopher Baines	2020-09-20
	This might help with the odd [1] errors regarding PostgreSQL queries.

	1: invalid byte sequence for encoding "UTF8":
*	Increase the stack trace width when processing jobs	Christopher Baines	2020-09-20
\| \| \| \|	As this might result in more useful error messages.
*	Add a lookup_builds field to the build_servers table	Christopher Baines	2020-05-24
\| \| \| \| \|	This is to allow for build servers where only the substitutes should be queried, and it shouldn't be assumed that they're running Cuirass.
*	Move around --no-tablespaces	Christopher Baines	2020-05-16
\| \| \| \| \| \|	Turns out, at the moment, this is ineffective when combined with the archive formats, like the custom format in use. Therefore, move it to the pg_restore command, where hopefully it'll work.
*	Don't include tablespace assignments in the backup dump	Christopher Baines	2020-05-14
\| \| \| \| \| \| \| \| \| \| \|	This is a compromise, as this won't help restoring the backup in situations where you want tablespaces, but I'm currently viewing tablespaces as a deployment concern, so maybe the right thing to do is exclude them. This approach will at least keep the same behaviour in terms of restoring the backups locally. This will fix the small dump creation process on data.guix.gnu.org, which is currently broken because of the tablespace assignments when trying to restore the backups.
*	Split out querying of build servers and substitute servers	Christopher Baines	2020-05-03
\| \| \| \| \| \|	These are related things, but somewhat separate. This change should make it easier to deal with changes regarding querying build servers, and querying substitute servers.
*	Set a statement timeout of 60 seconds for web requests	Christopher Baines	2020-04-24
\| \| \| \| \|	This will help stop queries running for an unnecessarily long time, longer than NGinx will wait for example.
*	Rebuild the package derivation ranges table for the small backup	Christopher Baines	2020-03-31
\| \| \| \| \| \|	This is better than just deleting the entries that don't match up with the remaining revisions, but also not very useful for local development (due to the lack of data).
*	Give the temporary database more working memory	Christopher Baines	2020-03-26
\| \| \| \|	In the hope that this makes the script faster.
*	Use EXPLAIN ANALYZE for the creation of tmp_derivations	Christopher Baines	2020-03-26
\| \| \| \| \|	In the create-small-backup script, as this is quite a slow part, it's useful to get more information.
*	Handle a couple more tables in create-small-backup	Christopher Baines	2020-03-26
\| \| \| \| \|	derivation_output_details_sets, and derivations_by_output_details_set. This required moving around some of the code.
*	Use the --no-comments option to pg_dump	Christopher Baines	2020-03-25
	Hopefully this will help with the pg_restore in the create-small-backup script:

	pg_restore: [archiver (db)] Error while PROCESSING TOC:
	pg_restore: [archiver (db)] Error from TOC entry 2875; 0 0 COMMENT EXTENSION plpgsql
	pg_restore: [archiver (db)] could not execute query: ERROR: must be owner of extension plpgsql
	    Command was: COMMENT ON EXTENSION plpgsql IS 'PL/pgSQL procedural language';
*	Handle channel instances in create-small-backup	Christopher Baines	2020-03-25
\| \| \| \|	Otherwise this table is empty.
*	Handle system test derivations in create-small-backup	Christopher Baines	2020-03-25
\| \| \| \|	Otherwise this table is empty.
*	Stop using package_versions_by_guix_revision_range	Christopher Baines	2020-03-24
\| \| \| \|	It's been replaced by the package_derivations_by_guix_revision_range table.
*	Avoid failures related to renice and ionice	Christopher Baines	2020-03-20
\| \| \| \| \|	These parts of the backup scripts are optional, so don't fail if they don't work.
*	Move and improve the "starting the server" message	Christopher Baines	2020-03-14
\| \| \| \|	Move it after the output relating to narinfo signing, and include the host.
*	Improve handling of errors	Christopher Baines	2020-03-14
\| \| \| \| \|	Adjust the previously unused error page code, and start to use it. Only show the error if configured to do so, to avoid leaking secret information.