guix/data-service

	Commit message	Author	Age
*	Add placeholder derivation source file nar procedures	Christopher Baines	2024-10-27
*	Speed up package-description-and-synopsis-locale-options-guix-revision	Christopher Baines	2024-09-08
*	Speed up select-builds-with-context-by-derivation-output	Christopher Baines	2024-09-07
*	Use system-ids for inserting distribution counts	Christopher Baines	2024-08-12
*	Add more logging to backfilling guix_revision_package_derivation_distribution_counts	Christopher Baines	2024-08-12
*	Parallelise inserting package derivation distribution counts	Christopher Baines	2024-08-10
*	Move inserting derivations in to the load-new-guix-revision module	Christopher Baines	2024-08-07
	And start to integrate it more closely. This makes it possible to start speeding it up by doing more work in parallel.
*	Add more time logging in to insert-missing-derivations	Christopher Baines	2024-07-16
*	Stop inserting missing source file nars	Christopher Baines	2024-07-16
	This was more of an issue several years ago, so this code is no longer really needed.
*	Delete duplicates when inserting license data	Christopher Baines	2024-06-25
	As I think this is necessary.
*	Speed up querying for revision package derivations	Christopher Baines	2024-06-21
	By splitting it up by system.
*	Speed up select-build-outputs	Christopher Baines	2024-06-20
*	Support regexes for included and excluded branches	Christopher Baines	2024-05-22
*	Fix package replacement handling on the revision packages page	Christopher Baines	2024-04-28
*	Move backfilling in to the server module and use the connection pool	Christopher Baines	2024-04-01
	To avoid using the old PostgreSQL connection-per-thread code.
*	Speed up loading package metadata	Christopher Baines	2024-02-01
	By batching the SQL queries.
*	Split up handling of package description data	Christopher Baines	2024-01-31
	Hopefully this will show which part is slow.
*	Remove even more time logging	Christopher Baines	2024-01-28
*	Remove some time logging	Christopher Baines	2024-01-27
	As this is a bit noisy.
*	Fixup tests	Christopher Baines	2024-01-18
*	Split and instrument parts of inferior-packages->package-metadata-ids	Christopher Baines	2024-01-18
	As parts of it are slow.
*	Rewrite part of insert-missing-data-and-return-all-ids to avoid filter	Christopher Baines	2024-01-18
	As the list returned by filter can share a tail with the input list, which then prevents destructively modifying the filtered list.
*	Have delete-duplicates/sort! take a equality procedure	Christopher Baines	2024-01-18
	And change the default, as eq? doesn't always work.
*	Use delete-duplicates/sort! in inferior-packages->license-set-ids	Christopher Baines	2024-01-18
	As it should offer a speedup over delete-duplicates.
*	Use delete-duplicates/sort! in insert-missing-data-and-return-all-ids	Christopher Baines	2024-01-18
	As it's faster than delete-duplicates for large amounts of data.
*	Memoize computing tokens	Christopher Baines	2023-11-24
	I'm not sure how expensive this is, but it doesn't need to be done for every request.
*	Handle derivations with no sources	Christopher Baines	2023-11-05
*	Include output information in the package page response	Christopher Baines	2023-11-05
	As this will be useful for QA to say whether the package builds reproducibly or not.
*	Use fibers when processing new revisions	Christopher Baines	2023-11-05
	Just have one fiber at the moment, but this will enable using fibers for parallelism in the future. Fibers seemed to cause problems with the logging setup, which was a bit odd in the first place, so move logging to the parent process, which is better anyway.
*	Make some sweeping changes to loading new revisions	Christopher Baines	2023-11-02
	Move towards being able to run multiple inferior REPLs, and use vectors rather than lists in some places (this may be more efficient).
*	Remove redundant joins from the select build query	Christopher Baines	2023-10-16
*	Support polling git repositories for new branches/revisions	Christopher Baines	2023-10-09
	This is mostly a workaround for occasional problems with the guix-commits mailing list: when it breaks, the data service doesn't learn about new revisions until the problem is fixed. I think it's still a generally good feature, though. It allows deploying the data service without having it consume emails to learn about new revisions, and it's a step towards integrating some way of notifying the data service to poll.
*	Try to fix backfilling blocked_builds	Christopher Baines	2023-07-02
*	Filter out duplicate ids for blocking builds	Christopher Baines	2023-07-02
*	Query for outputs when build events arrive	Christopher Baines	2023-06-09
	This will keep the substitute information more up to date.
*	Fix ignoring canceled builds	Christopher Baines	2023-05-18
	The previous changes only affected searching for package derivations, and they also didn't work.
*	Ignore canceled builds when querying package derivations	Christopher Baines	2023-05-18
	This will help when using this query to submit builds, since derivations with canceled builds will no longer be ignored.
*	Ensure the known and unknown keys appear	Christopher Baines	2023-05-09
*	Remove redundant match-lambda in select-package-output-availability-for-revision	Christopher Baines	2023-05-09
*	Use the package_derivations system id in a query	Christopher Baines	2023-05-04
	Rather than the derivations system id, as this helps PostgreSQL run the query faster.
*	Further tweak fetching narinfos	Christopher Baines	2023-04-28
	Move the batching to the database, which should reduce memory usage while removing the limit on the number of fetched narinfos.
*	Improve performance of select-fixed-output-package-derivations-in-revision	Christopher Baines	2023-03-11
*	Fix query in get-count-for-next-level	Christopher Baines	2023-03-09
*	Avoid a recursive CTE for finding blocked builds where possible	Christopher Baines	2023-03-09
	Use the new approach of looking up the distribution of the derivations, and building a non-recursive query specifically for this revision. This should avoid PostgreSQL picking a poor plan for performing the query.
*	Store the distribution of derivations related to packages	Christopher Baines	2023-03-09
	This might be generally useful, but I've been looking at it as it offers a way to try and improve query performance when you want to select all the derivations related to the packages for a revision. The data looks like this (for a specified system and target):

	┌───────┬───────┐
	│ level │ count │
	├───────┼───────┤
	│    15 │     2 │
	│    14 │     3 │
	│    13 │     3 │
	│    12 │     3 │
	│    11 │    14 │
	│    10 │    25 │
	│     9 │    44 │
	│     8 │    91 │
	│     7 │  1084 │
	│     6 │   311 │
	│     5 │   432 │
	│     4 │   515 │
	│     3 │   548 │
	│     2 │  2201 │
	│     1 │ 21162 │
	│     0 │ 22310 │
	└───────┴───────┘

	Level 0 reflects the number of packages. Level 1 is similar, as you have all the derivations for the package origins. The remaining levels contain fewer packages, since they're mostly just derivations involved in bootstrapping.

	When using a recursive CTE to collect all the derivations, PostgreSQL assumes that each derivation has the same number of inputs, and this leads to a large overestimation of the number of derivations per revision. This in turn can lead to PostgreSQL picking a slower way of running the query.

	When it's known how many new derivations you should see at each level, it's possible to inform PostgreSQL of this by using LIMITs at various points in the query. This reassures the query planner that it's not going to be handling lots of rows and helps it make better decisions about how to execute the query.
*	Guard against divide by 0 in update-derivation-outputs-statistics	Christopher Baines	2022-11-28
*	Do derivation inputs and outputs housekeeping at the end of each job	Christopher Baines	2022-11-28
	This should help with query performance, as the recursive queries using derivation_inputs and derivation_outputs are particularly sensitive to the n_distinct values for these tables.
*	Fix calling insert-blocked-builds	Christopher Baines	2022-11-20
*	Make backfilling blocked_builds a bit smarter	Christopher Baines	2022-11-12
	And drop the chunk size.
*	Handle deleting from blocked_builds when builds are scheduled	Christopher Baines	2022-11-12
	As scheduling a build might unblock others.
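The 2024-01-18 commits replace delete-duplicates with delete-duplicates/sort! and give it an equality procedure. The data service itself is written in Guile Scheme; what follows is only a minimal Python sketch of the general sort-then-deduplicate idea, assuming the procedure sorts its input destructively and then drops adjacent duplicates. The name `delete_duplicates_sort` and its `key`/`equal` parameters are hypothetical and are not the data service's actual API.

```python
from typing import Callable, List, TypeVar

T = TypeVar("T")


def delete_duplicates_sort(
    items: List[T],
    key: Callable[[T], object] = lambda x: x,
    equal: Callable[[T, T], bool] = lambda a, b: a == b,
) -> List[T]:
    """Hypothetical sketch: sort `items` in place, then drop adjacent duplicates.

    Sorting brings equal elements next to each other (assuming elements that
    are `equal` also have equal keys), so a single linear pass suffices:
    O(n log n) overall, rather than the O(n^2) pairwise comparisons a general
    delete-duplicates performs. Like a `!` procedure, this mutates its
    argument, so the caller must not share structure with anything else.
    """
    items.sort(key=key)  # destructive sort of the caller's list
    result: List[T] = []
    for item in items:
        # Only compare against the last kept element; duplicates are adjacent.
        if not result or not equal(result[-1], item):
            result.append(item)
    return result


if __name__ == "__main__":
    print(delete_duplicates_sort([3, 1, 3, 2, 1]))
```

Passing an identity-based `equal` (the Python analogue of eq?) keeps equal but distinct objects, such as two separately built `[1]` lists, which is one way eq?-style comparison can fall short as a default.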