To hopefully see which part is slow.
|
As this is a bit noisy.
|
As parts of it are slow.
|
|
|
|
|
As filter can return a list that shares part of the input list, which then
makes it unsafe to destructively modify the filtered list.
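
For example (SRFI-1 documents that filter's result may share a common
tail with its argument), a minimal sketch of the problem:

    (use-modules (srfi srfi-1))

    (define items (list 1 2 3 4 5))

    ;; The result may share its tail with ITEMS, so destructively
    ;; modifying it (e.g. with sort! or append!) can also corrupt ITEMS.
    (define odd-items (filter odd? items))

    ;; Taking a fresh copy gives a list that's safe to modify in place.
    (define safe-odd-items (list-copy (filter odd? items)))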
|
And change the default, as eq? doesn't always work.
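
A small illustration of the difference: eq? compares object identity,
so two strings with the same contents need not be eq?, while equal?
compares the contents:

    (define a (string-copy "guix"))
    (define b (string-copy "guix"))

    (eq? a b)     ;; => #f, two different string objects
    (equal? a b)  ;; => #t, same contents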
|
As it should offer a speedup over delete-duplicates.
|
As it's faster than delete-duplicates for large amounts of data.
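
delete-duplicates from SRFI-1 compares each element against everything
kept so far, so it's O(n^2); a hash table based version touches each
element only once. A rough sketch of the idea (the procedure name here
is made up, not necessarily what this commit adds):

    (use-modules (srfi srfi-1))

    (define (deduplicate lst)
      ;; Keep the first occurrence of each element, using an equal?
      ;; based hash table to remember what has already been seen.
      (let ((seen (make-hash-table)))
        (reverse
         (fold (lambda (item result)
                 (if (hash-ref seen item)
                     result
                     (begin
                       (hash-set! seen item #t)
                       (cons item result))))
               '()
               lst))))

    (deduplicate (list "a" "b" "a" "c" "b"))  ;; => ("a" "b" "c")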
|
As I'm not sure how expensive this is, but it doesn't need doing for every
request.
|
As this will be useful for QA to say whether the package builds reproducibly
or not.
|
|
|
|
|
|
|
|
Just have one fiber at the moment, but this will enable using fibers for
parallelism in the future.

Fibers seemed to cause problems with the logging setup, which was a bit odd in
the first place. So move logging to the parent process, which is better
anyway.
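
Roughly the shape this suggests, using Guile Fibers (process-jobs is a
made-up placeholder for the actual work, not a procedure from this
repository):

    (use-modules (fibers))

    (define (process-jobs)
      ;; placeholder for the real work loop
      #t)

    ;; A single fiber for now; later, more workers could be started
    ;; with spawn-fiber to process jobs in parallel.
    (run-fibers
     (lambda ()
       (process-jobs)))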
|
Move in the direction of being able to run multiple inferior REPLs, and use
some vectors rather than lists in places (maybe this is more efficient).
|
This is mostly a workaround for the occasional problems with the guix-commits
mailing list: it can break, and then the data service doesn't learn about new
revisions until the problem is fixed.

I think it's still a generally good feature though. It allows deploying the
data service without it consuming emails to learn about new revisions, and it
is a step towards integrating some way of notifying the data service to poll.
|
This will keep the substitute information more up to date.
|
The previous changes only affected searching for package derivations, and they
also didn't work.
|
This will help when using this to submit builds, since you won't end up
ignoring derivations with canceled builds.
|
Rather than the derivations system id, as this helps PostgreSQL run the query
faster.
|
Move the batching to the database, which should reduce memory usage while
removing the limit on the number of fetched narinfos.
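
One way of batching in the database is keyset pagination: repeatedly
fetch the next batch of rows after the last id seen, so only one batch
is held in memory at a time. A sketch with made-up table and column
names, assuming squee's exec-query as used elsewhere in the code:

    (use-modules (squee)
                 (srfi srfi-1))

    (define (for-each-narinfo-batch conn proc)
      (let loop ((last-id "0"))
        (let ((rows (exec-query
                     conn
                     "SELECT id, data FROM narinfos
                      WHERE id > $1 ORDER BY id LIMIT 1000"
                     (list last-id))))
          (unless (null? rows)
            (proc rows)
            ;; the first column of the last row is the highest id seen
            (loop (first (last rows)))))))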
|
Use the new approach of looking up the distribution of the derivations, and
building a non-recursive query specifically for this revision. This should
avoid PostgreSQL picking a poor plan for performing the query.
|
This might be generally useful, but I've been looking at it as it offers a way
to try and improve query performance when you want to select all the
derivations related to the packages for a revision.
The data looks like this (for a specified system and target):
┌───────┬───────┐
│ level │ count │
├───────┼───────┤
│ 15 │ 2 │
│ 14 │ 3 │
│ 13 │ 3 │
│ 12 │ 3 │
│ 11 │ 14 │
│ 10 │ 25 │
│ 9 │ 44 │
│ 8 │ 91 │
│ 7 │ 1084 │
│ 6 │ 311 │
│ 5 │ 432 │
│ 4 │ 515 │
│ 3 │ 548 │
│ 2 │ 2201 │
│ 1 │ 21162 │
│ 0 │ 22310 │
└───────┴───────┘
Level 0 reflects the number of packages. Level 1 is similar, as you have all
the derivations for the package origins. The remaining levels contain fewer
packages, since it's mostly just derivations involved in bootstrapping.

When using a recursive CTE to collect all the derivations, PostgreSQL assumes
that each derivation has the same number of inputs, and this leads to a large
overestimation of the number of derivations per revision. This in turn can
lead to PostgreSQL picking a slower way of running the query.

When it's known how many new derivations you should see at each level, it's
possible to inform PostgreSQL of this by using LIMITs at various points in the
query. This reassures the query planner that it's not going to be handling
lots of rows and helps it make better decisions about how to execute the
query.
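
A sketch of the two shapes of query, using a simplified schema where
derivation_inputs(derivation_id, input_derivation_id) links a
derivation to the derivations it uses (the real tables differ). The
recursive form leaves the planner guessing about the row counts:

    (define recursive-query "
    WITH RECURSIVE all_derivations(id) AS (
        SELECT derivation_id
        FROM package_derivations
        WHERE revision_id = $1
      UNION
        SELECT derivation_inputs.input_derivation_id
        FROM derivation_inputs
        INNER JOIN all_derivations
          ON derivation_inputs.derivation_id = all_derivations.id
    )
    SELECT id FROM all_derivations")

    ;; Unrolling the levels by hand, with a LIMIT a little above the
    ;; known count for each level (so no rows are actually dropped),
    ;; tells the planner how few rows each step can produce.  Only the
    ;; first few levels are shown here.
    (define unrolled-query "
    WITH level_0 AS (
        SELECT derivation_id AS id
        FROM package_derivations
        WHERE revision_id = $1
        LIMIT 23000
    ), level_1 AS (
        SELECT DISTINCT derivation_inputs.input_derivation_id AS id
        FROM derivation_inputs
        INNER JOIN level_0 ON derivation_inputs.derivation_id = level_0.id
        LIMIT 22000
    ), level_2 AS (
        SELECT DISTINCT derivation_inputs.input_derivation_id AS id
        FROM derivation_inputs
        INNER JOIN level_1 ON derivation_inputs.derivation_id = level_1.id
        LIMIT 2500
    )
    -- ... and so on for the remaining levels ...
    SELECT id FROM level_0
    UNION SELECT id FROM level_1
    UNION SELECT id FROM level_2")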
|
This should help with query performance, as the recursive queries using
derivation_inputs and derivation_outputs are particularly sensitive to the
n_distinct values for these tables.
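
For reference, the kind of statement this involves (the column name
here is an assumption, and a negative n_distinct means a fraction of
the row count, so -0.05 reads as "about 5% of the rows are distinct"):

    (use-modules (squee))

    (define (set-derivation-inputs-n-distinct conn)
      (for-each (lambda (query)
                  (exec-query conn query))
                '("ALTER TABLE derivation_inputs
                     ALTER COLUMN derivation_id SET (n_distinct = -0.05)"
                  "ANALYZE derivation_inputs")))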
|
And drop the chunk size.
|
As scheduling a build might unblock others.
|
This means that an output is treated as not blocking if it has a scheduled
build, just as if it has a succeeded build. Also, scheduling builds will
unblock blocked builds.

This is helpful as it reduces the noise around blocking builds.
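
Spelled out as a predicate (hypothetical, not the actual code): an
output only counts as blocking while it has neither a succeeded nor a
scheduled build.

    (define (output-blocking? build-statuses)
      ;; BUILD-STATUSES is a list of status symbols for the builds of
      ;; one output.
      (not (or (memq 'succeeded build-statuses)
               (memq 'scheduled build-statuses))))

    (output-blocking? '(failed canceled))    ;; => #t
    (output-blocking? '(failed scheduled))   ;; => #f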
|
In various places in the blocked-builds module.
|
To make it more efficient.
|
This also fixes a typo in the partition name.
|
As it doesn't work in a transaction.
|
This will hopefully provide a less expensive way of finding out if a scheduled
build is probably blocked by other builds failing or being canceled.

By working this out when the build events are received, it should be more
feasible to include information about whether builds are likely blocked or not
in various places (e.g. revision comparisons).
|
Chunk the values inserted in the query, rather than the derivations involved,
as this is more consistent.
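
For example, chunking the rows themselves into fixed-size groups, with
each group then becoming one INSERT ... VALUES statement (a generic
sketch, not the helper used in the code):

    (use-modules (srfi srfi-1))

    (define (chunk lst size)
      ;; Split LST into lists of at most SIZE elements.
      (cond ((null? lst) '())
            ((> (length lst) size)
             (cons (take lst size)
                   (chunk (drop lst size) size)))
            (else (list lst))))

    (chunk '(1 2 3 4 5 6 7) 3)  ;; => ((1 2 3) (4 5 6) (7))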
|
This helps to avoid queries getting logged as slow just because of the amount
of data.
|
As these queries are still slow enough to be logged.
|
As this query can take some time.