guix/data-service

	Commit message (Collapse)	Author	Age
*	Store the distribution of derivations related to packages	Christopher Baines	2023-03-09
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	This might be generally useful, but I've been looking at it as it offers a way to try and improve query performance when you want to select all the derivations related to the packages for a revision. The data looks like this (for a specified system and target): ┌───────┬───────┐ │ level │ count │ ├───────┼───────┤ │ 15 │ 2 │ │ 14 │ 3 │ │ 13 │ 3 │ │ 12 │ 3 │ │ 11 │ 14 │ │ 10 │ 25 │ │ 9 │ 44 │ │ 8 │ 91 │ │ 7 │ 1084 │ │ 6 │ 311 │ │ 5 │ 432 │ │ 4 │ 515 │ │ 3 │ 548 │ │ 2 │ 2201 │ │ 1 │ 21162 │ │ 0 │ 22310 │ └───────┴───────┘ Level 0 reflects the number of packages. Level 1 is similar as you have all the derivations for the package origins. The remaining levels contain less packages since it's mostly just derivations involved in bootstrapping. When using a recursive CTE to collect all the derivations, PostgreSQL assumes that the each derivation has the same number of inputs, and this leads to a large overestimation of the number of derivations per a revision. This in turn can lead to PostgreSQL picking a slower way of running the query. When it's known how many new derivations you should see at each level, it's possible to inform PostgreSQL this by using LIMIT's at various points in the query. This reassures the query planner that it's not going to be handling lots of rows and helps it make better decisions about how to execute the query.
*	Add an extra index on blocked_builds	Christopher Baines	2023-03-05
\| \| \| \|	I think this might help with queries that don't use the build_server_id.
*	Add index on derivation_outputs id and derivation_id fields	Christopher Baines	2022-11-12
\| \| \| \|	This might help with doing recursive queries on the derivations graph.
*	Add extended statistics on package_derivations	Christopher Baines	2022-11-12
\| \| \| \|	This helps row count estimates when filtering on system_id and target.
*	Add support for incrementally tracking blocked builds	Christopher Baines	2022-11-11
\| \| \| \| \| \| \| \| \|	This will hopefully provide a less expensive way of finding out if a scheduled build is probably blocked by other builds failing or being canceled. By working this out when the build events are recieved, it should be more feasible to include information about whether builds are likely blocked or not in various places (e.g. revision comparisons).
*	Speed up finding the locales for a revision	Christopher Baines	2022-09-14
\|
*	Support per repository configuration for fetching substitutes	Christopher Baines	2022-09-09
\| \| \| \| \|	This is so that data.qa.guix.gnu.org can be configured only to query the branches from the main repository.
*	Add an index on git_commits	Christopher Baines	2022-06-17
\| \| \| \|	And also remove the duplicates that have crept in.
*	Add an index on package_derivations_by_guix_revision_range	Christopher Baines	2022-06-17
\| \| \| \|	This helps render the package version range related pages.
*	Partition the package_derivations_by_guix_revision_range table	Christopher Baines	2022-05-23
\| \| \| \| \| \| \| \| \| \| \| \|	And create a proper git_branches table in the process. I'm hoping this will help with slow deletions from the package_derivations_by_guix_revision_range table in the case where there are lots of branches, since it'll separate the data for one branch from another. These migrations will remove the existing data, so rebuild-package-derivations-table will currently need manually running to regenerate it.
*	Add an index on derivation_output_details.id	Christopher Baines	2021-07-11
\| \| \| \| \|	Where hash is not null, since this helps with the fixed output package derivations query.
*	Track package replacements	Christopher Baines	2021-07-11
\| \| \| \| \| \|	Start at least looking for package replacements, and storing the details (particularly the derivation). I'm looking at doing this so that build servers using the Guix Data Service can build these derivations.
*	Add index for package_metadata	Canan Talayhan	2021-05-27
\| \| \| \| \| \|	This index speeds up inserting new entries to the package_metadata table. Signed-off-by: Christopher Baines <mail@cbaines.net>
*	Add some indexes	Christopher Baines	2021-05-17
\| \| \| \|	This is from reading the slow query log on data.guix.gnu.org.
*	Create a table for systems	Christopher Baines	2021-04-23
\| \| \| \| \| \| \| \|	And use it for the systems in the derivations and package derivations tables. The primary motivation here is to allow quickly working out what systems the database contains, and having a small table with just the right data seems a good way to do that.
*	Store channel instance derivations in a separate transaction	Christopher Baines	2021-02-02
\| \| \| \| \| \|	This means that these derivations are stored, even if a later part of the process fails. Having the channel instance derivations stored might help work out why the failure occurred, or better display information about it.
*	Increase statistics targets for the derivation_inputs table	Christopher Baines	2020-12-27
\| \| \| \| \| \|	It has a lot of rows, and this seems to help the query planning for select-required-builds-that-failed, as I think it helps PostgreSQL better estimate the row counts that different parts of the query will produce.
*	Add index to speed up deleting revisions	Christopher Baines	2020-10-27
\| \| \| \|	This is necessary to quickly delete entries from package_derivaitons.
*	Regenerate the latest_build_status table	Christopher Baines	2020-10-21
\| \| \| \| \|	As it doesn't handle wierdness for statuses, like builds being canceled with a timestamp of 0.
*	Add a latest_build_status table	Christopher Baines	2020-10-13
\| \| \| \| \|	This will avoid many queries trying to figure out what the latest build status is, which will hopefuly simplify queries as well as improving performance.
*	Support storing whether to authenticate commits by git repository	Christopher Baines	2020-10-07
\| \| \| \|	To allow using the Guix Data Service when commits aren't authenticated.
*	Add an index for derivation_sources.derivation_source_file_id	Christopher Baines	2020-10-02
\| \| \| \|	As this speeds up deleting derivation_source_files.
*	Change a constraint to add ON DELETE CASCADE	Christopher Baines	2020-10-02
\| \| \| \| \| \|	I've not used these in many places, to try and avoid hiding deleting data, but in this case, this will allow more easily deleting the derivation source file nars, by just deleting the derivation_source_files table entry.
*	Change the autovacuum config for some tables	Christopher Baines	2020-10-01
\| \| \| \| \| \|	Looking at data for the the patches deployment of the Guix Data Service, these tables look like they might benefit from vacuuming/analyzing more often, so adjust the configuration so this will hopefully happen.
*	Change the locale codeset representation	Christopher Baines	2020-09-26
\| \| \| \| \| \| \| \| \| \| \|	From the normalized one, to the one actually contained within glibc. Recent versions of glibc also contain symlinks linking the normalized codeset to the locales with the .UTF-8 ending, but older ones do not. Maybe handling codeset normalisation for queries would be good, but the locale values ending in .UTF-8 are more compatible and allow the code to be simplified. For querying, maybe there should be a locales table which handles different representations.
*	Change a package_descriptions index to use MD5 of the description	Christopher Baines	2020-08-23
\| \| \| \| \| \| \| \| \| \|	To avoid the index data being too large. This was first seen in guix commit 1bb4fd64b7bbe5a17eda6f0ca8105283c038f7c8: psql-query-error (fatal-error PGRES_FATAL_ERROR ERROR: index row size 2808 exceeds maximum 2712 for index "package_descriptions_locale_description_key" HINT: Values larger than 1/3 of a buffer page cannot be indexed.
*	Add the package_metadata_tsvectors table	Danjela Lura	2020-08-12
\| \| \| \|	Signed-off-by: Christopher Baines <mail@cbaines.net>
*	Add a index to derivation_output_details_sets	Christopher Baines	2020-07-04
\| \| \| \| \| \| \| \|	This helps with queries like: SELECT * FROM derivation_output_details_sets WHERE ARRAY[5563227] <@ derivation_output_details_ids;
*	Remove the constraint on having one derivation build per server	Christopher Baines	2020-07-01
\| \| \| \|	As the Guix Build Coordinator can build one derivation many times.
*	Avoid duplicate build status entries	Christopher Baines	2020-07-01
\| \| \| \|	This effectively makes adding build status entries idempotent to some degree.
*	Support storing the id's representing builds in other services	Christopher Baines	2020-06-27
\| \| \| \| \| \| \|	Like Cuirass, or the Guix Data Service. This isn't so necessary with Cuirass, as the derivation/output are unique identifiers, but for the Guix Build Coordinator, one derivation/output can be built many times, with potentially differing results.
*	Make build_status.timestamp nullable	Christopher Baines	2020-06-27
\| \| \| \| \| \|	The Guix Build Coordinator doesn't record timestamps for events currently. This is something to fix, but allow for build statuses without timestamps as they're not necessary.
*	Get the translated package synopsis and descriptions into the database	Danjela Lura	2020-06-19
\| \| \| \|	Signed-off-by: Christopher Baines <mail@cbaines.net>
*	Get the translated lint checker descriptions into the database	Danjela Lura	2020-06-07
\| \| \| \|	Signed-off-by: Christopher Baines <mail@cbaines.net>
*	Make nar_urls file sizes optional	Christopher Baines	2020-06-03
\|
*	Add a lookup_builds field to the build_servers table	Christopher Baines	2020-05-24
\| \| \| \| \|	This is to allow for build servers where only the substitutes should be queried, and it shouldn't be assumed that they're running Cuirass.
*	Remove odd package derivations with mismatched systems	Christopher Baines	2020-04-24
\| \| \| \| \|	This follows on from 79993bfaeff3097d37838bc0636fc20c6fc4283d, disassociating the relevant derivations from the revisions.
*	Make jobs unique by commit	Christopher Baines	2020-03-28
\| \| \| \| \|	Given that a commit, regardless of what repository it comes from should contain the same exactly the same data, just track jobs by commit.
*	Create an index on narinfo_fetch_records	Christopher Baines	2020-03-25
\| \| \| \|	This greatly improves the performance of the derivation-outputs page.
*	Stop using package_versions_by_guix_revision_range	Christopher Baines	2020-03-24
\| \| \| \|	It's been replaced by the package_derivations_by_guix_revision_range table.
*	Generate and store system test derivations for all supported systems	Christopher Baines	2020-03-20
\| \| \| \| \| \| \|	Rather than just the native system. I'm not quite sure of the value here, as I guess system tests should behave the same regardless of the way the software is compiled, but this seems like it could be useful, and being explicit about the system the derivation is for is good.
*	Make a constraint deferrable	Christopher Baines	2020-02-16
\| \| \| \|	To allow for deferring it in a transaction when deleting derivations.
*	Update the build_servers_build_config values	Christopher Baines	2020-02-13
\| \| \| \| \|	Change target to '' if it matches system, as this matches the updated way of storing derivations.
*	Start storing channel instance derivations	Christopher Baines	2020-02-11
\| \| \| \|	These are the ones that relate to Guix pull.
*	Support excluding and including branches from a repository	Christopher Baines	2020-02-08
\| \| \| \| \|	So that you can have the Guix Data Service only pay attention to some of the branches.
*	Rework cross derivations support	Christopher Baines	2020-02-08
\| \| \| \| \| \| \| \| \| \| \| \| \| \|	Stop using the system values as targets, and remove package derivation entries where this is the case. Switch the non-cross derivation case to have a target of "", rather than matching the system, as this makes more sense, and is more consistent now that the target values no longer match the system values. Hardcode some more correct target values, and use these instead. Hopefully this can be better integrated with Guix in the future. This commit also includes a migration attempting to shrink some indexes.
*	Add missing migration	Christopher Baines	2020-02-03
\|
*	Don't hardcode the expected x-git-repo header value	Christopher Baines	2020-01-11
\| \| \| \| \|	Rather than expecting it always to be "guix", store the expected value in the database, and use the value of the header to find the relevant repository.
*	Add a table to configure which build servers build what	Christopher Baines	2020-01-05
\|
*	Create an index on the hash component of the store path	Christopher Baines	2019-12-29
\| \| \| \|	For derivation_source_files.