Commit message
 
guix_revision_package_derivation_distribution_counts
 
And start to more closely integrate it. This makes it possible to start making
it faster by doing more in parallel.
 
This was more of an issue several years ago, so this code is not really needed
now.
 
As I think this is necessary.
 
By splitting it up by system.
 
To avoid using the old PostgreSQL connection-per-thread code.
 
By batching the SQL queries.
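A minimal sketch of the batching idea (not the exact code from the commit),
assuming squee's exec-query and a hypothetical single-column names table; one
multi-row INSERT replaces many single-row round trips:

    (use-modules (squee)
                 (srfi srfi-1)
                 (ice-9 format))

    ;; Build "($1), ($2), ..." so all the rows go in one INSERT.
    (define (insert-names-batch conn names)
      (let ((placeholders
             (string-join
              (map (lambda (i) (format #f "($~a)" (1+ i)))
                   (iota (length names)))
              ", ")))
        (exec-query conn
                    (string-append
                     "INSERT INTO names (name) VALUES " placeholders)
                    names)))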
 
To hopefully see which part is slow.
 
As this is a bit noisy.
 
As parts of it are slow.
 
As filter can share part of the input list with its result, which makes it
unsafe to modify the filtered list.
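To illustrate the sharing (SRFI-1 permits filter to share a common tail with
its input list):

    (use-modules (srfi srfi-1))

    (define input (list 1 2 3 4))
    (define kept (filter even? input))   ;=> (2 4)

    ;; The last pair of kept may be the last pair of input, so mutating
    ;; kept, e.g. with (set-car! (last-pair kept) 99), can also rewrite
    ;; input.  Copy before any destructive use:
    (define safe (list-copy (filter even? input)))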
 
And change the default, as eq? doesn't always work.
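The underlying issue, sketched: eq? tests object identity, so two values that
print the same can still compare as different:

    (eq? "abc" (string-copy "abc"))      ;=> #f, different objects
    (equal? "abc" (string-copy "abc"))   ;=> #t, structural comparison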
 
As it should offer a speedup over delete-duplicates.
 
As it's faster than delete-duplicates for large amounts of data.
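A sketch of the technique (the helper name here is hypothetical):
delete-duplicates compares every pair of elements, so it's O(n²), while a hash
table keeps deduplication roughly linear:

    (use-modules (srfi srfi-1))

    (define (delete-duplicates/hash lst)
      ;; Track seen elements in an equal?-based hash table rather than
      ;; rescanning the result list for every element.
      (let ((seen (make-hash-table)))
        (reverse!
         (fold (lambda (item result)
                 (if (hash-ref seen item)
                     result
                     (begin
                       (hash-set! seen item #t)
                       (cons item result))))
               '()
               lst))))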
 
As I'm not sure how expensive this is, and it doesn't need doing for every
request.
 
As this will be useful for QA to say whether the package builds reproducibly
or not.
 
Just have one fiber at the moment, but this will enable using fibers for
parallelism in the future.
Fibers seemed to cause problems with the logging setup, which was a bit odd in
the first place, so move logging to the parent process, which is better anyway.
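A minimal sketch of the shape this describes, assuming the Guile Fibers
library (process-jobs is a hypothetical stand-in for the real work):

    (use-modules (fibers))

    (define (process-jobs)
      (display "working\n"))

    (run-fibers
     (lambda ()
       ;; Just one fiber for now; further spawn-fiber calls would add
       ;; parallelism later.
       (spawn-fiber process-jobs)
       (sleep 1)))   ; fibers' sleep suspends only this fiber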
 
Move in the direction of being able to run multiple inferior REPLs, and use
some vectors rather than lists in places (maybe this is more efficient).
 
This is mostly a workaround for the occasional problems with the guix-commits
mailing list: it can break, and then the data service doesn't learn about new
revisions until the problem is fixed.
I think it's still a generally good feature though. It allows deploying the
data service without it having to consume emails to learn about new revisions,
and it's a step towards integrating some way of notifying the data service to
poll.
 
This will keep the substitute information more up to date.
 
The previous changes only affected searching for package derivations, and they
also didn't work.
 
This will help when using this to submit builds, since you won't end up
ignoring derivations with canceled builds.
 
Rather than the derivations system id, as this helps PostgreSQL run the query
faster.
 
Move the batching to the database, which should reduce memory usage while
removing the limit on the number of fetched narinfos.
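A sketch of what moving the batching into the database can look like, using
keyset pagination; the table and column names are illustrative guesses, and
exec-query is squee's:

    (use-modules (squee)
                 (srfi srfi-1)
                 (ice-9 match))

    (define (for-each-narinfo-batch conn proc)
      ;; Fetch fixed-size batches, resuming after the last seen id, so
      ;; the full result set is never held in memory at once.
      (let loop ((last-id "0"))
        (match (exec-query
                conn
                "SELECT id, store_path FROM narinfos
                 WHERE id > $1 ORDER BY id LIMIT 1000"
                (list last-id))
          (() #f)
          (rows
           (for-each proc rows)
           (loop (first (last rows)))))))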
 
Use the new approach of looking up the distribution of the derivations, and
building a non-recursive query specifically for this revision. This should
avoid PostgreSQL picking a poor plan for performing the query.
 
This might be generally useful, but I've been looking at it as it offers a way
to try and improve query performance when you want to select all the
derivations related to the packages for a revision.
The data looks like this (for a specified system and target):
┌───────┬───────┐
│ level │ count │
├───────┼───────┤
│ 15 │ 2 │
│ 14 │ 3 │
│ 13 │ 3 │
│ 12 │ 3 │
│ 11 │ 14 │
│ 10 │ 25 │
│ 9 │ 44 │
│ 8 │ 91 │
│ 7 │ 1084 │
│ 6 │ 311 │
│ 5 │ 432 │
│ 4 │ 515 │
│ 3 │ 548 │
│ 2 │ 2201 │
│ 1 │ 21162 │
│ 0 │ 22310 │
└───────┴───────┘
Level 0 reflects the number of packages. Level 1 is similar, as you have all
the derivations for the package origins. The remaining levels contain fewer
packages, since it's mostly just derivations involved in bootstrapping.
When using a recursive CTE to collect all the derivations, PostgreSQL assumes
that each derivation has the same number of inputs, and this leads to a large
overestimation of the number of derivations per revision. This in turn can
lead to PostgreSQL picking a slower way of running the query.
When it's known how many new derivations you should see at each level, it's
possible to inform PostgreSQL of this by using LIMITs at various points in the
query. This reassures the query planner that it's not going to be handling
lots of rows, and helps it make better decisions about how to execute the
query.
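As an illustration of the generated query's shape (the table and column names
are simplified guesses, with the LIMIT values taken from the distribution
above), each level becomes its own capped subquery instead of one open-ended
recursive CTE:

    ;; Hypothetical sketch; the real generated SQL is more involved.
    (define example-non-recursive-query "
    WITH level_0 AS (
      SELECT derivation_id
      FROM package_derivations
      WHERE revision_id = $1
      LIMIT 22310
    ), level_1 AS (
      SELECT DISTINCT di.input_derivation_id AS derivation_id
      FROM derivation_inputs di
      INNER JOIN level_0 ON di.derivation_id = level_0.derivation_id
      LIMIT 21162
    )
    -- ... one CTE per level, up to level 15 ...
    SELECT derivation_id FROM level_0
    UNION
    SELECT derivation_id FROM level_1
    -- UNION in the remaining levels
    ")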
 
This should help with query performance, as the recursive queries using
derivation_inputs and derivation_outputs are particularly sensitive to the
n_distinct values for these tables.
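A sketch of such a statistics override, assuming squee's exec-query; the
column and value are illustrative rather than what the service actually sets:

    (use-modules (squee))

    (define (override-n-distinct conn)
      ;; A negative n_distinct is a fraction of the row count: -0.5
      ;; tells the planner that about half the values are distinct.
      ;; ANALYZE makes the override take effect.
      (exec-query conn "
    ALTER TABLE derivation_inputs
      ALTER COLUMN derivation_id SET (n_distinct = -0.5)")
      (exec-query conn "ANALYZE derivation_inputs"))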
 
And drop the chunk size.
 
As scheduling a build might unblock others.