plan for more efficient pagespec_match_list sorting

(smcv please note)
author: Joey Hess <joey@gnu.kitenet.net> 2010-04-11 00:30:27 -0400
committer: Joey Hess <joey@gnu.kitenet.net> 2010-04-11 00:30:27 -0400
commit: cecbd529389788c1f1cb647e2ff297cda7554456 (patch)
tree: 85bc00c88726d02839411db3d7d4e49d37898324 /doc/todo
parent: 1be32275837760c65ef96e40893fb8ba540bc939 (diff)
download: ikiwiki-cecbd529389788c1f1cb647e2ff297cda7554456.tar
ikiwiki-cecbd529389788c1f1cb647e2ff297cda7554456.tar.gz
1 files changed, 33 insertions, 0 deletions
diff --git a/doc/todo/smarter_sorting.mdwn b/doc/todo/smarter_sorting.mdwn
new file mode 100644
index 000000000..5a6d63ef1
--- /dev/null
+++ b/doc/todo/smarter_sorting.mdwn
@@ -0,0 +1,33 @@
+I benchmarked a build of a large wiki (my home wiki), and it was spending
+quite a lot of time sorting; `CORE::sort` was called only 1138 times, but
+still flagged as the #1 time sink. (I'm not sure I trust NYTProf fully
+about that FWIW, since it also said 27238263 calls to `cmp_age` were
+the #3 time sink, and I suspect it may not entirely accurately measure
+the overhead of so many short function calls.)
+
+`pagespec_match_list` currently always sorts *all* pages first, and then
+finds the top M that match the pagespec. That's inefficient when M is
+small (as for example in a typical blog, where only 20 posts are shown,
+out of maybe thousands).
+
+As [[smcv]] noted, it could be flipped, so the pagespec is applied first,
+and then the smaller matching set is sorted. But, checking pagespecs is likely
+more expensive than sorting. (Also, influence calculation complicates
+doing that, since only influences for the M returned pages should be tracked.)
+
+Another option, when there is a limit on M pages to return, might be to
+cull the M top pages without sorting the rest. This could be done using
+a variant of Ye Olde Bubble Sort. Take the first M pages, and (quick)sort.
+Then for each of the rest, check if it is higher than the Mth page.
+If it is, bubble it up so it's sorted.
+If not, throw it out (that's the fast bit and why this is not O(N^2)).
+
+This would be bad when M is very large, and particularly, of course, when
+there is no limit and all pages are being matched on. (For example, an
+archive page shows all pages that match a pagespec specifying a creation
+date range.) Well, in this case, it *does* make sense to flip it, limit by
+pagespec first, and do a (quick)sort second. (No influence complications,
+either.)
+
+Adding these special cases will be more complicated, but I think it's the best
+of both worlds. --[[Joey]]
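
The two special cases described in the todo item can be sketched roughly as follows. This is an illustrative Python sketch only, not ikiwiki's Perl code: `top_m`, `match_then_sort`, and the `(name, mtime)` page tuples are hypothetical names introduced here, and the note does not say how pagespec matching combines with the top-M cull, so the cull is shown on its own.

```python
def top_m(items, m, key):
    """Return the m items that sort first under `key`, in sorted order.

    Hypothetical helper: sort the first m items, then make one pass over
    the rest, discarding anything that does not beat the current m-th item.
    """
    if m <= 0:
        return []
    # Decorate with (key, position) so ties keep input order and the
    # items themselves are never compared.
    decorated = [(key(item), i, item) for i, item in enumerate(items)]
    best = sorted(decorated[:m])
    for entry in decorated[m:]:
        if not entry < best[-1]:
            continue  # the fast path: one comparison, then throw it out
        # Replace the current m-th entry and bubble the newcomer up.
        best[-1] = entry
        j = len(best) - 1
        while j > 0 and best[j] < best[j - 1]:
            best[j], best[j - 1] = best[j - 1], best[j]
            j -= 1
    return [item for _, _, item in best]


def match_then_sort(items, matches, key):
    """No limit: filter with the (assumed) predicate first, then sort."""
    return sorted((item for item in items if matches(item)), key=key)


# Hypothetical pages as (name, mtime); a negated mtime sorts newest first.
pages = [("a", 5), ("b", 9), ("c", 1), ("d", 7), ("e", 9), ("f", 3)]
newest_first = lambda page: -page[1]
print(top_m(pages, 3, newest_first))
print(match_then_sort(pages, lambda page: page[1] > 4, newest_first))
```

When M is small, most pages are rejected after a single comparison against the current M-th entry, which is where the saving over a full sort would come from. As M approaches the number of pages, the bubbling dominates and the filter-then-sort path is the better fit.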