From 2a7721febd6cac1af5e7f4b4949ffe066c62c837 Mon Sep 17 00:00:00 2001 From: Joey Hess Date: Tue, 5 May 2009 23:40:09 -0400 Subject: Avoid %links accumulating duplicates. (For TOVA) This is sorta an optimisation, and sorta a bug fix. In one test case I have available, it can speed a page build up from 3 minutes to 3 seconds. The root of the problem is that $links{$page} contains arrays of links, rather than hashes of links. And when a link is found, it is just pushed onto the array, without checking for dups. Now, the array is emptied before scanning a page, so there should not be a lot of opportunity for lots of duplicate links to pile up in it. But, in some cases, they can, and if there are hundreds of duplicate links in the array, then scanning it for matching links, as match_link and some other code does, becomes much more expensive than it needs to be. Perhaps the real right fix would be to change the data structure to a hash. But, the list of links is never accessed like that, you always want to iterate through it. I also looked at deduping the list in saveindex, but that does a lot of unnecessary work, and doesn't completly solve the problem. So, finally, I decided to add an add_link function that handles deduping, and make ikiwiki-transition remove the old dup links. --- doc/ikiwiki-transition.mdwn | 7 +++++++ doc/plugins/write.mdwn | 22 ++++++++++++++-------- 2 files changed, 21 insertions(+), 8 deletions(-) (limited to 'doc') diff --git a/doc/ikiwiki-transition.mdwn b/doc/ikiwiki-transition.mdwn index 18836d5f5..e0b853ecf 100644 --- a/doc/ikiwiki-transition.mdwn +++ b/doc/ikiwiki-transition.mdwn @@ -61,6 +61,13 @@ If this is not done explicitly, a user's plaintext password will be automatically converted to a hash when a user logs in for the first time after upgrade to ikiwiki 2.48. +# deduplinks srcdir + +In the past, bugs in ikiwiki have allowed duplicate link information +to be stored in its indexdb. This mode removes such duplicate information, +which may speed up wikis afflicted by it. Note that rebuilding the wiki +will have the same effect. + # AUTHOR Josh Triplett , Joey Hess diff --git a/doc/plugins/write.mdwn b/doc/plugins/write.mdwn index 0b358b215..28da243d5 100644 --- a/doc/plugins/write.mdwn +++ b/doc/plugins/write.mdwn @@ -107,8 +107,8 @@ adding or removing files from it. This hook is called early in the process of building the wiki, and is used as a first pass scan of the page, to collect metadata about the page. It's -mostly used to scan the page for [[WikiLinks|ikiwiki/WikiLink]], and add them to `%links`. -Present in IkiWiki 2.40 and later. +mostly used to scan the page for [[WikiLinks|ikiwiki/WikiLink]], and add +them to `%links`. Present in IkiWiki 2.40 and later. The function is passed named parameters "page" and "content". Its return value is ignored. @@ -151,11 +151,11 @@ parameter is set to a true value if the page is being previewed. If `hook` is passed an optional "scan" parameter, set to a true value, this makes the hook be called during the preliminary scan that ikiwiki makes of updated pages, before begining to render pages. This should be done if the -hook modifies data in `%links`. Note that doing so will make the hook be -run twice per page build, so avoid doing it for expensive hooks. (As an -optimisation, if your preprocessor hook is called in a void context, you -can assume it's being run in scan mode, and avoid doing expensive things at -that point.) +hook modifies data in `%links` (typically by calling `add_link`). Note that +doing so will make the hook be run twice per page build, so avoid doing it +for expensive hooks. (As an optimisation, if your preprocessor hook is +called in a void context, you can assume it's being run in scan mode, and +avoid doing expensive things at that point.) Note that if the [[htmlscrubber]] is enabled, html in preprocessor [[ikiwiki/directive]] output is sanitised, which may limit what @@ -174,7 +174,8 @@ links. The function is passed named parameters "page", "destpage", and and later. Plugins that implement linkify must also implement a scan hook, that scans -for the links on the page and adds them to `%links`. +for the links on the page and adds them to `%links` (typically by calling +`add_link`). ### htmlize @@ -753,6 +754,11 @@ Optionally, a third parameter can be passed, to specify the preferred filename of the page. For example, `targetpage("foo", "rss", "feed")` will yield something like `foo/feed.rss`. +#### `add_link($$)` + +This adds a link to `%links`, ensuring that duplicate links are not +added. Pass it the page that contains the link, and the link text. + ## Miscellaneous ### Internal use pages -- cgit v1.2.3