aboutsummaryrefslogtreecommitdiff
path: root/doc/todo/Improving_the_efficiency_of_match__95__glob.mdwn
blob: 43571ead77346a1c5528360eec0e369958f2de7c (plain)
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
I've been profiling my IkiWiki to try to improve speed (with many pages makes speed even more important) and I've written a patch to improve the speed of match_glob.  This matcher is a good one to improve the speed of, because it gets called so many times.

Here's my patch - please consider it! -- [[KathrynAndersen]]

> It seems to me as though changing `glob2re` to return qr/$re/, and calling
> `memoize(glob2re)` next to the other memoize calls, would be a less
> verbose way to do this? --[[smcv]]

>> I think so, yeah. Anyway, do you have any benchmark results handy,
>> Kathryn?  --[[Joey]] 

>>> See below.
>>> Also, would it make more sense for glob2re to return qr/^$re$/i rather than qr/$re/?  Everything that uses glob2re seems to use
	$foo =~ /^$re$/i
>>> rather than /$re/ so I think that would make sense.
>>> -- [[KathrynAndersen]]

>>>> Git branch `smcv/ka-glob-cache` has Kathryn's patch. Git
>>>> branch `smcv/memoize-glob2re` does as I suggested, which
>>>> is less verbose than Kathryn's patch but also not as
>>>> fast; I'm not sure why, tbh. --[[smcv]]

>>>>> I think it's because my patch focuses on match_glob while the memoize patch focuses on `glob2re`, and `glob2re` is called in `filecheck`, `meta` and `po` as well as in `match_glob` and `match_user`; thus the memoized `glob2re` is dealing with a bigger set of globs to look up, and thus could be just that little bit slower. -- [[KathrynAndersen]]

--------------------------------------------------------------
Benchmarks done with Devel::Profile on the same testbed IkiWiki setup.  I'm just showing the start of the profile output, since that's what's relevant.

Before:
<pre>
time elapsed (wall):   27.4173
time running program:  22.5909  (82.40%)
time profiling (est.): 4.8264  (17.60%)
number of calls:       1314729
number of exceptions:  65

%Time    Sec.     #calls   sec/call  F  name
11.05    2.4969    62333   0.000040     IkiWiki::PageSpec::match_glob
 4.10    0.9261      679   0.001364     Text::Balanced::_match_tagged
 2.72    0.6139    59812   0.000010     IkiWiki::SuccessReason::merge_influences
</pre>

After:
<pre>
time elapsed (wall):   26.1843
time running program:  21.5673  (82.37%)
time profiling (est.): 4.6170  (17.63%)
number of calls:       1252433
number of exceptions:  65

%Time    Sec.     #calls   sec/call  F  name
 7.66    1.6521    62333   0.000027     IkiWiki::PageSpec::match_glob
 4.33    0.9336      679   0.001375     Text::Balanced::_match_tagged
 2.81    0.6057    59812   0.000010     IkiWiki::SuccessReason::merge_influences
</pre>

Note that the seconds per call for match_glob in the "after" case has gone down by about a third.

K.A.

--------------------------------------------------------------

A second set of benchmarks, done by rebuilding the docwiki at commit f942c2db05e4
like so:

    perl -Iblib/lib -d:Profile ikiwiki.in -setup docwiki.setup --no-verbose

The docwiki appears to use fewer glob matches than Kathryn's wiki.

With master:

    time elapsed (wall):   29.6970
    time running program:  24.6930  (83.15%)
    time profiling (est.): 5.0041  (16.85%)
    number of calls:       1359180
    number of exceptions:  13

    %Time    Sec.     #calls   sec/call  F  name
    13.62    3.3629     3406   0.000987     Text::Balanced::_match_tagged
    10.84    2.6773    79442   0.000034     IkiWiki::PageSpec::match_glob
     3.08    0.7598    59454   0.000013     <anon>:IkiWiki/Plugin/inline.pm:223
     3.07    0.7593    29830   0.000025     IkiWiki::bestlink
     2.99    0.7378    10231   0.000072     IkiWiki::PageSpec::match_link

With my `smcv/memoize-glob2re` branch:

    time elapsed (wall):   30.4931
    time running program:  25.1248  (82.39%)
    time profiling (est.): 5.3683  (17.61%)
    number of calls:       1439943
    number of exceptions:  13

    %Time    Sec.     #calls   sec/call  F  name
    13.19    3.3146     3406   0.000973     Text::Balanced::_match_tagged
     8.41    2.1123    79442   0.000027     IkiWiki::PageSpec::match_glob
     3.97    0.9979    86905   0.000011     Memoize::_memoizer
     3.05    0.7654    59454   0.000013     <anon>:IkiWiki/Plugin/inline.pm:223
     3.02    0.7576    29830   0.000025     IkiWiki::bestlink

and in a repeated run:

     8.40    2.0905    79442   0.000026     IkiWiki::PageSpec::match_glob

With Kathryn's patch as seen in my `smcv/ka-glob-cache` branch:

    time elapsed (wall):   27.7567
    time running program:  22.9941  (82.84%)
    time profiling (est.): 4.7627  (17.16%)
    number of calls:       1279946
    number of exceptions:  13

    %Time    Sec.     #calls   sec/call  F  name
    14.29    3.2867     3406   0.000965     Text::Balanced::_match_tagged
     7.89    1.8136    79442   0.000023     IkiWiki::PageSpec::match_glob
     3.30    0.7577    59454   0.000013     <anon>:IkiWiki/Plugin/inline.pm:223
     3.24    0.7461    29830   0.000025     IkiWiki::bestlink
     3.19    0.7332      143   0.005127  ?  IkiWiki::pagespec_match_list

and in a repeated run:

     7.84    1.8253    79442   0.000023     IkiWiki::PageSpec::match_glob

--[[smcv]]

--------------------------------------------------------------


<pre>
diff --git a/IkiWiki.pm b/IkiWiki.pm
index 08a3d78..c187b98 100644
--- a/IkiWiki.pm
+++ b/IkiWiki.pm
@@ -2482,6 +2482,8 @@ sub derel ($$) {
 	return $path;
 }
 
+my %glob_cache;
+
 sub match_glob ($$;@) {
 	my $page=shift;
 	my $glob=shift;
@@ -2489,8 +2491,15 @@ sub match_glob ($$;@) {
 	
 	$glob=derel($glob, $params{location});
 
-	my $regexp=IkiWiki::glob2re($glob);
-	if ($page=~/^$regexp$/i) {
+	# Instead of converting the glob to a regex every time,
+	# cache the compiled regex to save time.
+	if (!exists $glob_cache{$glob}
+	    or !defined $glob_cache{$glob})
+	{
+	    my $re=IkiWiki::glob2re($glob);
+	    $glob_cache{$glob} = qr/^$re$/i;
+	}
+	if ($page =~ $glob_cache{$glob}) {
 		if (! IkiWiki::isinternal($page) || $params{internal}) {
 			return IkiWiki::SuccessReason->new("$glob matches $page");
 		}
</pre>
--------------------------------------------------------------