Since some preprocessor directives insert raw HTML, it would be good to 
specify, per-format, how to pass HTML so that it survives the format's
processing intact. With Markdown we cross our fingers; with reST we use the
"raw" directive.

I added an extra named parameter to the htmlize hook, which feels sort of
wrong, since none of the other hooks take parameters. Let me know what 
you think. --Ethan
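
A minimal sketch of the reST half of such a per-format escape, written in Python purely for illustration (ikiwiki's hooks are Perl). The name `rst_htmlescape` is hypothetical. The only thing borrowed from reStructuredText is its `raw` directive, which docutils honours only when raw input is enabled.

```python
import textwrap


def rst_htmlescape(html: str) -> str:
    """Wrap a chunk of HTML in a reST ``raw`` directive (hypothetical helper)."""
    # The directive content must be indented relative to the ".. raw::" line
    # and separated from it by a blank line.
    body = textwrap.indent(html, "  ")
    return ".. raw:: html\n\n" + body
```

Calling `rst_htmlescape('<div class="toc">...</div>')` gives a block that can be pasted into an rst page as its own paragraph. A format such as Markdown, which tolerates inline HTML, would simply have no escape function and receive the HTML unchanged.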

Seems fairly reasonable, actually. Shouldn't the `$type` come from `$page`
instead of `$destpage` though? Only other obvious change is to make the
escape parameter optional, and only call it if set. --[[Joey]]

> I couldn't figure out what to make it from, but thinking it through, 
> yeah, it should be $page. Revised patch follows. --Ethan

>> I've updated the patch some more, but I think it's incomplete. ikiwiki
>> emits raw html when expanding WikiLinks too, and it would need to escape
>> those. Assuming that escaping html embedded in the middle of a sentence
>> works.. --[[Joey]]

>>> Revised again. I get around this by making another hook, htmlescapelink,
>>> which is called to generate links in whatever markup language the page
>>> uses. It doesn't (can't?) generate spans, and it doesn't handle
>>> inlineable image links. If these were desired, the approach to take would
>>> probably be to use substitution definitions, which would require
>>> generating two bits of code for each link/html snippet, and putting one
>>> at the end of the paragraph (or maybe the document?).
>>> To specify that (for example) Discussion links are meant to be HTML and
>>> not rst or whatever, I added a "genhtml" parameter to htmllink. It seems
>>> to work -- see <http://ikidev.betacantrips.com/blah.html> for an example.
>>> --Ethan
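
The two ideas in this comment can be sketched roughly as follows, in Python for illustration only. `rst_link` stands in for the Perl htmlescapelink hook and `rst_substitution` for the substitution-definition approach. Both names are hypothetical, and the second function assumes docutils is run with raw input enabled.

```python
def rst_link(url: str, text: str, broken: bool = False) -> str:
    """Format a link as a reST hyperlink reference with an embedded URI."""
    if broken:
        # "?" carries the create link; "\ " is a reST escaped space, which
        # joins the link to the following text without visible whitespace.
        return "`? <%s>`_\\ %s" % (url, text)
    return "`%s <%s>`_" % (text, url)


def rst_substitution(name: str, html: str):
    """Return the two pieces a substitution needs.

    The first is the inline reference to put in the sentence, the second is
    the definition to emit after the paragraph (or at the end of the page).
    """
    reference = "|%s|" % name
    definition = ".. |%s| raw:: html\n\n   %s" % (name, html)
    return reference, definition
```

For example, `rst_link("http://example.com/a", "A page")` yields a normal reference, while `rst_substitution("disc", "<span>x</span>")` yields `|disc|` plus a definition block that has to be placed somewhere after the paragraph, which is the "two bits of code" cost mentioned above.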

## Alternative solution

[Here](http://www.jk.fr.eu.org/ikiwiki/format-escapes-2.diff) is a patch
largely inspired by the one below; it is up to date and written with
[[todo/multiple_output_formats]] in mind. "htmlize" hooks are generalized
to "convert" ones, which can be registered for any pair of filename
extensions.

Preprocessor directives are allowed to return the content to be inserted
as a hash, in any format they want, as long as they provide htmlize hooks for it.
Pseudo filename extensions (such as `"_link"`) can also be introduced,
which aren't used as real extensions but provide useful intermediate types.

--[[JeremieKoenig]]
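
To make the idea of "convert" hooks keyed by a pair of extensions concrete, here is a small sketch in Python. It is not taken from the linked patch: the registry, the function names, and the choice to chain converters through a breadth-first search are all assumptions made for illustration.

```python
from collections import deque

# (source type, target type) -> conversion function; all names are hypothetical.
CONVERTERS = {}


def register(src, dst, func):
    """Register a converter for one pair of (pseudo) filename extensions."""
    CONVERTERS[(src, dst)] = func


def convert(content, src, dst):
    """Convert content from src to dst, chaining converters if needed.

    A breadth-first search over the registered pairs finds the shortest chain.
    """
    queue = deque([(src, content)])
    seen = {src}
    while queue:
        current, data = queue.popleft()
        if current == dst:
            return data
        for (a, b), func in list(CONVERTERS.items()):
            if a == current and b not in seen:
                seen.add(b)
                queue.append((b, func(data)))
    raise LookupError("no conversion path from %r to %r" % (src, dst))


# "_link" is a pseudo extension: its content is a dict, not page text.
# (No HTML escaping is done here; this is only a sketch.)
register("_link", "rst", lambda link: "`%s <%s>`_" % (link["text"], link["url"]))
register("_link", "html", lambda link: '<a href="%s">%s</a>' % (link["url"], link["text"]))
```

With these registrations, `convert({"url": "u", "text": "t"}, "_link", "rst")` produces an rst reference, and the same link can be rendered straight to HTML when the containing page is not rst.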

> Wow, this is in many ways a beautiful patch. I did notice one problem:
> if a link is converted to rst and then from there to a hyperlink, the
> styling info usually added to such a link is lost. I wonder if it would
> be better to lose _link stuff and just create link html that is fed into
> the rst,html converter. Another advantage to doing that is that link
> creation has a rather complex interface, with selflink, attrs, url, and
> content parameters.
> 
> --[[Joey]]

>> Thanks for the compliment. I must confess that I'm not too familiar with
>> rst. I am using this todo item somewhat as a pretext to get the conversion
>> stuff in, which I need to implement some other stuff. As a result I was
>> less careful with the rst plugin than with the rest of the patch.

>> This being said, as I understand it rst cannot embed raw html in
>> the middle of a paragraph. I just found with more tests that even
>> links are a bit tricky, and won't work if they're not surrounded by
>> whitespace; the problem is that if we add this space, links
>> and preprocessor directives at the beginning of a line will be indented,
>> and this means something to rst. Also, rst complains about "?"
>> being used multiple times when the page contains more than one broken link;
>> apparently it uses it as a name for the reference as well as the link text.

>> The idea behind _link and other "intermediate
>> forms" was also that, when we can use rst's ability to target other output
>> formats, raw html won't be included in this process, and that
>> complications will happen with all markup languages if html continues
>> to be used as the language for preprocessor directive output.
>> Of course this could have been postponed until we actually need it,
>> but since we do... :-)

>> I think I will document the limitations, and tune the bugs of the
>> rst plugin code to do the most sensible thing after some more reading
>> of the rst docs. Expect an updated patch in the next few days, and feel
>> free to ask for other adjustments in the meantime.

>> Beyond being buggy in the least horrible way, I'm afraid I won't have
>> much time for ikiwiki in the next two or three weeks (exams),
>> but I think that ultimately these limitations could be worked around.
>> I'm not sure it is desirable for ikiwiki to know too much about the
>> syntax of its markup languages. Maybe the tricky "format" stuff
>> the toc plugin does could be used; maybe we need to think about more
>> generic ways to put "marks" in the various types of pages, which could
>> be expanded after htmlization, and maybe the convert stuff could be used
>> to do this in an elegant way;
>> but then this is not very [[multiple_output_formats]] friendly either.
>> What do you think?

>> --[[JeremieKoenig]]
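
Two of the rst problems raised above (the repeated "?" reference name and the need for surrounding whitespace) may be avoidable with standard reST syntax: anonymous references, written with two trailing underscores, and escaped spaces. The sketch below is in Python for illustration; both function names are hypothetical and none of this comes from either patch.

```python
def rst_broken_link(create_url: str, text: str) -> str:
    """Render a broken link as an anonymous reference followed by the text."""
    # "__" (two underscores) makes the reference anonymous, so several "?"
    # links with different targets do not define the same reference name.
    # "\ " is an escaped space: it separates the markup from the text but
    # produces no whitespace in the output.
    return "`? <%s>`__\\ %s" % (create_url, text)


def embed_inline(before: str, markup: str, after: str) -> str:
    """Join inline markup to its neighbours without adding real whitespace.

    An escaped space is inserted only where the neighbouring character is
    not already whitespace, so markup at the start of a line stays unindented.
    """
    left = "\\ " if before and not before[-1].isspace() else ""
    right = "\\ " if after and not after[0].isspace() else ""
    return before + left + markup + right + after
```

The helper inserts an escaped space next to punctuation as well, which is more than rst strictly requires; that is assumed to be harmless here rather than verified against docutils.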

## Original patch
[[tag patch]]

<pre>
Index: debian/changelog
===================================================================
--- debian/changelog	(revision 3197)
+++ debian/changelog	(working copy)
@@ -24,6 +24,9 @@
     than just a suggests, since OpenID is enabled by default.
   * Fix a bug that caused link(foo) to succeed if page foo did not exist.
   * Fix tags to page names that contain special characters.
+  * Based on a patch by Ethan, add a new htmlescape hook, that is called
+    when a preprocessor directive emits inline html. The rst plugin uses this
+    hook to support inlined raw html.
 
   [ Josh Triplett ]
   * Use pngcrush and optipng on all PNG files.
Index: IkiWiki/Render.pm
===================================================================
--- IkiWiki/Render.pm	(revision 3197)
+++ IkiWiki/Render.pm	(working copy)
@@ -96,7 +96,7 @@
 		if ($page !~ /.*\/\Q$discussionlink\E$/ &&
 		   (length $config{cgiurl} ||
 		    exists $links{$page."/".$discussionlink})) {
-			$template->param(discussionlink => htmllink($page, $page, gettext("Discussion"), noimageinline => 1, forcesubpage => 1));
+			$template->param(discussionlink => htmllink($page, $page, gettext("Discussion"), noimageinline => 1, forcesubpage => 1, genhtml => 1));
 			$actions++;
 		}
 	}
Index: IkiWiki/Plugin/rst.pm
===================================================================
--- IkiWiki/Plugin/rst.pm	(revision 3197)
+++ IkiWiki/Plugin/rst.pm	(working copy)
@@ -30,15 +30,36 @@
 html = publish_string(stdin.read(), writer_name='html', 
        settings_overrides = { 'halt_level': 6, 
                               'file_insertion_enabled': 0,
-                              'raw_enabled': 0 }
+                              'raw_enabled': 1 }
 );
 print html[html.find('<body>')+6:html.find('</body>')].strip();
 ";
 
 sub import { #{{{
 	hook(type => "htmlize", id => "rst", call => \&htmlize);
+	hook(type => "htmlescape", id => "rst", call => \&htmlescape);
+	hook(type => "htmlescapelink", id => "rst", call => \&htmlescapelink);
 } # }}}
 
+sub htmlescapelink ($$;@) { #{{{
+	my $url = shift;
+	my $text = shift;
+	my %params = @_;
+
+	if ($params{broken}){
+		return "`? <$url>`_\\ $text";
+	}
+	else {
+		return "`$text <$url>`_";
+	}
+} # }}}
+
+sub htmlescape ($) { #{{{
+	my $html=shift;
+	$html=~s/^/  /mg;
+	return ".. raw:: html\n\n".$html;
+} # }}}
+
 sub htmlize (@) { #{{{
 	my %params=@_;
 	my $content=$params{content};
Index: doc/plugins/write.mdwn
===================================================================
--- doc/plugins/write.mdwn	(revision 3197)
+++ doc/plugins/write.mdwn	(working copy)
@@ -121,6 +121,26 @@
 The function is passed named parameters: "page" and "content" and should
 return the htmlized content.
 
+### htmlescape
+
+	hook(type => "htmlescape", id => "ext", call => \&htmlescape);
+
+Some markup languages do not allow raw html to be mixed in with the markup
+language, and need it to be escaped in some way. This hook is a companion
+to the htmlize hook, and is called when ikiwiki detects that a preprocessor
+directive is inserting raw html. It is passed the chunk of html in
+question, and should return the escaped chunk.
+
+### htmlescapelink
+
+	hook(type => "htmlescapelink", id => "ext", call => \&htmlescapelink);
+
+Some markup languages have special syntax to link to other pages. This hook
+is a companion to the htmlize and htmlescape hooks, and it is called when a
+link is inserted. It is passed the target of the link and the text of the 
+link, and an optional named parameter "broken" if a broken link is being
+generated. It should return the correctly-formatted link.
+
 ### pagetemplate
 
 	hook(type => "pagetemplate", id => "foo", call => \&pagetemplate);
@@ -355,6 +375,7 @@
 * forcesubpage  - set to force a link to a subpage
 * linktext - set to force the link text to something
 * anchor - set to make the link include an anchor
+* genhtml - set to generate HTML and not escape for correct format
 
 #### `readfile($;$)`
 
Index: doc/plugins/rst.mdwn
===================================================================
--- doc/plugins/rst.mdwn	(revision 3197)
+++ doc/plugins/rst.mdwn	(working copy)
@@ -10,10 +10,8 @@
 Note that this plugin does not interoperate very well with the rest of
 ikiwiki. Limitations include:
 
-* reStructuredText does not allow raw html to be inserted into
-  documents, but ikiwiki does so in many cases, including
-  [[WikiLinks|WikiLink]] and many
-  [[PreprocessorDirectives|PreprocessorDirective]].
+* Some bits of ikiwiki may still assume that markdown is used or embed html
+  in ways that break reStructuredText. (Report bugs if you find any.)
 * It's slow; it forks a copy of python for each page. While there is a
   perl version of the reStructuredText processor, it is not being kept in
   sync with the standard version, so is not used.
Index: IkiWiki.pm
===================================================================
--- IkiWiki.pm	(revision 3197)
+++ IkiWiki.pm	(working copy)
@@ -469,6 +469,10 @@
 	my $page=shift; # the page that will contain the link (different for inline)
 	my $link=shift;
 	my %opts=@_;
+	# we are processing $lpage and so we need to format things in accordance
+	# with the formatting language of $lpage. inline generates HTML so links
+	# will be escaped separately.
+	my $type=pagetype($pagesources{$lpage});
 
 	my $bestlink;
 	if (! $opts{forcesubpage}) {
@@ -494,12 +498,17 @@
 	}
 	if (! grep { $_ eq $bestlink } map { @{$_} } values %renderedfiles) {
 		return $linktext unless length $config{cgiurl};
-		return "<span><a href=\"".
-			cgiurl(
-				do => "create",
-				page => pagetitle(lc($link), 1),
-				from => $lpage
-			).
+		my $url = cgiurl(
+				 do => "create",
+				 page => pagetitle(lc($link), 1),
+				 from => $lpage
+				);
+
+		if ($hooks{htmlescapelink}{$type} && ! $opts{genhtml}){
+			return $hooks{htmlescapelink}{$type}{call}->($url, $linktext,
+							       broken => 1);
+		}
+		return "<span><a href=\"". $url.
 			"\">?</a>$linktext</span>"
 	}
 	
@@ -514,6 +523,9 @@
 		$bestlink.="#".$opts{anchor};
 	}
 
+	if ($hooks{htmlescapelink}{$type} && !$opts{genhtml}) {
+	  return $hooks{htmlescapelink}{$type}{call}->($bestlink, $linktext);
+	}
 	return "<a href=\"$bestlink\">$linktext</a>";
 } #}}}
 
@@ -628,6 +640,14 @@
 				preview => $preprocess_preview,
 			);
 			$preprocessing{$page}--;
+
+			# Handle escaping html if the htmlizer needs it.
+			if ($ret =~ /[<>]/ && $pagesources{$page}) {
+				my $type=pagetype($pagesources{$page});
+				if ($hooks{htmlescape}{$type}) {
+					return $hooks{htmlescape}{$type}{call}->($ret);
+				}
+			}
 			return $ret;
 		}
 		else {
</pre>
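
The last hunk of the patch decides whether to escape by looking for `<` or `>` in the directive's output. That dispatch can be sketched in Python as follows, for illustration only. `maybe_escape` and the `escapers` mapping are hypothetical stand-ins for the Perl `%hooks` lookup.

```python
import re


def maybe_escape(ret: str, page_type: str, escapers: dict) -> str:
    """Escape directive output only if it looks like HTML and the page's
    format has registered an escape function."""
    looks_like_html = re.search(r"[<>]", ret) is not None
    if looks_like_html and page_type in escapers:
        return escapers[page_type](ret)
    return ret
```

One consequence of this heuristic is that plain text containing an angle bracket (for example `a < b`) is treated as HTML as well, so a directive that returns such text on an rst page would have it wrapped by the escape function.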