Diffstat (limited to 'doc/bugs/__38__uuml__59___in_markup_makes_ikiwiki_not_un-escape_HTML_at_all.mdwn')
-rw-r--r-- | doc/bugs/__38__uuml__59___in_markup_makes_ikiwiki_not_un-escape_HTML_at_all.mdwn | 47 |
1 files changed, 47 insertions, 0 deletions
diff --git a/doc/bugs/__38__uuml__59___in_markup_makes_ikiwiki_not_un-escape_HTML_at_all.mdwn b/doc/bugs/__38__uuml__59___in_markup_makes_ikiwiki_not_un-escape_HTML_at_all.mdwn
new file mode 100644
index 000000000..eb3450a7e
--- /dev/null
+++ b/doc/bugs/__38__uuml__59___in_markup_makes_ikiwiki_not_un-escape_HTML_at_all.mdwn
@@ -0,0 +1,47 @@
+I'm experimenting with using Ikiwiki as a feed aggregator.
+
+The Planet Ubuntu RSS 2.0 feed (<http://planet.ubuntu.com/rss20.xml>) as of today
+has someone whose name contains the character u-with-umlaut. In HTML 4.0, this is
+specified as the character entity uuml. Ikiwiki 2.47 running on Debian etch does
+not seem to understand that entity, and decides not to un-escape any markup in
+the feed. This makes the feed hard to read.
+
+The following is the test input:
+
+    <rss version="2.0">
+    <channel>
+    <title>testfeed</title>
+    <link>http://example.com/</link>
+    <language>en</language>
+    <description>example</description>
+    <item>
+    <title>&uuml;</title>
+    <guid>http://example.com</guid>
+    <link>http://example.com</link>
+    <description>foo</description>
+    <pubDate>Tue, 27 May 2008 22:42:42 +0000</pubDate>
+    </item>
+    </channel>
+    </rss>
+
+When I feed this to ikiwiki, it complains:
+"processed ok at 2008-05-29 09:44:14 (invalid UTF-8 stripped from feed) (feed entities escaped"
+
+Note also that the test input contains only pure ASCII, no UTF-8 at all.
+
+If I remove the ampersand in the title, ikiwiki has no problem. However, the entity is
+valid HTML, so it would be good for ikiwiki to understand it. At the minimum, stripping
+the offending entity but un-escaping the rest seems like a reasonable thing to do,
+unless that has security implications.
+
+> I tested on unstable, and ikiwiki handled that sample rss fine,
+> generating a `ü.html`. --[[Joey]]
+
+>> I confirm that it works with ikiwiki 2.50, at least partially. The HTML output is
+>> OK, but the aggregate plugin still reports this:
+>>
+>>     processed ok at 2008-07-01 21:24:29 (invalid UTF-8 stripped from feed) (feed entities escaped)
+>>
+>> I hope that's just a minor blemish. --liw
+
+>>> Sounds like this is [[done]] --[[Joey]]
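
Note on the report: it does not say why the entity trips the aggregator. One plausible reading, offered here only as an assumption, is that `&uuml;` is an HTML entity that plain XML does not predefine, so a strict XML parser rejects a feed that uses it without a DTD. The Python sketch below illustrates that general behaviour with the standard library. It is not ikiwiki's code. The helper `parse_title`, the reduced feed, and the numeric-reference workaround are all hypothetical.

```python
import html
import xml.etree.ElementTree as ET

# Reduced form of the report's test input: only the title matters here.
FEED = '<rss version="2.0"><channel><title>&uuml;</title></channel></rss>'


def parse_title(feed_text):
    """Return the channel title, or None if the text is not well-formed XML."""
    try:
        root = ET.fromstring(feed_text)
    except ET.ParseError:
        # Raised for an entity that no DTD declares, among other errors.
        return None
    return root.findtext("channel/title")


# A strict XML parser rejects the HTML-only named entity.
strict_result = parse_title(FEED)

# Hypothetical workaround: a numeric character reference is valid in plain XML.
patched_result = parse_title(FEED.replace("&uuml;", "&#252;"))

# An HTML-aware decoder knows the named entity.
decoded = html.unescape("&uuml;")
```

Whether ikiwiki's aggregate plugin follows a similar path (strict parse first, then a fallback that escapes entities) is not stated in the report, so treat the sketch as background on the entity itself, not as a diagnosis.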