The Atom feed from <http://planet.collabora.co.uk/>
gets "double-encoded" (UTF-8 is decoded as Latin-1 and re-encoded as
UTF-8) when aggregated with IkiWiki on Debian unstable. The RSS 1.0
and RSS 2.0 feeds from the same Planet are fine. All three files
are in fact correct UTF-8, but IkiWiki mis-parses the Atom.

This turns out to be a bug in XML::Feed, or (depending on your point
of view) XML::Feed failing to work around a design flaw in XML::Atom.
When parsing RSS it returns Unicode strings, but when parsing Atom
it delegates to XML::Atom's behaviour, which by default is to strip
the UTF8 flag from strings that it outputs; as a result, they're
interpreted by IkiWiki as byte sequences corresponding to the UTF-8
encoding. IkiWiki then treats these as if they were Latin-1 and
encodes them into UTF-8 for output.
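
To make the mechanism concrete, here is a minimal sketch of the same double-encoding at the byte level. It is written in Python purely for illustration (IkiWiki and XML::Feed are Perl, and this is not their code); the function name `double_encode` and the sample title are made up for the example.

```python
def double_encode(text: str) -> bytes:
    """Reproduce the bug: UTF-8 bytes misread as Latin-1, then re-encoded.

    Hypothetical helper for illustration only; not taken from IkiWiki.
    """
    # Step 1: the feed parser hands back the raw UTF-8 byte sequence
    # instead of a decoded Unicode string.
    utf8_bytes = text.encode("utf-8")
    # Step 2: the consumer assumes each byte is one Latin-1 character.
    misread = utf8_bytes.decode("latin-1")
    # Step 3: the misread string is encoded as UTF-8 for output.
    return misread.encode("utf-8")


title = "caf\u00e9"  # "café": the e-acute is one character, two bytes in UTF-8
correct = title.encode("utf-8")
broken = double_encode(title)

print(correct)                 # b'caf\xc3\xa9'
print(broken)                  # b'caf\xc3\x83\xc2\xa9'
print(broken.decode("utf-8"))  # cafÃ©
```

Pure ASCII text survives this round trip unchanged, so only entries containing non-ASCII characters show the problem.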

I've filed a bug against XML::Feed on CPAN requesting that it sets
the right magical variable to change this behaviour. IkiWiki can
also apply the same workaround (and doing so should be harmless even
when XML::Feed is fixed); please consider merging my 'atom' branch,
which does so. --[[smcv]]

[[!tag patch done]]