aboutsummaryrefslogtreecommitdiff
path: root/doc/bugs/rss_feeds_do_not_use_recommended_encoding_of_entities_for_some_fields.mdwn
blob: 0a435cea33a4220b3ada48df349662374c407ee4 (plain)
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
The Atom and RSS templates use `ESCAPE=HTML` in the title elements. However, HTML-escaped characters aren't valid according to <http://feedvalidator.org/>.

Removing `ESCAPE=HTML` works fine, but I haven't checked to see if there are any characters it won't work for.

For Atom, at least, I believe adding `type="xhtml"` to the title element will work. I don't think there's an equivalent for RSS.

> Removing the ESCAPE=HTML will not work, feed validator hates that just as
> much. It wants rss feeds to use a specific style of escaping that happens
> to work in some large percentage of all rss consumers. (Most of which are
> broken).
> <http://www.rssboard.org/rss-profile#data-types-characterdata>
> There's also no actual spec about how this should work.
> 
> This will be a total beast to fix. The current design is very clean in
> that all (well, nearly all) xml/html escaping is pushed back to the
> templates. This allows plugins to substitute fields in the templates
> without worrying about getting escaping right in the plugins -- and a
> plugin doesn't even know what kind of template is being filled out when
> it changes a field's value, so it can't do different types of escaping
> for different templates.
>
> The only reasonable approach seems to be extending HTML::Template with an
> ESCAPE=RSS and using that. Unfortunately its design does not allow doing
> so without hacking its code in several places. I've contacted its author
> to see if he'd accept such a patch.
>
> (A secondary bug is that using meta title currently results in unnecessry
> escaping of the title value before it reaches the template. This makes
> the escaping issues show up much more than they need to, since lots more
> characters are currently being double-escaped in the rss.)
> 
> --[[Joey]]

> Update: Ok, I've fixed this for titles, as a special case, but the
> underlying problem remains for other fields in rss feeds (such as
> author), so I'm leaving this bug report open. --[[Joey]]

>> I'm curious if there has been any progress on better RSS output?
>> I've been prototyping a new blog and getting good RSS out of it 
>> seems important as the bulk of my current readers use RSS.
>> I note, in passing that the "more" plugin doesn't quite do what 
>> I want either - I'd like to pass a full RSS feed of a post and only
>> have "more" apply to the front page of the blog. Is there a way to do that?
>> -- [[dtaht]]
>> 
>>> To be clear, the RSS spec sucks to such an extent that, as far as
>>> I know, there is no sort of title escaping that will work in all 
>>> RSS consumers. Titles are currently escaped in the way 
>>> that tends to break the fewest according to what I've read.
>>> If you're unlucky enough to 
>>> have a "&" or "<" in your **name**, then you may still run into 
>>> problems with how that is escaped in rss feeds. --[[Joey]]