UTF-16 and UTF-32 input (wide characters) should probably be supported or, at the very least, warned about.

Test case:

    mkdir -p ikiwiki-utf-test/raw ikiwiki-utf-test/rendered
    for page in txt mdwn; do
      echo hello > ikiwiki-utf-test/raw/$page.$page
      # re-encode the same page in every UTF variant, with and without an explicit byte order
      for enc in 8 16 16BE 16LE 32 32BE 32LE; do
        iconv -f UTF-8 -t UTF-$enc ikiwiki-utf-test/raw/$page.$page > ikiwiki-utf-test/raw/$page-utf$enc.$page
      done
    done
    ikiwiki --verbose --plugin txt --plugin mdwn ikiwiki-utf-test/raw/ ikiwiki-utf-test/rendered/
    www-browser ikiwiki-utf-test/rendered/ || x-www-browser ikiwiki-utf-test/rendered/
    # Clean up by hand: some browsers daemonize themselves, so this step can't easily be safely automated.
    # rm -r ikiwiki-utf-test/

BOMless LE and BE input is probably a lost cause, since nothing in the file announces its encoding or byte order.
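For input that does carry a BOM, detection is cheap. The following is only a sketch of the idea, not ikiwiki code: `detect_bom` is a hypothetical helper name, and it assumes a POSIX `od` is available. It looks at the first four bytes and prints an iconv-style encoding name, or `unknown` when no BOM is found.

```shell
# Hypothetical helper (not part of ikiwiki): guess an encoding from the BOM.
# Prints an iconv encoding name, or "unknown" when the file has no BOM.
detect_bom() {
    # Hex-dump the first four bytes, e.g. "fffe0000" for a UTF-32LE BOM.
    bom=$(od -An -tx1 -N4 "$1" | tr -d ' \n')
    case $bom in
        0000feff) echo UTF-32BE ;;
        fffe0000) echo UTF-32LE ;;  # must be checked before UTF-16LE
        efbbbf*)  echo UTF-8 ;;
        feff*)    echo UTF-16BE ;;
        fffe*)    echo UTF-16LE ;;
        *)        echo unknown ;;
    esac
}
```

Run against the test case above, this should report a UTF-16 variant for `txt-utf16.txt` and `unknown` for `txt-utf16LE.txt`. One known ambiguity: a UTF-16LE file whose first character after the BOM is U+0000 is indistinguishable from UTF-32LE and would be reported as the latter.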

Optimally, UTF-16 (which is ubiquitous in the Windows world) and UTF-32 should be fully supported, probably by converting to mostly-UTF-8 and using `&#xXXXX;` or `&#DDDDD;` XML escapes where necessary.

Suboptimally, UTF-16 and UTF-32 should be converted to UTF-8 where cleanly possible and a warning printed where impossible.
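As a rough illustration of the fallback behaviour, here is a shell sketch that converts a file to UTF-8 with `iconv` and prints a warning when that fails. It is not how ikiwiki reads pages; `to_utf8_or_warn` and its three arguments (source file, source encoding, destination file) are made up for this example, and the source encoding is assumed to be already known, for instance from the BOM.

```shell
# Hypothetical helper (not part of ikiwiki): convert SRC from ENC to UTF-8,
# writing DST. On failure, remove the partial output and warn on stderr.
to_utf8_or_warn() {
    src=$1 enc=$2 dst=$3
    if iconv -f "$enc" -t UTF-8 "$src" > "$dst" 2>/dev/null; then
        return 0
    fi
    rm -f "$dst"  # do not leave a truncated page behind
    echo "warning: $src: cannot convert from $enc to UTF-8, skipping" >&2
    return 1
}
```

For example, `to_utf8_or_warn raw/txt-utf16.txt UTF-16 /tmp/txt.txt` would leave a UTF-8 copy in `/tmp/txt.txt`, while a truncated or otherwise malformed input produces only the warning.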