| Field | Value | Date |
|---|---|---|
| author | https://www.google.com/accounts/o8/id?id=AItOawmKvEhLuOASwaGDP803twCdJYPyXe6JQHY <Martin@web> | 2011-11-04 06:31:35 -0400 |
| committer | admin <admin@branchable.com> | 2011-11-04 06:31:35 -0400 |
| commit | f2d322efee7f098eac9b38a0844995255a008df2 | |
| tree | f0944efe968c3db35fa2f94ee9ac6281813a2467 /doc/bugs | |
| parent | f6d66d46845534da1b7677922193f24363f44003 | |
| download | ikiwiki-f2d322efee7f098eac9b38a0844995255a008df2.tar, ikiwiki-f2d322efee7f098eac9b38a0844995255a008df2.tar.gz | |
add comment.
Diffstat (limited to 'doc/bugs')
-rw-r--r-- | doc/bugs/UTF-16_and_UTF-32_are_unhandled.mdwn | 9 |
1 files changed, 9 insertions, 0 deletions
```diff
diff --git a/doc/bugs/UTF-16_and_UTF-32_are_unhandled.mdwn b/doc/bugs/UTF-16_and_UTF-32_are_unhandled.mdwn
index 21df334a8..9e8fba4b9 100644
--- a/doc/bugs/UTF-16_and_UTF-32_are_unhandled.mdwn
+++ b/doc/bugs/UTF-16_and_UTF-32_are_unhandled.mdwn
@@ -18,3 +18,12 @@ BOMless LE and BE input is probably a lost cause.
 Optimally, UTF-16 (which is ubiquitous in the Windows world) and UTF-32 should be fully supported, probably by converting to mostly-UTF-8 and using `&#xXXXX;` or `&#DDDDD;` XML escapes where necessary.
 
 Suboptimally, UTF-16 and UTF-32 should be converted to UTF-8 where cleanly possible and a warning printed where impossible.
+
+----
+Reading the wikipedia pages about [[!wikipedia UTF-8]] and [[!wikipedia UTF-16]], all valid Unicode characters are representable in UTF-8, UTF-16 and UTF-32, and the only errors possible with UTF-16/32 -> UTF-8 translation are when there are encoding errors in the original document.
+
+Of course, it's entirely possible that not all browsers support utf-8 correctly, and we might need to support the option of encoding into [[!wikipedia CESU-8]] instead, which has the side-effect of allowing the transcription of UTF-16 or UTF-32 encoding errors into the output byte-stream, rather than pedantically removing those bytes.
+
+An interesting question would be how to determine the character set of an arbitrary new file added to the repository, unless the repository itself handles character-encoding, in which case, we can just ask the repository to hand us a UTF-8 encoded version of the file.
+
+-- [[Martin Rudat|http://www.toraboka.com/~mrudat]]
```
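The page's fallback plan is to convert UTF-16 and UTF-32 to UTF-8 where cleanly possible and warn otherwise. The comment adds that only encoding errors in the source can make this conversion fail. Together these can be sketched roughly as follows. This is a minimal Python illustration, not ikiwiki code (ikiwiki itself is Perl). `sniff_encoding` and `to_utf8` are hypothetical names. The sketch assumes, as the page does, that BOMless input cannot be reliably identified as UTF-16/32, so it treats such input as UTF-8.

```python
import codecs
from typing import Optional, Tuple

# Hypothetical helpers, not part of ikiwiki.
# UTF-32 BOMs are checked before UTF-16 ones because the UTF-32 LE BOM
# (FF FE 00 00) begins with the UTF-16 LE BOM (FF FE). A consequence is that
# a UTF-16 LE file whose first character is U+0000 is misread as UTF-32 LE.
_BOMS = [
    (codecs.BOM_UTF32_LE, "utf-32-le"),
    (codecs.BOM_UTF32_BE, "utf-32-be"),
    (codecs.BOM_UTF8, "utf-8"),
    (codecs.BOM_UTF16_LE, "utf-16-le"),
    (codecs.BOM_UTF16_BE, "utf-16-be"),
]


def sniff_encoding(data: bytes) -> Tuple[str, int]:
    """Return (codec name, BOM length); assume UTF-8 when no BOM is present."""
    for bom, name in _BOMS:
        if data.startswith(bom):
            return name, len(bom)
    return "utf-8", 0


def to_utf8(data: bytes) -> Tuple[bytes, Optional[str]]:
    """Transcode BOM-marked input to UTF-8, dropping the BOM.

    Returns the UTF-8 bytes plus a warning string, or None when the input
    decoded cleanly. Malformed sequences (e.g. unpaired UTF-16 surrogates)
    are replaced with U+FFFD.
    """
    encoding, bom_len = sniff_encoding(data)
    body = data[bom_len:]
    try:
        text = body.decode(encoding)  # strict: raises on any encoding error
        warning = None
    except UnicodeDecodeError as exc:
        warning = f"{encoding}: {exc.reason} at byte {bom_len + exc.start}"
        text = body.decode(encoding, errors="replace")
    # Neither strict nor "replace" decoding leaves lone surrogates in `text`,
    # so encoding to UTF-8 cannot fail here.
    return text.encode("utf-8"), warning
```

For example, `to_utf8(codecs.BOM_UTF16_LE + "naïve".encode("utf-16-le"))` returns `("naïve".encode("utf-8"), None)`. A warning appears only when the source itself is malformed, which matches the comment's point.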
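The CESU-8 option in the comment differs from UTF-8 only for characters outside the Basic Multilingual Plane. Each such character is first split into a UTF-16 surrogate pair, and each surrogate is then written as its own three-byte sequence. The rough Python sketch below uses the hypothetical names `encode_cesu8` and `utf16le_to_cesu8`. It relies on Python's `surrogatepass` error handler to keep unpaired surrogates from a malformed UTF-16 source. That is one possible way to carry such errors through to the output byte-stream, as the comment describes.

```python
def encode_cesu8(text: str) -> bytes:
    """Encode text as CESU-8 (hypothetical helper, not ikiwiki code).

    BMP code points, including any unpaired surrogates already in `text`,
    get their usual 1-3 byte UTF-8 form. Supplementary code points are split
    into a surrogate pair, and each half becomes a 3-byte sequence.
    """
    out = bytearray()
    for ch in text:
        cp = ord(ch)
        if cp > 0xFFFF:
            v = cp - 0x10000
            units = (0xD800 + (v >> 10), 0xDC00 + (v & 0x3FF))
        else:
            units = (cp,)
        for unit in units:
            # "surrogatepass" lets the UTF-8 codec emit ED A0 80..ED BF BF
            # for surrogate code points instead of raising.
            out += chr(unit).encode("utf-8", "surrogatepass")
    return bytes(out)


def utf16le_to_cesu8(data: bytes) -> bytes:
    """Transcode BOMless UTF-16 LE to CESU-8, passing unpaired surrogates through."""
    return encode_cesu8(data.decode("utf-16-le", "surrogatepass"))
```

For U+1F600, `encode_cesu8` produces the six bytes `ED A0 BD ED B8 80`, while UTF-8 produces the four bytes `F0 9F 98 80`. A strict UTF-8 decoder rejects the surrogate sequences, so output like this only helps consumers that actually expect CESU-8.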