2009-07-12

Web Applications should always use UTF-8 charset

I have seen that iso-8859-1 encoding in HTTP is broken: some "genius" at Microsoft decided that characters outside iso-8859-1 should be sent as HTML entities, without escaping text that already looks like an entity. The server cannot tell a real non iso-8859-1 character from a literal entity typed by the user, which makes those characters unusable and makes iso-8859-1 a poor encoding for web applications. See: FORM submission and i18n
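To make the ambiguity concrete, here is a minimal Python sketch. It assumes the browser behaves as described above: characters that do not fit in iso-8859-1 are replaced by numeric character references and a literal "&" is sent unchanged. The function name browser_submit is hypothetical, and Python's xmlcharrefreplace error handler only stands in for the browser here.

```python
def browser_submit(text: str) -> bytes:
    """Approximate a form submission from an iso-8859-1 page.

    Assumption: unencodable characters become numeric character
    references (&#NNNN;) and nothing else is escaped.
    """
    return text.encode("iso-8859-1", errors="xmlcharrefreplace")


# The user types a real euro sign (U+20AC, not part of iso-8859-1).
typed_euro = browser_submit("price: \u20ac")

# Another user types the seven literal characters "&#8364;".
typed_literal = browser_submit("price: &#8364;")

print(typed_euro)
print(typed_literal)
print(typed_euro == typed_literal)
```

Both submissions produce the same bytes, so the server has no way to recover which one the user meant.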

Because Windows-1252 is a superset of the printable characters of iso-8859-1, every browser and most other programs decode content as Windows-1252 when iso-8859-1 is specified. Even the HTML5 standard states that iso-8859-1 should be interpreted as Windows-1252.
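The difference between the two encodings is confined to the bytes 0x80 to 0x9F, which a short Python sketch can show. The sample bytes are chosen for illustration only, and "cp1252" is Python's codec name for Windows-1252.

```python
# Three bytes from the range where the two encodings disagree.
raw = bytes([0x80, 0x93, 0x94])

# Windows-1252 maps them to printable characters:
# euro sign, left and right double quotation marks.
as_cp1252 = raw.decode("cp1252")

# Strict iso-8859-1 maps them to invisible C1 control characters.
as_latin1 = raw.decode("iso-8859-1")

print([hex(ord(c)) for c in as_cp1252])
print([hex(ord(c)) for c in as_latin1])

# From 0xA0 upwards the two encodings agree, for example 0xE9 (e acute).
same_above_a0 = b"\xe9".decode("cp1252") == b"\xe9".decode("iso-8859-1")
print(same_above_a0)
```

A page labelled iso-8859-1 that contains such bytes only displays as its author intended when it is decoded as Windows-1252, which is why browsers do exactly that.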

Apache Tomcat should use UTF-8 instead of iso-8859-1 by default. Here are some tips for working around this bad Tomcat default: Tomcat/UTF-8.

Always use UTF-8 as the encoding of any HTML page and especially of Web Applications.
