Unicode is the future: everyone wants an easy way for every application to support the characters of every language in the world.
Many programming languages seem to have problems handling Unicode strings, mainly because they put strings and byte buffers in the same bucket.
Some of these problems may come from programmers using the API incorrectly, but some languages have a real design flaw that makes working correctly with Unicode strings impossible.
My test for whether a programming language has correct Unicode support is simple: uppercase the string "á" and verify that "Á" is returned.
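As a rough illustration, the test can be written in a few lines of Python 3. The language choice and the helper names `passes_uppercase_test` and `uppercase_as_bytes` are assumptions made for this sketch, not part of any library. The second helper shows what happens when the same text is treated as a byte buffer: `bytes.upper()` only touches ASCII letters, so the UTF-8 bytes of "á" come back unchanged.

```python
# Sketch of the uppercase test; both function names are hypothetical.

def passes_uppercase_test() -> bool:
    """Return True if uppercasing U+00E1 (á) yields U+00C1 (Á)."""
    return "\u00e1".upper() == "\u00c1"


def uppercase_as_bytes(text: str) -> str:
    """Uppercase the UTF-8 bytes of `text` instead of the text itself.

    This mimics a language that keeps strings and byte buffers in the
    same bucket: only ASCII letters are changed.
    """
    return text.encode("utf-8").upper().decode("utf-8")


if __name__ == "__main__":
    print(passes_uppercase_test())
    print(uppercase_as_bytes("\u00e1b"))
```

A language that passes the test has to know it is dealing with characters, not bytes, which is why the byte-level version leaves "á" alone while still uppercasing the ASCII "b".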
Languages that have a correct API include:
- C with GLib (using UTF-8):
- Java: I do not know of any way to design a better API. Java completely separates strings from byte buffers.
- C# has correct Unicode support:
- Python3 handles Unicode the way Java has since version 1.0, released in 1996. Python is finally catching up with Java! Take a look at "What’s New in Python 3.0" to see what was fixed.
- Python2: see "Why Python2 Unicode Sucks". Python2 Unicode support invites problems like the ones described in "urlparse considered harmful", because the plain str type makes no distinction between a byte buffer and a string. The slides "Unicode In Python2, Completely Demystified" will help you understand Unicode in Python2. Move on to Python3 and you will be safe. Python2 has correct Unicode support, but it is hard to use:
print u'á'.upper()  -> Á
- Perl: see "Unicode-processing issues in Perl and how to cope with it" and the "Perl Unicode FAQ". Perl has Unicode support, but it is hard to use:
perl -CS -e 'use utf8; print uc("á\n");'  -> Á
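To make the distinction between strings and byte buffers concrete, here is a minimal Python 3 sketch. The helper names `try_concat` and `upper_from_bytes` are made up for illustration, and UTF-8 is assumed as the encoding. It shows that text and bytes cannot be mixed by accident, and that moving between them requires naming an encoding.

```python
# Sketch only: helper names are hypothetical, UTF-8 is an assumed encoding.

def try_concat(text: str, raw: bytes) -> str:
    """Try to join text and bytes directly and report what happens."""
    try:
        return text + raw  # str + bytes is rejected by Python 3
    except TypeError:
        return "TypeError"


def upper_from_bytes(raw: bytes, encoding: str = "utf-8") -> bytes:
    """Decode bytes to text, uppercase the text, and encode it back."""
    return raw.decode(encoding).upper().encode(encoding)


if __name__ == "__main__":
    print(try_concat("a", b"b"))
    print(upper_from_bytes(b"\xc3\xa1"))  # UTF-8 bytes of "á"
```

The explicit decode and encode steps are the point: uppercasing happens on text, and the byte buffer only exists at the boundaries.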