Ever saved a record into a MySQL database, only to find on retrieval that the non-Latin characters (e.g. Asian characters) don't quite turn up as expected? Specifically, they turn up as a bunch of "?".

The most likely reason is that the MySQL database is configured to use a Latin character set (latin1) by default. In this day and age, one would assume that UTF-8 would be the de facto standard for this. Given my limited knowledge of database administration, I cannot fathom why Latin is chosen as the default over UTF-8. Shouldn't the benefit and flexibility of supporting non-Latin characters outweigh whatever space a Latin-only database saves (I am guessing this is the benefit)?

Oh well, what would I know.
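To see where the "?" comes from, here is a minimal Python sketch of a lossy conversion into Latin-1. It is only an analogy for what happens when text is squeezed into a Latin-only character set, not MySQL's actual code path, and the function name store_as_latin1 is made up for illustration.

```python
def store_as_latin1(text):
    """Simulate a lossy round trip through a Latin-1-only store (illustrative only)."""
    # Characters that Latin-1 cannot represent are each replaced by "?"
    raw = text.encode("latin-1", errors="replace")
    return raw.decode("latin-1")


print(store_as_latin1("café"))    # café (every character exists in Latin-1)
print(store_as_latin1("日本語"))   # ??? (none of these characters exist in Latin-1)
```

Once the characters have been replaced this way, the original text is gone for good, which is why the fix has to happen in the configuration before the data is written.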

If UTF-8 is a foreign term to you, you absolutely need to read why it should be a concern to every developer: The Absolute Minimum Every Software Developer Absolutely, Positively Must Know About Unicode and Character Sets

For those who see the need to support UTF-8 in their MySQL databases, fortunately you can do so by editing the MySQL configuration file (my.cnf). This file is typically found at /etc/mysql/my.cnf or /etc/my.cnf if you are on Mac OS X or any UNIX/Linux-based OS.

Add the following to my.cnf, then restart the MySQL server so the changes take effect:

[client]
default-character-set = utf8 
...
... 

[mysqld]
# default-character-set and default-collation were removed as server
# options in MySQL 5.5; use the *-server names instead
character-set-server = utf8
collation-server     = utf8_general_ci
...
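If you want to double-check which character set and collation options a configuration file actually contains, a small Python sketch along these lines may help. It uses the standard configparser module on a hypothetical my.cnf fragment (SAMPLE and charset_settings are illustrative names, not part of MySQL), and it assumes a simple file without !include directives, which configparser does not understand.

```python
import configparser

# Hypothetical my.cnf fragment, used only to illustrate the check
SAMPLE = """
[client]
default-character-set = utf8

[mysqld]
character-set-server = utf8
collation-server = utf8_general_ci
skip-external-locking
"""


def charset_settings(text):
    """Return {(section, option): value} for charset and collation options."""
    parser = configparser.ConfigParser(
        allow_no_value=True,   # my.cnf allows bare flags such as skip-external-locking
        strict=False,          # tolerate repeated sections or options
        interpolation=None,    # treat "%" literally
    )
    parser.read_string(text)
    found = {}
    for section in parser.sections():
        for option, value in parser.items(section):
            if "character-set" in option or "collation" in option:
                found[(section, option)] = value
    return found


for (section, option), value in charset_settings(SAMPLE).items():
    print(f"[{section}] {option} = {value}")
```

This only inspects the file. To confirm what the running server really uses, you would still query it after a restart.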

Alternatively, you can set the default collation to "utf8_unicode_ci". The difference between the two is:

For any Unicode character set, operations performed using the _general_ci collation are faster than those for the _unicode_ci collation. For example, comparisons for the utf8_general_ci collation are faster, but slightly less correct, than comparisons for utf8_unicode_ci. The reason for this is that utf8_unicode_ci supports mappings such as expansions; that is, when one character compares as equal to combinations of other characters. For example, in German and some other languages “ß” is equal to “ss”. utf8_unicode_ci also supports contractions and ignorable characters. utf8_general_ci is a legacy collation that does not support expansions, contractions, or ignorable characters. It can make only one-to-one comparisons between characters.
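The idea of an expansion can be illustrated outside MySQL. The Python sketch below is only an analogy: it contrasts a simple lowercase comparison, which leaves "ß" as a single character, with Unicode case folding, which expands "ß" to "ss". MySQL's collations use their own weight tables, so this is not how the server compares strings internally, and both function names are made up for illustration.

```python
def simple_equal(a, b):
    """Compare after lowercasing; "ß" stays a single character."""
    return a.lower() == b.lower()


def expansion_equal(a, b):
    """Compare after Unicode case folding, which expands "ß" to "ss"."""
    return a.casefold() == b.casefold()


print(simple_equal("Straße", "STRASSE"))     # False
print(expansion_equal("Straße", "STRASSE"))  # True
```

The first comparison behaves roughly like the one-to-one matching described for utf8_general_ci, while the second mirrors the kind of result utf8_unicode_ci gives for "ß" and "ss".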

For more information on configuration of character-set: