There are dozens of character sets with hundreds of collations. (A given collation belongs to only one character set.) See the output of SHOW COLLATION;.
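To see how this looks on a particular server, the statements below are a small sketch. They only read metadata, and the exact counts depend on the MySQL version installed.

```sql
-- List every character set the server knows, with its default collation.
SHOW CHARACTER SET;

-- List only the collations that belong to one character set.
SHOW COLLATION WHERE Charset = 'utf8mb4';

-- The same metadata as a table: how many collations each character set has.
SELECT CHARACTER_SET_NAME, COUNT(*) AS collations
FROM information_schema.COLLATIONS
GROUP BY CHARACTER_SET_NAME
ORDER BY collations DESC;
```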
There are usually only 4 CHARACTER SETs that matter:
ascii -- basic 7-bit codes.
latin1 -- ascii, plus most characters needed for Western European languages.
utf8 -- the 1-, 2-, and 3-byte subset of UTF-8. This excludes Emoji and some Chinese characters.
utf8mb4 -- the full set of UTF-8 characters, covering all current languages.
All include English characters, encoded identically. utf8 is a subset of utf8mb4.
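The byte widths behind this list can be checked directly. The sketch below uses hex literals with a character set introducer so that it does not depend on the client's connection encoding. The sample characters are my own choice, not part of the original text.

```sql
-- LENGTH() counts bytes; CHAR_LENGTH() counts characters.
SELECT
    LENGTH(_utf8mb4 X'41')            AS a_bytes,        -- 'A'        : 1 byte
    LENGTH(_utf8mb4 X'C3A9')          AS e_acute_bytes,  -- e-acute    : 2 bytes
    LENGTH(_utf8mb4 X'E282AC')        AS euro_bytes,     -- euro sign  : 3 bytes
    LENGTH(_utf8mb4 X'F09F9880')      AS emoji_bytes,    -- U+1F600    : 4 bytes
    CHAR_LENGTH(_utf8mb4 X'F09F9880') AS emoji_chars;    -- still 1 character

-- The same accented letter takes a single byte once converted to latin1,
-- while plain English letters are one byte in every charset listed above.
SELECT LENGTH(CONVERT(_utf8mb4 X'C3A9' USING latin1)) AS e_acute_latin1_bytes;
```

The 4-byte row is the case that MySQL's utf8 cannot store, which is why Emoji need utf8mb4.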
Best practice... TEXT or VARCHAR column that can have a variety of languages in it.
utf8mb4 did not exist until version 5.5.3, so utf8 was the best available before that.
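As a sketch of what declaring utf8mb4 on TEXT and VARCHAR columns can look like: the table and column names below are hypothetical, and the collation shown assumes MySQL 5.6 or later (on 5.5.3 through 5.5.x, utf8mb4_unicode_ci would be the nearest equivalent).

```sql
-- Hypothetical table; every text column inherits the table default.
CREATE TABLE comments (
    id     INT UNSIGNED NOT NULL AUTO_INCREMENT PRIMARY KEY,
    author VARCHAR(100) NOT NULL,
    body   TEXT NOT NULL
) DEFAULT CHARACTER SET utf8mb4 COLLATE utf8mb4_unicode_520_ci;

-- Hypothetical existing table: convert all of its text columns in one step.
-- Try this on a copy first, since it rewrites the whole table.
ALTER TABLE old_comments
    CONVERT TO CHARACTER SET utf8mb4 COLLATE utf8mb4_unicode_520_ci;

-- Tell the server that this connection sends and expects utf8mb4.
SET NAMES utf8mb4;
```

The column definition alone is not enough. The connection also has to use utf8mb4, otherwise 4-byte characters can be lost before they reach the table.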
Outside of MySQL, "UTF8" means the same thing as MySQL's utf8mb4, not MySQL's utf8.
Collations start with the charset name and usually end with _ci for "case and accent insensitive" or _bin for "simply compare the bits".
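The difference between the two suffixes can be seen by comparing the same pair of strings under each collation. This is an illustrative sketch. The accented letter is written as a hex literal so the result does not depend on the client encoding.

```sql
SELECT
    -- Same letters, different case.
    _utf8mb4'abc' = _utf8mb4'ABC' COLLATE utf8mb4_unicode_ci   AS ci_case,     -- 1 (equal)
    _utf8mb4'abc' = _utf8mb4'ABC' COLLATE utf8mb4_bin          AS bin_case,    -- 0 (different)
    -- 'e' versus e-acute (UTF-8 bytes C3 A9).
    _utf8mb4'e' = _utf8mb4 X'C3A9' COLLATE utf8mb4_unicode_ci  AS ci_accent,   -- 1 (equal)
    _utf8mb4'e' = _utf8mb4 X'C3A9' COLLATE utf8mb4_bin         AS bin_accent;  -- 0 (different)
```

The same rules apply to WHERE, ORDER BY, GROUP BY and UNIQUE indexes on a column, so the choice of suffix decides whether 'abc' and 'ABC' count as duplicates.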
The 'latest' utf8mb4 collation is utf8mb4_unicode_520_ci, based on Unicode 5.2.0. (MySQL 8.0 later added utf8mb4_0900_ai_ci, based on Unicode 9.0.) If you are working with a single language, you might want, say, utf8mb4_polish_ci, which will rearrange the letters slightly, based on Polish conventions.
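To compare a general collation with a language-specific one, a query can override the column's collation in ORDER BY. The sketch below uses a hypothetical temporary table and sample city names. It assumes MySQL 5.6 or later and a client that really sends UTF-8.

```sql
SET NAMES utf8mb4;  -- assumes the client terminal or driver sends UTF-8

-- Hypothetical sample data.
CREATE TEMPORARY TABLE cities (
    name VARCHAR(50) NOT NULL
) DEFAULT CHARACTER SET utf8mb4 COLLATE utf8mb4_unicode_520_ci;

INSERT INTO cities (name) VALUES ('Lublin'), ('Łódź'), ('Mielec'), ('Zabrze'), ('Żary');

-- Ordering by the column's own (general Unicode) collation.
SELECT name FROM cities ORDER BY name;

-- Ordering by Polish rules for this query only.
SELECT name FROM cities ORDER BY name COLLATE utf8mb4_polish_ci;
```

If every query needs Polish ordering, declaring the column itself with utf8mb4_polish_ci avoids repeating the COLLATE clause and lets an index on the column serve the sort.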