Unicode Diacritics


Example

A letter with a diacritic may be represented as the base letter followed by a combining mark. You normally think of é as one character, but here it is really two code points:

  • U+0065 — LATIN SMALL LETTER E
  • U+0301 — COMBINING ACUTE ACCENT

Similarly, ç = c + ¸, and å = a + ˚
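
The decompositions above can be checked with a short Python sketch. It assumes the standard unicodedata module and uses the combining forms of the marks (U+0327 COMBINING CEDILLA and U+030A COMBINING RING ABOVE) in place of the spacing symbols shown in the text:

```python
import unicodedata

# Build each letter from a base letter plus a combining mark.
e_acute = "e\u0301"    # e + COMBINING ACUTE ACCENT
c_cedilla = "c\u0327"  # c + COMBINING CEDILLA
a_ring = "a\u030a"     # a + COMBINING RING ABOVE

for s in (e_acute, c_cedilla, a_ring):
    # Each string renders as one character but holds two code points.
    names = [f"U+{ord(ch):04X} {unicodedata.name(ch)}" for ch in s]
    print(s, len(s), names)
```

Each line of output should show a length of 2, followed by the names of the base letter and its combining mark.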

Combined forms

To complicate matters, there is often a code point for the composed form as well:

"Café" = 'C' + 'a' + 'f' + 'e' + '´'
"Café" = 'C' + 'a' + 'f' + 'é'

Although these strings look the same, they do not compare equal, and they don't even have the same length (5 and 4 code points, respectively).
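
A minimal Python sketch of the comparison, with the two spellings written as escapes so the difference is visible in the source. The last step, normalizing to NFC with unicodedata.normalize, is one common way to make the strings comparable and is an addition not covered in the text above:

```python
import unicodedata

decomposed = "Cafe\u0301"  # 'e' followed by COMBINING ACUTE ACCENT
composed = "Caf\u00e9"     # single code point LATIN SMALL LETTER E WITH ACUTE

# The strings render identically but differ code point by code point.
print(decomposed == composed)          # False
print(len(decomposed), len(composed))  # 5 4

# Normalizing to NFC composes 'e' + accent into the single code point.
print(unicodedata.normalize("NFC", decomposed) == composed)  # True
```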