unicode Characters can consist of multiple code points

30% OFF - 9th Anniversary discount on Entity Framework Extensions until December 15 with code: ZZZANNIVERSARY9

Remarks

An Unicode code point, what programmers often think of one character, often corresponds to what the user thinks is one character. Sometimes however a “character” is made up of multiple code points, as the examples above show.

This means that operations like slicing a string, or getting a character at a given index may not work as expected. For instance the 4th character of the string "Café" is 'e' (without the accent). Similarly, clipping the string to length 4 will remove the accent.

The technical term for such a group of code points is a grapheme cluster. See UAX #29: Unicode Text Segmentation



Got any unicode Question?