Unicode: Characters can consist of multiple code points

Remarks

A Unicode code point often corresponds to what the user thinks of as one character, and programmers often treat the two as the same thing. Sometimes, however, a “character” is made up of multiple code points, as with diacritics, emoji and flags, and Zalgo text.

This means that operations like slicing a string or getting the character at a given index may not work as expected. For instance, when the "é" in "Café" is encoded as 'e' followed by the combining acute accent U+0301, the 4th code point of the string is 'e' (without the accent). Similarly, clipping the string to length 4 will remove the accent.
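The effect can be reproduced with a short Python sketch. It assumes the accented letter is stored in decomposed form ('e' plus U+0301); the variable names are purely illustrative. Python strings are indexed by code point, so the index and slice operations split the accent from its base letter.

```python
import unicodedata

# "Café" with a combining accent: 'e' followed by U+0301 COMBINING ACUTE ACCENT.
decomposed = "Cafe\u0301"
# The same visible text using the single precomposed code point U+00E9.
precomposed = "Caf\u00e9"

print(len(decomposed))   # number of code points in the decomposed form
print(len(precomposed))  # number of code points in the precomposed form

# Indexing and slicing work on code points, not on user-perceived characters.
fourth = decomposed[3]    # the bare 'e'; the accent sits at index 4
clipped = decomposed[:4]  # "Cafe"; the accent has been cut off

# NFC normalization composes this particular pair into one code point,
# but many clusters (emoji sequences, Zalgo text) have no precomposed form.
composed = unicodedata.normalize("NFC", decomposed)
print(composed == precomposed)
```

Normalizing to NFC happens to help for "é", but it is not a general fix, since it only composes pairs for which Unicode defines a single precomposed code point.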

The technical term for such a group of code points is a grapheme cluster. See UAX #29: Unicode Text Segmentation.
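As a rough illustration of the idea, the sketch below groups each base code point with the combining marks that follow it. This is a deliberate simplification and not the UAX #29 algorithm: it ignores emoji ZWJ sequences, regional-indicator flags, Hangul syllables, and other rules. The function name `approx_clusters` is hypothetical.

```python
import unicodedata

def approx_clusters(text):
    """Approximate grapheme clusters: attach mark code points to the
    preceding base code point. NOT a full UAX #29 implementation."""
    clusters = []
    for ch in text:
        # General categories Mn, Mc and Me are the "Mark" categories.
        is_mark = unicodedata.category(ch).startswith("M")
        if clusters and is_mark:
            clusters[-1] += ch   # extend the current cluster
        else:
            clusters.append(ch)  # start a new cluster
    return clusters

clusters = approx_clusters("Cafe\u0301")
print(len(clusters))  # count of approximate clusters

# Slicing the list of clusters keeps the accent attached to its base letter.
first_four = "".join(clusters[:4])
```

For real text, a library that implements the UAX #29 rules is the safer choice; for example, the third-party `regex` module is commonly used for this through its `\X` pattern.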

Diacritics
Emoji and flags
Zalgo Text
