Stats

Contributors: 3
2016-08-09
Licensed under: CC-BY-SA


String Encoding and Decomposition

Example

A Swift String is made of Unicode code points. It can be decomposed and encoded in several different ways.

let str = "ที่👌①!"

Decomposing Strings

A string's characters are Unicode extended grapheme clusters:

Array(str.characters)  // ["ที่", "👌", "①", "!"]
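To make the grapheme-cluster idea concrete, here is a minimal sketch. It assumes Swift 4 or later, where String is itself a collection of Character, so the characters view is not needed. The names precomposed and decomposed are illustrative only.

```swift
// Assumes Swift 4 or later: String is a Collection of Character.
let str = "ที่👌①!"
let clusters = Array(str)                // four Character values

// Two spellings of the same user-perceived character:
let precomposed = "\u{E9}"               // é as one scalar, U+00E9
let decomposed = "e\u{301}"              // e + COMBINING ACUTE ACCENT, U+0301

let precomposedCount = precomposed.count                  // 1
let decomposedCount = decomposed.count                    // 1
let decomposedScalarCount = decomposed.unicodeScalars.count  // 2

// String comparison uses canonical equivalence, so the two are equal.
let areEqual = (precomposed == decomposed)                // true
```

Both spellings count as one character even though the second is built from two scalars, which is the same effect seen with ที่ above.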

The unicodeScalars view exposes the Unicode scalar values (code points) that make up a string. Notice that ที่ is one grapheme cluster but three code points (3607, 3637, 3656), so the resulting array is longer than the one produced from characters:

str.unicodeScalars.map{ $0.value }  // [3607, 3637, 3656, 128076, 9312, 33]
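Code points are usually written in U+XXXX hexadecimal notation rather than decimal. The sketch below converts the scalar values to that form. The helper codePointLabel is a hypothetical name introduced for illustration, and the code assumes Swift 4 or later for the Unicode.Scalar spelling.

```swift
let str = "ที่👌①!"

// Hypothetical helper: formats a scalar as "U+" plus at least four hex digits.
func codePointLabel(_ scalar: Unicode.Scalar) -> String {
    let hex = String(scalar.value, radix: 16, uppercase: true)
    let padding = String(repeating: "0", count: max(0, 4 - hex.count))
    return "U+" + padding + hex
}

let labels = str.unicodeScalars.map(codePointLabel)
// ["U+0E17", "U+0E35", "U+0E48", "U+1F44C", "U+2460", "U+0021"]
```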

You can encode and decompose strings as UTF-8 (a sequence of UInt8s) or UTF-16 (a sequence of UInt16s):

Array(str.utf8)   // [224, 184, 151, 224, 184, 181, 224, 185, 136, 240, 159, 145, 140, 226, 145, 160, 33]
Array(str.utf16)  // [3607, 3637, 3656, 55357, 56396, 9312, 33]
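The encoded views can be turned back into a String, and the two UTF-16 values for 👌 (55357, 56396) are a surrogate pair derived from the single code point 128076. The following sketch illustrates both points. It assumes Swift 4 or later for String(decoding:as:), and the variable names are illustrative.

```swift
let str = "ที่👌①!"

// Round trip: encode to code units, then decode back to a String.
let utf8Bytes = Array(str.utf8)
let utf16Units = Array(str.utf16)
let fromUTF8 = String(decoding: utf8Bytes, as: UTF8.self)
let fromUTF16 = String(decoding: utf16Units, as: UTF16.self)

// Surrogate pair arithmetic for a code point above U+FFFF.
let scalarValue: UInt32 = 128076                      // 👌, U+1F44C
let offset = scalarValue - 0x10000
let highSurrogate = UInt16(0xD800 + (offset >> 10))   // 55357
let lowSurrogate = UInt16(0xDC00 + (offset & 0x3FF))  // 56396
```

Code points up to U+FFFF, such as ① (9312), fit in a single UTF-16 code unit, which is why only the emoji takes two.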

String Length and Iteration

A string's characters, unicodeScalars, utf8, and utf16 are all Collections, so you can get their count and iterate over them:

// NOTE: count is not necessarily cheap. These views are not random-access, so counting can take O(n) time.

str.characters.count     // 4
str.unicodeScalars.count // 6
str.utf8.count           // 17
str.utf16.count          // 7

for c in str.characters { /* ... */ }
for u in str.unicodeScalars { /* ... */ }
for byte in str.utf8 { /* ... */ }
for unit in str.utf16 { /* ... */ }  // UTF-16 code units, not bytes
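As a small illustration of iterating one view while measuring another, the sketch below records how many UTF-8 bytes each character occupies. It assumes Swift 4 or later (iterating the String directly), and the name bytesPerCharacter is illustrative.

```swift
let str = "ที่👌①!"

// Walk the string character by character and measure each one in UTF-8.
var bytesPerCharacter: [Int] = []
for ch in str {
    bytesPerCharacter.append(String(ch).utf8.count)
}
// bytesPerCharacter is [9, 4, 3, 1]

// The per-character sizes add up to the size of the whole UTF-8 view.
let totalBytes = bytesPerCharacter.reduce(0, +)  // 17
```

Because counting may require a full pass over the string, it is reasonable to store a count in a constant rather than recomputing it inside a loop.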