In Ruby, a string is just a sequence of bytes along with the name of an encoding (such as UTF-8, US-ASCII, or ASCII-8BIT) that specifies how you might interpret those bytes as characters.
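To make the "bytes plus an encoding name" model concrete, here is a small sketch (the variable names are illustrative only). `force_encoding` changes only the encoding label and leaves the bytes untouched, yet the character count changes because the same bytes are now read differently.

```ruby
text = "é"
text.bytes          # => [195, 169]
text.encoding.name  # => "UTF-8"
text.length         # => 1 (one UTF-8 character)

# Same bytes, different label: force_encoding rewrites only the encoding name.
# dup avoids mutating the original literal.
binary = text.dup.force_encoding("ASCII-8BIT")
binary.bytes        # => [195, 169]
binary.length       # => 2 (each byte is now its own "character")
```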
Ruby strings can be used to hold text (basically a sequence of characters), in which case the UTF-8 encoding is usually used.
"abc".bytes # => [97, 98, 99]
"abc".encoding.name # => "UTF-8"
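With a UTF-8 text string, character-oriented methods work on characters rather than bytes. The sketch below (with `word` as an illustrative name) contrasts the two for a string containing a multi-byte character.

```ruby
word = "héllo"
word.encoding.name  # => "UTF-8"
word.length         # => 5 (characters)
word.bytesize       # => 6 ("é" takes two bytes in UTF-8)
word.chars          # => ["h", "é", "l", "l", "o"]
word.reverse        # => "olléh" (reverses characters, not bytes)
```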
Ruby strings can also be used to hold binary data (a sequence of bytes), in which case the ASCII-8BIT encoding is usually used.
[42].pack("i").encoding.name # => "ASCII-8BIT"
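A short sketch of working with binary data follows. One assumption here: it uses the `"l<"` directive (32-bit signed, little-endian) instead of `"i"` so that the byte layout does not depend on the host machine. `String#b` is shown as a way to get an ASCII-8BIT copy of any string.

```ruby
# "l<" packs a 32-bit signed integer in little-endian byte order.
data = [42].pack("l<")
data.bytes          # => [42, 0, 0, 0]
data.encoding.name  # => "ASCII-8BIT"
data.unpack1("l<")  # => 42 (round-trips back to the integer)

# String#b returns a copy labelled ASCII-8BIT, so length counts bytes.
"héllo".b.length    # => 6
```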
It is possible for the bytes in a string to be invalid for its encoding. Such a string can still be created, but many operations on it will raise an error.
"\xFF \xFF".valid_encoding? # => false
"\xFF \xFF".split(' ') # ArgumentError: invalid byte sequence in UTF-8
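One way to recover, sketched below with an illustrative `bad` variable, is to check `valid_encoding?` and then either replace the invalid bytes with `String#scrub` or, if the data is really binary, relabel it with `String#b`, since every byte sequence is valid in ASCII-8BIT. Which approach fits depends on whether the string is meant to be text or raw bytes.

```ruby
bad = "\xFF \xFF"
bad.valid_encoding?        # => false

# scrub replaces each invalid byte sequence; "?" is passed here,
# otherwise the default replacement for UTF-8 is U+FFFD.
bad.scrub("?")             # => "? ?"
bad.scrub("?").split(" ")  # => ["?", "?"]

# Treating the bytes as binary makes the string valid again.
bad.b.valid_encoding?      # => true
bad.b.split(" ")           # => ["\xFF", "\xFF"]
```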