Ruby Strings: Understanding the data in a string


In Ruby, a string is a sequence of bytes together with an encoding (such as UTF-8, US-ASCII, or ASCII-8BIT) that specifies how to interpret those bytes as characters.
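
To illustrate the split between bytes and encoding, here is a minimal sketch that relabels the same two bytes with a different encoding. The variable names are only for illustration; force_encoding changes the encoding label and leaves the bytes untouched.

```ruby
utf8 = "\u00E9"                 # "é": one character, two bytes in UTF-8
utf8.bytes                      # => [195, 169]
utf8.chars.length               # => 1

# force_encoding changes only the label, never the bytes.
# dup first, because force_encoding modifies the receiver in place.
latin1 = utf8.dup.force_encoding(Encoding::ISO_8859_1)
latin1.bytes                    # => [195, 169]
latin1.chars.length             # => 2 (ISO-8859-1 treats every byte as a character)
```

The bytes are identical in both strings; only the interpretation as characters differs.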

Ruby strings can be used to hold text (basically a sequence of characters), in which case the UTF-8 encoding is usually used.

"abc".bytes  # => [97, 98, 99]
"abc"  # => "UTF-8"

Ruby strings can also be used to hold binary data (a sequence of bytes), in which case the ASCII-8BIT encoding is usually used.

[42].pack("i").encoding.name  # => "ASCII-8BIT"
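
As a rough sketch of a binary round trip, the example below packs integers into a binary string and unpacks them again. It uses the "C" and "N" directives rather than "i", because the byte layout of "i" depends on the machine. Encoding::BINARY is an alias for ASCII-8BIT.

```ruby
packed = [1, 2].pack("C2")             # two unsigned 8-bit integers
packed.encoding == Encoding::BINARY    # => true
packed.bytes                           # => [1, 2]
packed.unpack("C2")                    # => [1, 2]

# "N" is a 32-bit unsigned integer in big-endian (network) byte order,
# so the resulting bytes do not depend on the machine.
be32 = [42].pack("N")
be32.bytes                             # => [0, 0, 0, 42]
be32.unpack1("N")                      # => 42
```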

The bytes in a string may not form a valid sequence in the string's encoding. Such a string can still be created and inspected, but many operations that need to interpret its characters raise an error.

"\xFF \xFF".valid_encoding? # => false
"\xFF \xFF".split(' ')      # ArgumentError: invalid byte sequence in UTF-8