.NET strings contain System.Char values (UTF-16 code units). If you want to save (or manage) text with another encoding, you have to work with an array of System.Byte.
Conversions are performed by classes derived from System.Text.Encoder and System.Text.Decoder which, together, can convert to/from another encoding (from an X-encoded byte array, byte[], to a UTF-16 encoded System.String and vice versa).
Because an encoder and its matching decoder are usually used together, they are grouped in a class derived from System.Text.Encoding; the derived classes offer conversions to/from popular encodings (UTF-8, UTF-16 and so on).
byte[] data = Encoding.UTF8.GetBytes("This is my text");
var text = Encoding.UTF8.GetString(data);
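To make the difference between encodings more concrete, here is a minimal sketch (assuming C# 9 or later for top-level statements; the sample string and variable names are only illustrative). It encodes the same text as UTF-8 and as UTF-16, then uses a Decoder directly to show that it keeps state between calls, so a multi-byte sequence split across two buffers is still decoded correctly:

```csharp
using System;
using System.Text;

string text = "Caf\u00E9"; // "Café": four System.Char values

// The same text yields different byte arrays depending on the encoding.
byte[] utf8 = Encoding.UTF8.GetBytes(text);      // 5 bytes: U+00E9 needs two bytes in UTF-8
byte[] utf16 = Encoding.Unicode.GetBytes(text);  // 8 bytes: two bytes per code unit

Console.WriteLine(utf8.Length);   // 5
Console.WriteLine(utf16.Length);  // 8

// A Decoder remembers incomplete sequences between calls.
Decoder decoder = Encoding.UTF8.GetDecoder();
char[] chars = new char[8];

// First chunk ends in the middle of the two-byte sequence for U+00E9.
int count = decoder.GetChars(utf8, 0, 4, chars, 0);
// Second chunk supplies the remaining byte and completes the character.
count += decoder.GetChars(utf8, 4, 1, chars, count);

string decoded = new string(chars, 0, count);
Console.WriteLine(decoded == text); // True
```

Encoding.UTF8.GetString would not behave the same way if called separately on each chunk, because it has no memory of the previous call; that is the main reason to reach for a Decoder when data arrives in pieces.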
This code reads the content of a UTF-8 encoded text file and saves it back encoded as UTF-16. Note that this is not optimal for big files because it reads the entire content into memory:
var content = File.ReadAllText(path, Encoding.UTF8);
// Encoding.Unicode is UTF-16 (little-endian); WriteAllText takes the path first.
File.WriteAllText(path, content, Encoding.Unicode);
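For big files, one possible alternative is to stream the text through a fixed-size buffer instead of loading it all at once. The following is only a sketch under a few assumptions: ConvertUtf8ToUtf16 is a hypothetical helper name, the 4096-character buffer size is arbitrary, the output goes to a separate file rather than overwriting the source, and the temporary files exist only to make the example runnable:

```csharp
using System;
using System.IO;
using System.Text;

// Demonstration with temporary files (illustrative only).
string source = Path.GetTempFileName();
string target = Path.GetTempFileName();
File.WriteAllText(source, "h\u00E9llo", Encoding.UTF8);

ConvertUtf8ToUtf16(source, target);
Console.WriteLine(File.ReadAllText(target, Encoding.Unicode) == "h\u00E9llo"); // True

// Hypothetical helper: copies text from a UTF-8 file to a UTF-16 file
// one buffer at a time, so memory use does not depend on the file size.
static void ConvertUtf8ToUtf16(string sourcePath, string targetPath)
{
    using var reader = new StreamReader(sourcePath, Encoding.UTF8);
    using var writer = new StreamWriter(targetPath, false, Encoding.Unicode);

    char[] buffer = new char[4096];
    int read;
    // Read returns 0 at end of file.
    while ((read = reader.Read(buffer, 0, buffer.Length)) > 0)
    {
        writer.Write(buffer, 0, read);
    }
}
```

StreamReader and StreamWriter use a Decoder and an Encoder internally, so characters that straddle a buffer boundary are handled without extra work.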