Python Language Encoding/decoding error handling


Example

.encode and .decode both have error modes.

The default is 'strict', which raises exceptions on error. Other modes are more forgiving.

Encoding

>>> "£13.55".encode('ascii', errors='replace')
b'?13.55'
>>> "£13.55".encode('ascii', errors='ignore')
b'13.55'
>>> "£13.55".encode('ascii', errors='namereplace')
b'\\N{POUND SIGN}13.55'
>>> "£13.55".encode('ascii', errors='xmlcharrefreplace')
b'£13.55'
>>> "£13.55".encode('ascii', errors='backslashreplace')
b'\\xa313.55'

Decoding

>>> b = "£13.55".encode('utf8')
>>> b.decode('ascii', errors='replace')
'��13.55'
>>> b.decode('ascii', errors='ignore')
'13.55'
>>> b.decode('ascii', errors='backslashreplace')
'\\xc2\\xa313.55'

Morale

It is clear from the above that it is vital to keep your encodings straight when dealing with unicode and bytes.