.encode
and .decode
both have error modes.
The default is 'strict'
, which raises exceptions on error. Other modes are more forgiving.
>>> "£13.55".encode('ascii', errors='replace')
b'?13.55'
>>> "£13.55".encode('ascii', errors='ignore')
b'13.55'
>>> "£13.55".encode('ascii', errors='namereplace')
b'\\N{POUND SIGN}13.55'
>>> "£13.55".encode('ascii', errors='xmlcharrefreplace')
b'£13.55'
>>> "£13.55".encode('ascii', errors='backslashreplace')
b'\\xa313.55'
>>> b = "£13.55".encode('utf8')
>>> b.decode('ascii', errors='replace')
'��13.55'
>>> b.decode('ascii', errors='ignore')
'13.55'
>>> b.decode('ascii', errors='backslashreplace')
'\\xc2\\xa313.55'
It is clear from the above that it is vital to keep your encodings straight when dealing with unicode and bytes.