In Python 3 str is the type for unicode-enabled strings, while bytes is the type for sequences of raw bytes.
type("f") == type(u"f") # True, <class 'str'>
type(b"f") # <class 'bytes'>
In Python 2 a plain string literal was a sequence of raw bytes by default, and a unicode string was any string written with the "u" prefix.
type("f") == type(b"f") # True, <type 'str'>
type(u"f") # <type 'unicode'>
Unicode strings can be converted to bytes with .encode(encoding).
Python 3
>>> "£13.55".encode('utf8')
b'\xc2\xa313.55'
>>> "£13.55".encode('utf16')
b'\xff\xfe\xa3\x001\x003\x00.\x005\x005\x00'
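The result is a plain bytes object, and its length depends on the encoding (the utf16 output above starts with the 2-byte BOM \xff\xfe); a quick comparison:
>>> len("£13.55")
6
>>> len("£13.55".encode('utf8'))
7
>>> len("£13.55".encode('utf16'))
14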
Python 2
In Python 2 the default encoding is ASCII (sys.getdefaultencoding() == 'ascii'), not UTF-8 as in Python 3, so printing the literal as in the previous example is not directly possible:
>>> print type(u"£13.55".encode('utf8'))
<type 'str'>
>>> print u"£13.55".encode('utf8')
SyntaxError: Non-ASCII character '\xc2' in...
# with encoding set inside a file
# -*- coding: utf-8 -*-
>>> print u"£13.55".encode('utf8')
£13.55
>>> "£13.55".encode('ascii')
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
UnicodeEncodeError: 'ascii' codec can't encode character '\xa3' in position 0: ordinal not in range(128)
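In Python 3, .encode() also accepts an errors argument ('replace', 'ignore', 'backslashreplace', ...) if lossy output is preferable to an exception; for example:
>>> "£13.55".encode('ascii', errors='replace')
b'?13.55'
>>> "£13.55".encode('ascii', errors='ignore')
b'13.55'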
Bytes can be converted to unicode strings with .decode(encoding).
A sequence of bytes can only be converted into a unicode string via the appropriate encoding!
>>> b'\xc2\xa313.55'.decode('utf8')
'£13.55'
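As a sanity check, encoding and then decoding with the same codec round-trips to the original string:
>>> "£13.55".encode('utf8').decode('utf8') == "£13.55"
True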
If the bytes are not valid in the chosen encoding, a UnicodeDecodeError is raised:
>>> b'\xc2\xa313.55'.decode('utf16')
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "/Users/csaftoiu/csaftoiu-github/yahoo-groups-backup/.virtualenv/bin/../lib/python3.5/encodings/utf_16.py", line 16, in decode
return codecs.utf_16_decode(input, errors, True)
UnicodeDecodeError: 'utf-16-le' codec can't decode byte 0x35 in position 6: truncated data
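As with encoding, .decode() accepts an errors argument if substituting or dropping invalid bytes is preferable to an exception; for example, decoding the UTF-8 bytes as ASCII:
>>> b'\xc2\xa313.55'.decode('ascii', errors='replace')
'��13.55'
>>> b'\xc2\xa313.55'.decode('ascii', errors='ignore')
'13.55'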