Python Common Pitfalls: Integer and String Identity


Python uses internal caching for a range of integers to reduce unnecessary overhead from their repeated creation.

In effect, this can lead to confusing behavior when comparing integer identities:

>>> -8 is (-7 - 1)
False
>>> -3 is (-2 - 1)
True

and, using another example:

>>> (255 + 1) is (255 + 1)
True
>>> (256 + 1) is (256 + 1)
False

Wait what?

We can see that the identity operator (is) yields True for some integers (-3, 256) but not for others (-8, 257).

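To be more specific, CPython caches the integers in the range [-5, 256] when the interpreter starts, so each of them is created only once. Every reference to such a value points to the same object, and comparing identities with the is operator yields True. Integers outside this range are (usually) created on the fly, so two equal values are distinct objects and their identities compare as False. This caching is an implementation detail, not a guarantee of the language.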
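The cache boundary can be probed with a small sketch like the one below. The helper name shares_identity is hypothetical, and the values are parsed from text at run time on the assumption that this keeps the compiler from folding both operands into a single shared constant. The results are an implementation detail and may differ between interpreters and versions.

```python
def shares_identity(n: int) -> bool:
    """Return True if two independently parsed copies of n are the same object.

    Hypothetical helper for illustration only.
    """
    # Parse the value from text twice at run time, so neither operand
    # is a literal that the compiler could merge with the other.
    a = int(str(n))
    b = int(str(n))
    return a is b


if __name__ == "__main__":
    # Probe values just inside and just outside the range [-5, 256].
    for n in (-6, -5, 0, 256, 257):
        print(n, shares_identity(n))
```

On a typical CPython build, True is expected only for the values inside the cached range (-5, 0, and 256 here).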

This is a common pitfall because small integers in this range are typical test values. Code that compares with is can work perfectly in development and then fail in staging (or worse, in production) for no apparent reason.

The solution is to always compare values using the equality (==) operator and not the identity (is) operator.
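As a minimal sketch of that rule, the hypothetical function is_limit_reached below compares two integers by value. The numbers are built at run time and lie outside the cached range, so an identity check on them would depend on the interpreter, while the equality check does not.

```python
def is_limit_reached(count: int, limit: int) -> bool:
    """Hypothetical example: compare two integers by value."""
    # == compares values, so the result does not depend on whether
    # the interpreter happens to reuse the same object for both.
    return count == limit


if __name__ == "__main__":
    limit = int("1000")      # built at run time, outside [-5, 256]
    count = int("999") + 1   # a separate computation with the same value

    print(is_limit_reached(count, limit))  # True
    print(count is limit)                  # implementation-dependent, do not rely on it
```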

Python also keeps references to some strings and reuses them, which can result in similarly confusing behavior when comparing the identities of strings (i.e. using is).

>>> 'python' is 'py' + 'thon'
True

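In CPython, the expression 'py' + 'thon' is folded into the constant 'python' at compile time, and string constants that look like identifiers are interned, so every reference to the string 'python' here uses one object.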

For other strings, such as ones that contain spaces, comparing identity can fail even when the strings are equal.

>>> 'this is not a common string' is 'this is not' + ' a common string'
False
>>> 'this is not a common string' == 'this is not' + ' a common string'
True

So, just like the rule for integers, always compare string values using the equality (==) operator and not the identity (is) operator.
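The following sketch illustrates the string case with values assembled at run time. The helper name build is hypothetical. sys.intern is the standard-library function that returns the shared (interned) copy of a string; it is used here only to show that identity reflects object sharing rather than value, not as a replacement for ==.

```python
import sys


def build(parts):
    """Hypothetical helper: assemble a string at run time."""
    # Joining at run time yields a string the compiler never saw as a
    # constant, so it is not automatically shared with equal strings.
    return "".join(parts)


a = build(["this is not", " a common string"])
b = build(["this is not", " a common string"])

if __name__ == "__main__":
    print(a == b)                          # True: the values are equal
    print(a is b)                          # implementation-dependent, do not rely on it
    print(sys.intern(a) is sys.intern(b))  # True: both map to one interned object
```

Equality gives the same answer in every case, which is why it is the comparison to use for values.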