IPython (and the Jupyter Notebook) defines a special meaning for the underscore. It always contains the the most recent output. This comes in useful when processing data in multiple steps. This is demonstrated in the example above. To clean up the text, it is run through a few regular expressions and then normalized before being split. Outside of IPython, each result would have to be stored in a new variable or the steps nested. In IPython, often we are exploring and keeping track of variables or reproducing a long series of nested calls is tedious. So this is where the underscore appears.
There is one gotcha. If you assign a value to the underscore in the global scope, it causes unexpected behavior. For example:
address = ('http://example.com', 80)
(_, port) = address
Here I am only interested in the second element in the tuple, the port number. So I follow convention and assign the first element to the underscore to indicate it is a throwaway. However, now the value of the underscore is http://example.com
. And if I were to run more code:
1+4
The expected value of the underscore would be 5. However, it is not. The value is still the domain from the tuple. When you assign to the underscore in the global scope, it not only clobbers the value, but it also stops storing the most recent output. This is not the case if you assign to the underscore inside a function or loop.