Generator expressions are very similar to list comprehensions. The main difference is that it does not create a full set of results at once; it creates a generator object which can then be iterated over.
For instance, see the difference in the following code:
# list comprehension
[x**2 for x in range(10)]
# Output: [0, 1, 4, 9, 16, 25, 36, 49, 64, 81]
# generator comprehension
(x**2 for x in xrange(10))
# Output: <generator object <genexpr> at 0x11b4b7c80>
These are two very different objects:
the list comprehension returns a list
object whereas the generator comprehension returns a generator
.
generator
objects cannot be indexed and makes use of the next
function to get items in order.
Note: We use xrange
since it too creates a generator object. If we would use range, a list would be created. Also, xrange
exists only in later version of python 2. In python 3, range
just returns a generator. For more information, see the Differences between range and xrange functions example.
g = (x**2 for x in xrange(10))
print(g[0])
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
TypeError: 'generator' object has no attribute '__getitem__'
g.next() # 0
g.next() # 1
g.next() # 4
...
g.next() # 81
g.next() # Throws StopIteration Exception
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
StopIteration
NOTE: The function
g.next()
should be substituted bynext(g)
andxrange
withrange
sinceIterator.next()
andxrange()
do not exist in Python 3.
Although both of these can be iterated in a similar way:
for i in [x**2 for x in range(10)]:
print(i)
"""
Out:
0
1
4
...
81
"""
for i in (x**2 for x in xrange(10)):
print(i)
"""
Out:
0
1
4
.
.
.
81
"""
Generator expressions are lazily evaluated, which means that they generate and return each value only when the generator is iterated. This is often useful when iterating through large datasets, avoiding the need to create a duplicate of the dataset in memory:
for square in (x**2 for x in range(1000000)):
#do something
Another common use case is to avoid iterating over an entire iterable if doing so is not necessary. In this example, an item is retrieved from a remote API with each iteration of get_objects()
. Thousands of objects may exist, must be retrieved one-by-one, and we only need to know if an object matching a pattern exists. By using a generator expression, when we encounter an object matching the pattern.
def get_objects():
"""Gets objects from an API one by one"""
while True:
yield get_next_item()
def object_matches_pattern(obj):
# perform potentially complex calculation
return matches_pattern
def right_item_exists():
items = (object_matched_pattern(each) for each in get_objects())
for item in items:
if item.is_the_right_one:
return True
return False