import heapq # get 5 largest items from the range heapq.nlargest(5, range(10)) # Output: [9, 8, 7, 6, 5] heapq.nsmallest(5, range(10)) # Output: [0, 1, 2, 3, 4]
This is much more efficient than sorting the whole iterable and then slicing from the end or beginning. Internally these functions use the binary heap priority queue data structure, which is very efficient for this use case.
sorted, these functions accept the optional
key keyword argument, which must be a function that, given an element, returns its sort key.
Here is a program that extracts 1000 longest lines from a file:
import heapq with open(filename) as f: longest_lines = heapq.nlargest(1000, f, key=len)
Here we open the file, and pass the file handle
nlargest. Iterating the file yields each line of the file as a separate string;
nlargest then passes each element (or line) is passed to the function
len to determine its sort key.
len, given a string, returns the length of the line in characters.
This only needs storage for a list of 1000 largest lines so far, which can be contrasted with
longest_lines = sorted(f, key=len)[1000:]
which will have to hold the entire file in memory.