A hash is a function that converts a variable length sequence of bytes to a fixed length sequence. Hashing files can be advantageous for many reasons. Hashes can be used to check if two files are identical or verify that the contents of a file haven't been corrupted or changed.
You can use hashlib
to generate a hash for a file:
import hashlib
hasher = hashlib.new('sha256')
with open('myfile', 'r') as f:
contents = f.read()
hasher.update(contents)
print hasher.hexdigest()
For larger files, a buffer of fixed length can be used:
import hashlib
SIZE = 65536
hasher = hashlib.new('sha256')
with open('myfile', 'r') as f:
buffer = f.read(SIZE)
while len(buffer) > 0:
hasher.update(buffer)
buffer = f.read(SIZE)
print(hasher.hexdigest())