Hash Function :
A hash function maps data of arbitrary size (called a message) to a bit array of a fixed size. A cryptographic hash function is also one-way: it is practically infeasible to make it work backwards.
Given a message m1, hash(m1) (say, p) can be calculated really fast, but given only p, there should be no way to recover m1 that is better than brute force.
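To make the "fixed size" part concrete, here is a minimal sketch using Python's standard hashlib module. SHA-256 is just one convenient choice, and the helper name sha256_hex is made up for this illustration.

```python
import hashlib

def sha256_hex(message: bytes) -> str:
    """Return the SHA-256 digest of a message as a hexadecimal string."""
    return hashlib.sha256(message).hexdigest()

short_digest = sha256_hex(b"hi")               # a 2-byte message
long_digest = sha256_hex(b"x" * 1_000_000)     # a 1 MB message

# Both digests have the same length (64 hex characters = 256 bits),
# no matter how large the input was.
print(len(short_digest), short_digest)
print(len(long_digest), long_digest)
```

Going forward (message to hash) is a single call. There is no matching call that goes from the hash back to the message.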
The important use-cases of a hash function are: verifying the integrity of files and messages, password verification, and Proof of Work (our previous topic of discussion).
For example, when transferring a file from one computer to another, it is important to know that it has reached the destination intact, in one piece. A hash is like a fingerprint for an entire file (usually written in hexadecimal, e.g. 32 characters for MD5, 40 for SHA-1, 64 for SHA-256)
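A file-transfer check along these lines could look roughly like the sketch below. The function names file_sha256 and arrived_intact are hypothetical, SHA-256 is an assumed choice of hash, and the sender is assumed to publish the expected hash somewhere the receiver can read it.

```python
import hashlib

def file_sha256(path: str, chunk_size: int = 65536) -> str:
    """Hash a file in chunks so large files need not fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        # iter() with a sentinel keeps calling f.read until it returns b"" (EOF).
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()

def arrived_intact(path: str, expected_hex: str) -> bool:
    """Compare the received file's hash with the hash published by the sender."""
    return file_sha256(path) == expected_hex.lower()
```

The receiver would call arrived_intact("download.bin", published_hash), where both the file name and published_hash are placeholders. A result of False means the file changed somewhere along the way.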
Three major requirements of a Hash Function :
1) Speed - It should get through a large file at a reasonably fast speed (a second or two at most), but it shouldn't be too quick either (that makes it easier to break)
2) The Avalanche Effect - If you change 1 bit anywhere in the file, the whole hash should be completely different (on average, about half of the output bits should flip)
3) Avoid hash collisions - It should be very difficult to find two different messages m1 and m2 that have the same hash (MD5 was used extensively as a hash function, but a weakness in its compression function was found in 1996 and full collisions were demonstrated in 2004, so MD5 should not be used where a collision-resistant hash function is required)
Why shouldn't a hash be too quick?
If a hash is too slow, no one will want to use it. But if it is too fast and you can compute new hashes in a few processor cycles, an attacker can try an enormous number of candidate inputs per second, which makes it easier to brute-force a document that matches a particular hash
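One common way to slow things down on purpose, especially for password verification, is to repeat the hash many times. The sketch below uses PBKDF2 from Python's hashlib as an illustration. PBKDF2 is not mentioned in this post, and the iteration count of 200,000 is an arbitrary assumption picked only to make the difference visible.

```python
import hashlib
import os
import time

def fast_hash(password: bytes, salt: bytes) -> bytes:
    """A single SHA-256 pass: cheap for users, but also cheap for attackers."""
    return hashlib.sha256(salt + password).digest()

def slow_hash(password: bytes, salt: bytes, iterations: int = 200_000) -> bytes:
    """PBKDF2-HMAC-SHA256: every guess costs `iterations` rounds of hashing."""
    return hashlib.pbkdf2_hmac("sha256", password, salt, iterations)

def time_call(func, *args) -> float:
    """Return the wall-clock time of one call, in seconds."""
    start = time.perf_counter()
    func(*args)
    return time.perf_counter() - start

salt = os.urandom(16)
password = b"correct horse battery staple"

fast_seconds = time_call(fast_hash, password, salt)
slow_seconds = time_call(slow_hash, password, salt)
print(f"single SHA-256 pass:    {fast_seconds:.6f} s")
print(f"PBKDF2, 200,000 rounds: {slow_seconds:.6f} s")
```

The exact timings depend on the machine. What matters is the ratio: the slower each guess, the fewer guesses a brute-force attacker gets per second.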
Why is the avalanche effect necessary?
If a cryptographic hash function randomises its output poorly, a cryptanalyst who is given only the output can make predictions about the input, which can eventually lead to the algorithm being broken
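The avalanche effect is easy to observe. The sketch below flips a single bit of a message and counts how many bits of the SHA-256 output change. SHA-256 and the helper name bit_difference are choices made for this illustration only.

```python
import hashlib

def bit_difference(a: bytes, b: bytes) -> int:
    """Count the bit positions in which two equal-length byte strings differ."""
    return sum(bin(x ^ y).count("1") for x, y in zip(a, b))

original = bytearray(b"The quick brown fox jumps over the lazy dog")
flipped = bytearray(original)
flipped[0] ^= 0x01  # flip the lowest bit of the first byte

h1 = hashlib.sha256(bytes(original)).digest()
h2 = hashlib.sha256(bytes(flipped)).digest()

changed = bit_difference(h1, h2)
print(f"{changed} of {len(h1) * 8} output bits changed")
```

For a well-behaved hash, the count should land somewhere around half of the 256 output bits, even though the two inputs differ in only one bit.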
What are the various hash functions used over the years?
CRC32, MD5, SHA-1, SHA-2, SHA-3 and RIPEMD-160 are some of the most commonly used hash functions over the years (CRC32 is really an error-detecting checksum rather than a cryptographic hash)
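Several of these are available in Python's standard library, so their output sizes can be compared directly. This is a small sketch: the helper digest_sizes is made up for the illustration, SHA-256 and SHA3-256 stand in for the SHA-2 and SHA-3 families, and RIPEMD-160 is left out because its availability in hashlib depends on the local OpenSSL build.

```python
import hashlib
import zlib

def digest_sizes(names):
    """Map each hashlib algorithm name to its digest size in bytes."""
    return {name: hashlib.new(name).digest_size for name in names}

sizes = digest_sizes(["md5", "sha1", "sha256", "sha3_256"])
for name, size in sizes.items():
    print(f"{name:10s} {size:2d} bytes = {size * 8} bits")

# CRC32 lives in zlib, not hashlib, and returns a 32-bit unsigned integer.
crc = zlib.crc32(b"hello")
print(f"crc32       4 bytes = 32 bits (value for b'hello': {crc:08x})")
```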
Fun Fact : SHA-1 was designed by the NSA and published in 1995. It takes an input and produces a 20-byte (160-bit) hash value. On 23rd Feb 2017, CWI and Google generated 2 different PDF files with the same SHA-1 hash (about 100,000 times faster than brute forcing). They named this attack SHAttered!
In one of the next posts, we will look at the MD5, SHA-1 and SHA-2 hash functions in detail.
