Verify file integrity
by Felix Kinaro
A file that is modified in transit can compromise the security of a system: an attacker may embed malware in it and exploit unsuspecting end users. Comparing cryptographic hash values is a reliable way to determine whether a file has been modified, provided the reference hash comes from a trusted source.
What are cryptographic hash functions?
A cryptographic hash function is a mathematical algorithm that maps an input (message) of arbitrary size to a fixed-length value (message digest).
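As a rough illustration of the fixed-length property, here is a minimal Python sketch using the standard hashlib module. The helper name sha256_hex and the sample inputs are assumptions made for this example only.

```python
import hashlib


def sha256_hex(data: bytes) -> str:
    """Return the SHA-256 digest of `data` as a hexadecimal string."""
    return hashlib.sha256(data).hexdigest()


# Inputs of very different sizes all map to a digest of the same length.
for message in (b"", b"abc", b"a" * 1_000_000):
    digest = sha256_hex(message)
    print(f"{len(message):>9} input bytes -> {len(digest)} hex characters")
```

A SHA-256 digest is always 256 bits long, so each line reports 64 hexadecimal characters regardless of the input size.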
Ideal hash function
- The same input always yields the same hash value. For instance, suppose I have a file, Office16.x64.en-US.ISO, which I burn onto a DVD. If I make a bitstream copy of the file using a tool such as dd, the copy should yield the same hash value as the original, as long as the same hash algorithm is used.
- Modifying an input just slightly should produce a hash value that appears totally unrelated to the first.
  For example, I computed the SHA256 hash value of an HTML file, then edited the file and added a dot (.) at the end of line 18. That single character completely changes the hash value generated using the same SHA256 algorithm.
- It should be computationally infeasible to find two messages that have the same hash value (a collision). Practical collision attacks are the main reason the MD5 hash function was deprecated.
You can read more on Wikipedia.
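The first two properties can be observed with a short Python sketch. The file contents below are hypothetical stand-ins for the HTML file mentioned above, and the helper names are assumptions of this example.

```python
import hashlib


def sha256_hex(data: bytes) -> str:
    """Return the SHA-256 digest of `data` as a hexadecimal string."""
    return hashlib.sha256(data).hexdigest()


def differing_bits(hex_a: str, hex_b: str) -> int:
    """Count the bit positions in which two equal-length hex digests differ."""
    return bin(int(hex_a, 16) ^ int(hex_b, 16)).count("1")


# Hypothetical file contents; the second differs only by one added dot.
original = b"<p>Hello, world</p>\n"
modified = b"<p>Hello, world.</p>\n"

first = sha256_hex(original)
again = sha256_hex(bytes(original))  # hash the same bytes a second time
changed = sha256_hex(modified)

print("same input, same hash:", first == again)
print("original:", first)
print("modified:", changed)
print("bits that differ (out of 256):", differing_bits(first, changed))
```

For a well-behaved hash function, roughly half of the 256 output bits are expected to differ after even a one-character change, which is why the two digests look unrelated.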
Integrity verification on Windows
Windows does not include a graphical tool for checking file hashes, although the command line offers certutil -hashfile and the PowerShell Get-FileHash cmdlet.
A convenient graphical option is to install the HashCheck Shell Extension.
The extension adds a Checksums entry in the File Properties window.
You can then compare the values with those published by the source of the file.
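The comparison step can also be scripted on any platform where Python is installed. The following is a minimal sketch, not part of HashCheck: the function names file_digest and matches_published, the sample file name, and the demo contents are all assumptions made for illustration.

```python
import hashlib
import os
import tempfile


def file_digest(path: str, algorithm: str = "sha256", chunk_size: int = 1 << 20) -> str:
    """Hash a file in chunks so large downloads need not fit in memory."""
    hasher = hashlib.new(algorithm)
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            hasher.update(chunk)
    return hasher.hexdigest()


def matches_published(path: str, published_hex: str, algorithm: str = "sha256") -> bool:
    """Compare a file's digest with the value published by its source."""
    expected = published_hex.strip().lower()  # tolerate case and stray whitespace
    return file_digest(path, algorithm) == expected


# Demo with a throwaway file (hypothetical name and contents).
with tempfile.TemporaryDirectory() as workdir:
    sample = os.path.join(workdir, "download.bin")
    with open(sample, "wb") as handle:
        handle.write(b"abc")

    # SHA-256 of the three bytes "abc", standing in for a published value.
    published = "ba7816bf8f01cfea414140de5dae2223b00361a396177a9cb410ff61f20015ad"
    print("untouched file matches:", matches_published(sample, published))

    with open(sample, "ab") as handle:
        handle.write(b".")  # simulate a modification in transit
    print("modified file matches:", matches_published(sample, published))
```

The check is only as trustworthy as the published value, so the reference hash should be obtained over a channel the attacker cannot also modify.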
Integrity verification on GNU/Linux systems
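I'm using Ubuntu 18.04, where the following hash functions are available out of the box, each through a command named after the function with a sum suffix (for example, sha256sum):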
- sha1
- sha224
- sha256
- sha384
- sha512
You can find detailed information on the Secure Hash Algorithms page on Wikipedia.
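Download sites often publish a checksum list in the format these utilities write: one line per file, with the digest, a space, a mode marker (a space or an asterisk), and the file name. The sketch below checks such a list in Python. It is a simplified illustration, not a replacement for the real tools: the name check_list is an assumption, escaped file names are not handled, and a missing file raises an error.

```python
import hashlib
import os
import tempfile


def file_digest(path: str, algorithm: str = "sha256") -> str:
    """Hash a file in 1 MiB chunks and return the hexadecimal digest."""
    hasher = hashlib.new(algorithm)
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(1 << 20), b""):
            hasher.update(chunk)
    return hasher.hexdigest()


def check_list(checksum_text: str, base_dir: str, algorithm: str = "sha256") -> dict:
    """Return {file name: True/False} for each 'digest  name' line."""
    results = {}
    for line in checksum_text.splitlines():
        if not line.strip():
            continue  # skip blank lines
        digest, _, rest = line.partition(" ")
        name = rest[1:]  # drop the mode marker: ' ' (text) or '*' (binary)
        actual = file_digest(os.path.join(base_dir, name), algorithm)
        results[name] = actual == digest.lower()
    return results


# Demo with throwaway files (hypothetical names and contents).
with tempfile.TemporaryDirectory() as workdir:
    names = ("a.txt", "b.txt")
    for name, content in zip(names, (b"first\n", b"second\n")):
        with open(os.path.join(workdir, name), "wb") as handle:
            handle.write(content)

    # Build the checksum list while both files are still intact.
    listing = "".join(
        f"{file_digest(os.path.join(workdir, name))}  {name}\n" for name in names
    )

    with open(os.path.join(workdir, "b.txt"), "ab") as handle:
        handle.write(b".")  # modify one file after the list was made

    print(check_list(listing, workdir))  # {'a.txt': True, 'b.txt': False}
```

Passing "sha1", "sha224", "sha384", or "sha512" as the algorithm argument selects the other functions from the list above, since hashlib.new accepts all of those names.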
Other uses of cryptographic hash functions
- File comparisons
- Public-key cryptography, such as in Transport Layer Security and PGP
- Hash tables
- Finding similar substrings
- Data corruption detection