A Low Complexity "Fix" for MD5, MD4 and MD2
Abstract
The cryptographic hash function MD5 has become unusable for
cryptographic and authentication purposes because hash collisions have
been found that open up MD5 hashed data to undetectable alteration.
Cryptographic functions (in order to work correctly) are supposed to
be completely free of collisions. This "Random Oracle" property is
however illusive.
At least 100 MD5 hash collisions have been found
via HashClash, a BOINC distributed
computing project. Possibly millions of productive MD5 collisions
exist in the selective modification of 'dataset bits' that MD5 was meant to protect. Thus, it is
expected that the MD5 family will be phased out by 2020.
Yet there are simple fixes. Neither MD2, MD4 nor MD5's algorithms need
to be altered in any way. The only alteration is the way data
is run thru the hash and how it is treated afterwords. In essence,
this is a bi-directional data processing solution.
The MD5 hash function (as well the related MD4 and MD2 hash functions) can have their technological usefulness extended by running the hash function both forward and backward over the same data. Taking the resultant hashes and adding (specifically XORing) them to produce a new hash is all that is needed. The newly created hash will be much harder to defeat.
Historical: MD2 and MD5's Known Weaknesses
Rogier and Chauvaud (1997) described collisions of MD2's compression function, although they were unable to extend the attack to the full MD2. In 2004, MD2 was shown to be vulnerable to a preimage attack with time complexity equivalent to 2104 applications of the compression function (Muller, 2004). The author concludes, "MD2 can no longer be considered a secure one-way hash function". The design weaknesses of MD2 led to the creation of MD5.
MD5 makes only one pass over the data, if two prefixes with the same hash can be constructed, a common suffix can be added to both to make the collision more reasonable. And because the current collision-finding techniques allow the preceding hash state to be specified arbitrarily, a collision can be found for any desired prefix—for any given string of characters X, two colliding files can be determined which both begin with X. All that is required to generate two colliding files is a template file, with a 128-byte block of data aligned on a 64-byte boundary, that can be changed freely by the collision-finding algorithm.
There is no formal definition which captures all of the properties
considered desirable for a cryptographic hash function. These properties
below are generally considered prerequisites:
A hash function meeting these criteria may still have undesirable properties.
For instance, most popular hash functions are vulnerable to length-extension attacks: given hash(message) and length(message) but not message, by choosing a suitable message' an attacker can calculate hash(message || message'), where || denotes concatenation. This property can be used to break naive authentication schemes based on hash functions. The HMAC construction works around these problems.
An ideal hash function would be maximally "boring": it would have no interesting properties such as length extension, and the only interesting way it would differ from a random function would be in that it was deterministic and efficiently computable.
This criterion is of course deeply resistant to formal expression; the closest thing to formal expression is the random oracle model, which is an idealization no real hash function can satisfy.
Note
In spite of the relatively good news about the health of MD5, it is set to put into the cryptography technology dustbin.
MD5 need not be thrown away just yet. A low complexity fix can keep MD5 in common use for another decade, but in a slightly modified form.
The Fix, Forward + Backward MD5 (MD5BF)
Key terms & definitions
Why the backwards option was chosen
Running MD5 backward by itself is not an option. MD5_backward() is mathematically as weak to MD5 in its traditional forward direction.
Rainbow Table (or similar plaintext attacks) immunity for long
bytestreams: MD5BF should be considered immune to these kinds of attacks
up to some limit.
Counterfeiting MD5BF signatures should be nearly impossible under normal circumstances.
The New MD5-PSB & MD5-PMD2
MD5-PSB (Plus 'hash' Sum Backward)
Demonstration of MD5-PSB
|
MD5(File) = |
e807f1fcf82d132f9bb018ca6738a19f |
hex |
|
[+] |
|
|
|
MD5_backward(File) = |
6fb42da0e32e07b61c9f0251fe627a9c |
hex |
|
Sum = |
|
|
MD5-PMD2 (Plus MD2)
Demonstration of MD5-PMD2
|
MD5(File) = |
e807f1fcf82d132f9bb018ca6738a19f |
hex |
|
[+] |
|
|
|
MD2(File) = |
116f4212fae29cf4a77e1be4f53dc551 |
hex |
|
Sum = |
|
|
MD5-FB-SHA-Interop (SHA-0 and SHA-1 Bit Field Length Option)
References
General Topic
Specific Hash Functions
Checksums (greater than 32 bits) behave like Hash Functions
|
Created By Max Power |
Original Idea 20 SEP 2006 |
Date Created
|
Last Revised 30 MAR 2013 |