A Low Complexity "Fix" for MD5, MD4 and MD2


 

Abstract


The cryptographic hash function MD5 has become unusable for cryptographic and authentication purposes because hash collisions have been found that open up MD5 hashed data to undetectable alteration. Cryptographic functions (in order to work correctly) are supposed to be completely free of collisions. This "Random Oracle" property is however illusive.


At least 100 MD5 hash collisions have been found via HashClash, a BOINC distributed computing project. Possibly millions of productive MD5 collisions exist in the selective modification of 'dataset bits' that MD5 was meant to protect. Thus, it is expected that the MD5 family will be phased out by 2020.


Yet there are simple fixes.
Neither MD2, MD4 nor MD5's algorithms need to be altered in any way. The only alteration is the way data is run thru the hash and how it is treated afterwords. In essence, this is a bi-directional data processing solution. 

The MD5 hash function (as well the related MD4 and MD2 hash functions) can have their technological usefulness extended by running the hash function both forward and backward over the same data. Taking the resultant hashes and adding (specifically XORing) them to produce a new hash is all that is needed. The newly created hash will be much harder to defeat.

Historical: MD2 and MD5's Known Weaknesses

  1. Rogier and Chauvaud (1997) described collisions of MD2's compression function, although they were unable to extend the attack to the full MD2. In 2004, MD2 was shown to be vulnerable to a preimage attack with time complexity equivalent to 2104 applications of the compression function (Muller, 2004). The author concludes, "MD2 can no longer be considered a secure one-way hash function". The design weaknesses of MD2 led to the creation of MD5.

  2. MD5 makes only one pass over the data, if two prefixes with the same hash can be constructed, a common suffix can be added to both to make the collision more reasonable. And because the current collision-finding techniques allow the preceding hash state to be specified arbitrarily, a collision can be found for any desired prefix—for any given string of characters X, two colliding files can be determined which both begin with X. All that is required to generate two colliding files is a template file, with a 128-byte block of data aligned on a 64-byte boundary, that can be changed freely by the collision-finding algorithm.


There is no formal definition which captures all of the properties considered desirable for a cryptographic hash function. These properties below are generally considered prerequisites:

A hash function meeting these criteria may still have undesirable properties.

For instance, most popular hash functions are vulnerable to length-extension attacks: given hash(message) and length(message) but not message, by choosing a suitable message' an attacker can calculate hash(message || message'), where || denotes concatenation. This property can be used to break naive authentication schemes based on hash functions. The HMAC construction works around these problems.

An ideal hash function would be maximally "boring": it would have no interesting properties such as length extension, and the only interesting way it would differ from a random function would be in that it was deterministic and efficiently computable.

This criterion is of course deeply resistant to formal expression; the closest thing to formal expression is the random oracle model, which is an idealization no real hash function can satisfy.

Note

  1. Most MD5 hash collisions (when they are found) are not useable for forging electronic documents and data. A great deal of computational effort (with larger data sets, but not smaller data sets) is still involved for those wishing to forge MD5 hashes. 
  2. MD5 is still a vast improvement over using CRC-32 or CRC-40. Mathematical research suggests that MD5 may still be better by many orders of magnitude compared to CRC-64-ECMA, the strongest CRC in common use today. 
  3. A small percentage of all MD5 collisions (in the subset of all collisions) can be used to forge small data sets. However, routine use of forged MD5 data (of larger datasets) is not practical as it is with CRC-32 or some versions of CRC-40.
  4. Most communications, data transfer and authentications protocols are designed (or should be redesigned) to trap MD5 forgeries. 

 

In spite of the relatively good news about the health of MD5, it is set to put into the cryptography technology dustbin.

 

MD5 need not be thrown away just yet. A low complexity fix can keep MD5 in common use for another decade, but in a slightly modified form.

 

 

 

The Fix, Forward + Backward MD5 (MD5BF)

 

 

Key terms & definitions

  1. BOF: Beginning of File (or data stream, data packet, data field etc)
  2. EOF: End of File (or data stream, data packet, data field etc)
  3. MD5(): MD5 in its default run mode (BOF to EOF)
  4. MD5_backward(): MD5 run the opposite direction (EOF to BOF)
  5. [+] : XOR function, bitwise not bytewise.
  6. I will just assume File = ASCII(“1234567890”)
  7. Thus: MD5(File) = e807f1fcf82d132f9bb018ca6738a19f (hex)
  8. A File 'read backward' = ASCII(“0987654321”), but is always indicated as _backward() in the specification syntax used here.
  9. The code of MD5() is not changed at all -- only the bytes are fed to from EOF to BOF.
  10. Thus: MD5_backward(File) = 6fb42da0e32e07b61c9f0251fe627a9c (hex)
  11. I have ignored the issue of Big-Endian and Little-Endian here, as I don’t believe this binary math layer issue affects my technical description at this level

 

Why the backwards option was chosen

Running MD5 backward by itself is not an option. MD5_backward() is mathematically as weak to MD5 in its traditional forward direction.

  1. In some cases it may be advisable that branch versions of this hash function be created that add CRC-32(File) or CRC-64-ECMA(File) to MD5_backward() after the hash function has been run -- in order to be unhelpful to those seeking to find collisions. The exact math is still being worked on here.
  2. Syntactically the CRC addition should be reflected as such: MD5_backward_CRC32() or MD5_backward_CRC64ECMA().
  3. Adding the CRC to the beginning or end of the data stream is also an option, but probably riskier for some uses of MD5 that have evolved over time.

What is not fixed
Rainbow Table Immunity: there is no real increased immunity for MD5BF with respect to Rainbow Table immunity. So long as MD5 is asked to create a hash for a number of bits smaller than the number of bits that MD5 creates as output -- the Rainbow Table weakness will continue to be a problem.

However, given that data structures are complex enough and have other checksums [as well as hashsums] and assuming these are also monitored for consistency -- then there is no foreseeable reason for MD5BF to a perfectly workable solution until a better method is found.

What is fixed
MD5BF is much more immune to the append_word_to_hash() --> find_hash_collision() weakness in MD5. MD5BF may indeed push the process of appending to a bytestrem to an existing bytestream to find a hash collision into near computational impossibility (with current equipment).


Rainbow Table (or similar plaintext attacks) immunity for long bytestreams: MD5BF should be considered immune to these kinds of attacks up to some limit.

Counterfeiting MD5BF signatures should be nearly impossible under normal circumstances.

 

The New MD5-PSB & MD5-PMD2

 

 

MD5-PSB (Plus 'hash' Sum Backward)

 

Demonstration of MD5-PSB

MD5(File) =

e807f1fcf82d132f9bb018ca6738a19f

hex

[+]

 

 xor

MD5_backward(File) =

6fb42da0e32e07b61c9f0251fe627a9c

hex

Sum =

 

 


MD5-PMD2 (Plus MD2)


Demonstration of MD5-PMD2

MD5(File) =

e807f1fcf82d132f9bb018ca6738a19f

hex

[+]

 

 xor

MD2(File) =

116f4212fae29cf4a77e1be4f53dc551

hex

Sum =

 

 

 


MD5-FB-SHA-Interop (SHA-0 and SHA-1 Bit Field Length Option)

 

 

References

General Topic

Specific Hash Functions

Checksums (greater than 32 bits) behave like Hash Functions



Created By

Max Power


Original Idea
20 SEP 2006

Date Created

16 OCT 2006


Last Revised

30 MAR 2013