A Low Complexity "Fix" for MD5


 

Summary

The cryptographic hash function MD5 has become unusable for cryptographic and authentication purposes because hash collisions have been found that open up MD5 hashed data to undetectable alteration. Cryptographic functions (in order to work correctly) are supposed to be completely free of collisions. At least 100 MD5 hash collisions have been found via HashClash, a BOINC distributed computing project. Possibly millions of productive MD5 collisions exist in the selective modification of 'dataset bits' that MD5 was meant to protect. Thus, it is expected that MD5 will be phased out in the next 5 years. MD5 can have its life extended by running the function backward and taking the resultant hash and adding it to the hash of normally run MD5. MD5's algorithm need not be altered.



Historical: MD2 and MD5's Known Weaknesses


  1. Rogier and Chauvaud (1997) described collisions of MD2's compression function, although they were unable to extend the attack to the full MD2. In 2004, MD2 was shown to be vulnerable to a preimage attack with time complexity equivalent to 2104 applications of the compression function (Muller, 2004). The author concludes, "MD2 can no longer be considered a secure one-way hash function". The design weaknesses of MD2 led to the creation of MD5.

  2. MD5 makes only one pass over the data, if two prefixes with the same hash can be constructed, a common suffix can be added to both to make the collision more reasonable. And because the current collision-finding techniques allow the preceding hash state to be specified arbitrarily, a collision can be found for any desired prefix—for any given string of characters X, two colliding files can be determined which both begin with X. All that is required to generate two colliding files is a template file, with a 128-byte block of data aligned on a 64-byte boundary, that can be changed freely by the collision-finding algorithm.


There is no formal definition which captures all of the properties considered desirable for a cryptographic hash function. These properties below are generally considered prerequisites:

A hash function meeting these criteria may still have undesirable properties. For instance, most popular hash functions are vulnerable to length-extension attacks: given hash(message) and length(message) but not message, by choosing a suitable message' an attacker can calculate hash(message || message'), where || denotes concatenation. This property can be used to break naive authentication schemes based on hash functions. The HMAC construction works around these problems.

An ideal hash function would be maximally "boring": it would have no interesting properties such as length extension, and the only interesting way it would differ from a random function would be in that it was deterministic and efficiently computable. This criterion is of course deeply resistant to formal expression; the closest thing to formal expression is the random oracle model, which is an idealization no real hash function can satisfy.

Note

  1. Most MD5 hash collisions (when they are found) are not useable for forging electronic documents and data. A great deal of computational effort (with larger data sets, but not smaller data sets) is still involved for those wishing to forge MD5 hashes. 
  2. MD5 is still a vast improvement over using CRC-32 or CRC-40. Mathmatical research suggests that MD5 may still be better by many orders of magnitude compared to CRC-64-ECMA, the strongest CRC in common use today. 
  3. A small percentage of all MD5 collisions (in the subset of all collisions) can be used to forge small data sets. However, routine use of forged MD5 data (of larger datasets) is not practical as it is with CRC-32 or some versions of CRC-40.
  4. Most communications, data transfer and authentications protocols are designed (or should be redesigned) to trap MD5 forgeries. 

 

In spite of the relatively good news about the health of MD5, it is set to put into the cryptography technology dustbin.

 

MD5 need not be thrown away just yet. A low complexity fix can keep MD5 in common use for another decade, but in a slightly modified form.

 

 

 

The Fix, Forward + Backward MD5 (MD5BF)

 

 

Key terms & definitions

  1. BOF: Beginning of File (or data stream, data packet, data field etc)
  2. EOF: End of File (or data stream, data packet, data field etc)
  3. MD5(): MD5 in its default run mode (BOF to EOF)
  4. MD5_backward(): MD5 run the opposite direction (EOF to BOF)
  5. [+] : XOR function, bitwise not bytewise.
  6. I will just assume File = ASCII(“1234567890”)
  7. Thus: MD5(File) = e807f1fcf82d132f9bb018ca6738a19f (hex)
  8. A File 'read backward' = ASCII(“0987654321”), but is always indicated as _backward() in the specification syntax used here.
  9. The code of MD5() is not changed at all -- only the bytes are fed to from EOF to BOF.
  10. Thus: MD5_backward(File) = 6fb42da0e32e07b61c9f0251fe627a9c (hex)
  11. I have ignored the issue of Big-Endian and Little-Endian here, as I don’t believe this binary math layer issue affects my technical description at this level

 

Why the backwards option was chosen

Running MD5 backward by itself is not an option. MD5_backward() is mathematically as weak to MD5 in its traditional forward direction.

  1. In some cases it may be advisable that branch versions of this hash function be created that add CRC-32(File) or CRC-64-ECMA(File) to MD5_backward() after the hash function has been run -- in order to be unhelpful to those seeking to find collisions. The exact math is still being worked on here.
  2. Syntactically the CRC addition should be reflected as such: MD5_backward_CRC32() or MD5_backward_CRC64ECMA().
  3. Adding the CRC to the beginning or end of the data stream is also an option, but probably riskier for some uses of MD5 that have evolved over time.

What is not fixed
Rainbow Table Immunity: there is no real increased immunity for MD5BF with respect to Rainbow Table immunity. So long as MD5 is asked to create a hash for a number of bits smaller than the number of bits that MD5 creates as output -- the Rainbow Talble weakenss will continue to be a problem.

What is fixed
MD5BF is much more immune to the append_word_to_hash() --> find_hash_collission() weakness in MD5. MD5BF may indeed push the process of appending to a bytestrem to an existing bytestream to find a hash collission into near computational impossiblity (with current equipment).


Rainbow Table (or similar plaintext attacks) immunity for long bytestreams: MD5BF should be considered immune to these kinds of attacks up to some limit. Counterfitting MD5BF signitures should be nearly impossible under normal circomstances.

 

The New MD5-PSB & MD5-PMD2

 

 

MD5-PSB (Plus 'hash' Sum Backward)

 

Demonstration of MD5-PSB

MD5(File) =

e807f1fcf82d132f9bb018ca6738a19f

hex

[+]

 

 xor

MD5_backward(File) =

6fb42da0e32e07b61c9f0251fe627a9c

hex

Sum =

 

 


MD5-PMD2 (Plus MD2)


Demonstration of MD5-PMD2

MD5(File) =

e807f1fcf82d132f9bb018ca6738a19f

hex

[+]

 

 xor

MD2(File) =

116f4212fae29cf4a77e1be4f53dc551

hex

Sum =

 

 

 


MD5-FB-SHA-Interop (SHA-0 and SHA-1 Bit Field Length Option)

 

 

Technical References

  1. Hash Functions: http://en.wikipedia.org/wiki/Hash_function
  2. Cryptographic Hash Functions: http://en.wikipedia.org/wiki/Cryptographic_hash_function
  3. MD5: http://en.wikipedia.org/wiki/MD5
  4. MD4: http://en.wikipedia.org/wiki/MD4
  5. MD2: http://en.wikipedia.org/wiki/MD2
  6. SHA Hash Functions: http://en.wikipedia.org/wiki/SHA 
  7. SHA & MD Family replacements being tested at NIST: http://ehash.iaik.tugraz.at/wiki/The_SHA-3_Zoo
  8. Checksums (CRCs): http://en.wikipedia.org/wiki/Cyclic_redundancy_check
  9. Paper on CRC collisions: http://apollo.backplane.com/matt/crc64.html



Date Created

16 OCT 2006

 

Last Revised

06 MAR 2007

15 NOV 2008

 

Created By

Max Power