MD5 is a widely used cryptographic hash function that was once widely used in computer security. In 2004, MD5 was proven to be unable to prevent collisions, so it is not suitable for security authentication. So what does MD5 collision mean and how do we deal with Hash collisions?
To put it simply, it means that you get the MD5 value of a string, and then based on that value, you back-calculate another string that is different, but has the same MD5 value. This is an MD5 collision, and the chances are very small.
Our common collision method: violent collision (exhaustive method, dictionary method), is to use the computer's resources to try to collide known MD5 code.
1, the exhaustive method
The exhaustive method is to keep trying various combinations of characters to see which combination of MD5 code can be right. The downside is that it's too time-consuming. For example, let's say we want to crack a 6-digit password with a mix of upper and lower case letters and numbers, there are (26 + 26 + 10) ^ 6 combinations for one ****. The size of this number is over 50 billion.
2. Dictionary method
Dictionary method is to store the results of the calculations in the form of a mapping table, an original text corresponds to an MD5 value. The known MD5 code to check the table, you can directly reverse check the original text. Dictionary method reflects the algorithm design of "space for time" idea. The disadvantage is that it is more space-consuming, and in fact, you still have to exhaust all the inputs, but the results of the exhaustion is stored.
A dictionary method to achieve md5 decryption site:/
Usually there are two types of methods to deal with collisions: Open Addressing (Open Addressing) method and linking (Chaining) method. The former is to store all nodes in the hash table T[0..m-1]; the latter is usually to put all elements hashed to the same slot in a chained list, and the head pointer of this chained list is placed in the hash table T[0..m-1].
1. Open addressing method
All the elements are in the hash table, and each table entry either contains an element of the dynamic set, or contains NIL. in this method the hash table may be so full that no new element can be inserted. In the open addressing method, when an element is to be inserted, the items of the hash table can be examined or probed successively until there is an empty slot for the keyword to be inserted. There are three techniques used for open addressing: linear probing, quadratic probing, and double probing.
Scenarios for linear probing
ThreadLocalMap in Java uses open addressing to resolve hash conflicts because open addressing degrades to O(n) time complexity in extreme environments, so it's suitable for scenarios with small amounts of data.
2, chain address method
Chain address method is also known as the chain table method, this method is more common and relatively simple, is to insert an element, if you find that the current position already has an element, the current node as the head node (tail insertion method) or tail node (head insertion method) to construct a chain table, if further optimization, you can modify the chain table into a red-black tree and so on. If further optimized, you can modify the chained table into an array structure such as a red-black tree, for example, HashMap after jdk 1.8 is optimized in this way.
Chain address method applicable scenarios
The hash conflict processing method based on the chain table is more suitable for storing large objects, large amounts of data hash table, because the chain table itself needs additional space to store the pointer address, so if a node to store the data size is not as large as the pointer, it will cause a big waste of space; on the contrary, if the size of the pointer relative to the size of the data can be ignored, then the chain address method can be used to solve the hash problem. On the contrary, if the size of the pointer relative to the size of the data can be ignored, then the chain address method to resolve conflicts will be a better choice, and the chain address method will be more flexible to support more optimization strategies, such as HashMap when the number of nodes of the chain table is greater than 8 will be transformed into a red-black tree to optimize.
This article describes what MD5 collision means, and the methods of Hash collision processing: open addressing method and chain address method. And, it introduces the application scenarios of linear probing method and chain address method. Through the above introduction, you should have a certain understanding of these issues, more about the knowledge of MD5 can be in the previous issue of the article to check oh.