Your Cart

Understanding the Birthday Paradox - Cryptography

We learned about the one-way hash function in the previous blog post. We understood that a strong hashing algorithm does not produce the same hash value for two different messages. If the algorithm does produce the same value for two distinctly different messages, this is called a collision. An attacker can attempt to force a collision, which is referred to as a birthday attack. This attack is based on the mathematical birthday paradox that exists in standard statistics.


It’s the game of probability that one needs to play to appreciate this paradox. Well, let’s assume that you go to a party and the host asks if anyone shares the same birthday as him. Before you start running to get yourself a calculator and start making mathematical assumptions, you need to ask the following two questions:


How many people must be in the same room for the chance to be greater than even if another person has the same birthday as you?      It’s 253


How many people must be in the same room for the chance to be greater even if at least two people share the same birthday?         It’s 23


The first question is very specific and asks you to find the number of people that must be in the same room to share the same birthday as YOU, while the second question focuses to find any two people to share the same birthday. The important aspect to absorb from this head itching discussion is that there is a higher chance of finding two people who share the same birthday rather than a person who shares the same birth date as yours. If you are still confused, consider it in this way. You have a box full of random numbers. I ask you to find a specific number say 3, vs finding any two-digit number. The chances are quite greater in the second case.


I would suggest you read this article if you want to understand the mathematics behind this - https://betterexplained.com/articles/understanding-the-birthday-paradox/

(To be honest, I’m also struggling with this mathematics)


Back to the problem at hand, there need to be just 23 people in the room for a 50% chance for them to share their birthdays. Hence, any random set of 23 people includes 2 people who may share a birthday (at least a 50 percent chance). A very logical question at this stage is - Why do I do then or Who gives a F*%&?


Let’s understand that now. If a hashing algorithm generates a message digest of 30 bits, there is a high likelihood that an adversary can find a collision using only 2^15 inputs. The main way an attacker can find the corresponding hashing value that matches a specific message is through a brute-force attack. If he finds a message with a specific hash value, it is equivalent to finding someone with a specific birthday. If he finds two

messages with the same hash values, it is equivalent to finding two people with the same birthday.


The output of a hashing algorithm is n, and to find a message through a brute force attack that results in a specific hash value would require hashing 2^n random messages. To take this one step further, finding two messages that hash to the same value would require a review of only 2^ (n/2) messages.


If you are thinking that this is going nowhere and are not interested in giving any more F about this, let me assure you that things are going to turn quite interesting now ( like the Shawshank redemption’s end, ok, I may have exaggerated a teeny tiny bit).


How would a Birthday Attack Happen?


The classic Gone Girl situation here now - Amy and Nick have been married for over 5 years now. Amy hates Nick likes anything. At the time of the wedding, they signed a contract that stated that in the case of a divorce, Amy would not give a penny from her trust fund to Nick. To ensure this contract was not modified, it was hashed and a message digest value was created.

Now, Nick was a good guy, but what can you say, the marriage took a toll on Nick and he wanted the full trust fund to himself now. Nick took the contract to the lawyer and together they started changing the contract,( but you know any change in the message would affect the hash value).

Here comes the birthday paradox. Nick wants a collision here. He wants to find a message which would generate the same hash value as the original message. As stated earlier, the main way an attacker can find the corresponding hashing value that matches a specific message is through a brute-force attack. If he finds a message with a specific hash value, it is equivalent to finding someone with a specific birthday. If he finds two messages with the same hash values, it is equivalent to finding two people with the same birthday.


Nick and his lawyer continue to tweak the contract until they find the same hash value as the original message. This becomes super easy for them because there was a big gaping hole at that time of creating the message digest. The message digest that was created was only 10 bits long. Just 1024 brute force requirements to find the collision (The output of a hashing algorithm is n, and to find a message through a brute force attack that results in a specific hash value would require hashing 2^n random messages).


The main point of discussing this paradox is to show how important longer hashing values truly are. A hashing algorithm that has a larger bit output is less vulnerable to brute force attacks such as a birthday attack. The new algorithms today produce a much higher bit value - SHA 384, SHA 512, etc. With advances in quantum computing, even these hash values would not be enough.


What are your thoughts on this paradox? Pour out your mind in the comments section below...