Cryptographic hash functions, or hashes for short, take an input of a variable length and produce a unique, fixed size output. Hashes are used in modern computing in a variety of ways, including password verification, file integrity validation, and file identification. Hash functions are easy to compute based on the input, but are difficult to re-compute given only the output. Because of this property, hashes are great for storing sensitive information, like passwords, in a database.
Basic Example - MD5 Hash
Let’s take a look at a quick example so you can get a basic understanding of how hashes work. My primary language of choice is Ruby, so all of the examples will use Ruby code.
1 2 3 4 5
In this example, the input (referred to as the message), is “password.” This message is run through the MD5 hashing algorithm and returns a string of characters called the digest, in this case “5f4dcc3b5aa765d61d8327deb882cf99”. The digest is what you store in the database so that if your database is compromised, the attacker could not easily obtain the password from the digest alone.
Now, we make a second call to the system checking whether the hash of password matches the stored value of @password. This returns true and would authenticate the user.
Here, we make a call to the system checking whether the hash of “idunno” matches @password. In this case, the program returns false, and would not authenticate the user.
Cracking Hashed Passwords
The MD5 algorithm, as we have implemented it above, is better than simply storing “password” directly in to the database, but is generally not accepted as a secure method of storing passwords. To understand why, let’s take a look at how attackers might try to crack your password hashes.
The simplest way to crack a password hash is to simply try to guess the password. If you guess the password correctly, you will obtain the same hash and gain access to the system. To automate this process, attackers use a variety of tools. Two of the most common tools are called Lookup Tables and Rainbow Tables. Lookup Tables take a file of common words and phrases used in passwords and precompute the hash for each input. Thus, your table would look something like this:
Input Hash password 5f4dcc3b5aa765d61d8327deb882cf99 monkey d0763edaa9d9bd2a9516280e9044d885 etc abcdef0123456789abcdef0123456789
Rainbow tables take a similar approach, however, instead of using a precompiled dictionary they will try to compute all the hashes for a subset of characters. For example, an 8 character password consisting of all numeric characters.
Input Hash 00000000 dd4b21e9ef71e1291183a46b913ae6f2 00000001 ced165163e51e06e01dc44c35fea3eaf etc abcdef0123456789abcdef0123456789
Strengthening our Password Algorithm
By now, you are probably feeling a bit discouraged about storing passwords securely. Luckily, cryptographers have designed a special type of hash function called a key derivation function that make Lookup and Rainbow Tables more difficult for attackers. We are going to talk about three of those hashes. They are called PBKDF2, BCrypt, and SCrypt.
Compared to the basic MD5 hash above, key derivation functions add salts and iterations to the process. A salt is a random string of data that is used as additional input into the function. Each time a new password is stored in the database, a random salt is generated and stored along with the password. This additional field requires the attacker to generate a custom lookup or rainbow table for each individual password! Each of these functions also allow you to configure the number of hashing iterations you go through before finalizing the value. Doing this one time to store or lookup the user’s password doesn’t affect your system much, but if your attacker is trying to generate Lookup or Rainbow Tables for all of the possible solutions, it adds a significant amount of time to their process.
First, let’s take a look at the PBKDF2 key derivation function. Here, we need to set a few variables. Notably, PBKDF2 takes inputs of the password, salt, number of iterations, the digest, and digest length to create the output.
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17
Now, let’s pass these variables into the PBKDF2 algorithm.
To make this a little easier, there is a ruby implementation of PBKDF2 that you can use. First install the ruby gem ‘pbkdf2.’
Now, in the ruby shell, require the PBKDF2 library.
1 2 3 4 5
If you are using Ruby 1.9 or newer, you need to add the following code in your console.
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28
In this implementation, you do not have to create the digest and digest length yourself. It is done behind the scenes.
1 2 3 4 5
To test password success and failure, we need to know both the password and the salt.
1 2 3 4 5
BCrypt is a key derivation function that adds a layer of complexity to the computations from PBKDF2. The bcrypt-ruby library is widely used in Ruby on Rails applications as the default secure password library. It handles the salts for you so that you only have to store one string in your database. Let’s take a look at how it is used in practice.
1 2 3 4 5 6 7 8 9 10
Here, you can see that the string output from creating the password is divided up by the ‘$’. 2a is the version. 10 is the cost. The next 22 characters (“qRVVULjD1mnTCR5n3Wefsu”) is the salt and the remainder (“EaODLX5gG1xl5XLowpcpYj.jBVW3DjC”) is the digest.
So you think you’ve got it solved? Many people believe that using BCrypt is sufficient for storing passwords, however, the designers of SCrypt have upped the ante.
The designers of scrypt estimate that “on modern (2009) hardware, if 5 seconds are spent computing a derived key, the cost of a hardware brute-force attack against scrypt is roughly 4000 times greater than the cost of a similar attack against bcrypt (to find the same password), and 20000 times greater than a similar attack against PBKDF2.
Now, lets load the SCrypt gem and see what it can do. Here, we create a password hash using the SCrypt algorithm and subsequently check whether our string matches the hash.
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19
For comparison, the chart below compares the time to crack hashes generated from a variety of common algorithms:
[Update] Thanks to @gilbes for pointing out that this table is based on 2009 hardware. When you read through this table, please take into consideration that this is based on 5 year old hardware.