Tuesday, April 18, 2006

Storing passwords in database

This is one of the most common, and seemingly trivial, challenges that developers face in enterprise application development: how do you store passwords in a database safely and securely?
It’s common sense that passwords should not be stored as plain text, because anyone with access to the database table could then read them.

One option is to encrypt the passwords with a symmetric key and decrypt them for comparison during authentication. But then we face the problem of storing the secret key securely: if the key were compromised, all the passwords would become accessible.

The most common and simple solution to this problem is to ‘hash’ the passwords before storing them in the database. The MD5 or SHA-1 hash algorithms can be used to compute a one-way hash of each password. To authenticate a user, you hash the password they entered and compare the result with the hashed value stored in the database. This method is fairly safe because the hash is one-way: it is computationally infeasible to recover the original password directly from its hash.
Even so, an attacker can still run a brute-force or dictionary attack, hashing likely passwords and comparing the results with the stored values, and so guess some of the passwords. More info on this here.
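A minimal sketch of the hash-then-compare flow described above, assuming Python’s standard `hashlib` and `hmac` modules. The function names are hypothetical, the "database" is just a variable, and SHA-1 without a salt is used only because that is the method this paragraph describes.

```python
import hashlib
import hmac

def hash_password(password: str) -> str:
    """Return the hex SHA-1 digest of the password (unsalted, as described above)."""
    return hashlib.sha1(password.encode("utf-8")).hexdigest()

def check_password(entered: str, stored_hash: str) -> bool:
    """Hash the password the user typed and compare it with the stored digest."""
    # compare_digest performs a constant-time comparison of the two hex strings
    return hmac.compare_digest(hash_password(entered), stored_hash)

# Hypothetical stand-in for the value kept in the password column
stored = hash_password("s3cret!")
print(check_password("s3cret!", stored))  # True
print(check_password("guess", stored))    # False
```

Only the 40-character hex digest is stored; the plain-text password never needs to be written to the table.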
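To illustrate why unsalted hashes are exposed to a dictionary attack, here is a small sketch in Python. The leaked hash and the four-word list are made up for the example; a real attacker would use a much larger wordlist.

```python
import hashlib

def sha1_hex(text: str) -> str:
    return hashlib.sha1(text.encode("utf-8")).hexdigest()

def dictionary_attack(target_hash: str, wordlist):
    """Hash each candidate word and return the one whose hash matches, if any."""
    for word in wordlist:
        if sha1_hex(word) == target_hash:
            return word
    return None

# Hypothetical leaked hash and a tiny illustrative wordlist
leaked = sha1_hex("sunshine")
words = ["password", "letmein", "sunshine", "dragon"]
print(dictionary_attack(leaked, words))  # sunshine

# Because no salt is used, the attacker can precompute one lookup table
# and reuse it against every row of the password table.
table = {sha1_hex(w): w for w in words}
print(table.get(leaked))  # sunshine
```

A password that is not in the wordlist (for example one with unusual special characters) is simply not found, which is why the next paragraph recommends such passwords.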

To reduce the risk of a dictionary attack, it is important that passwords contain special characters that are difficult to guess. Another option is to ‘salt’ the password before hashing it. Salting means prepending or appending some extra data to the password and hashing the combined string; that hash is what goes into the database table. The extra data could be a randomly generated number (stored alongside the hash so it can be reused at login) or the user’s userID. Because two users with the same password now end up with different hashes, an attacker can no longer hash a dictionary once and compare it against every row; each password must be attacked separately, which makes a dictionary attack much more difficult.
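A minimal sketch of salted hashing, assuming Python’s standard library. SHA-256 (a SHA-2 function) is swapped in for MD5/SHA-1, and the 16-byte random salt is an illustrative choice, not a value from the text; using the userID as the salt would follow the same pattern.

```python
import hashlib
import hmac
import os
from typing import Optional, Tuple

def hash_with_salt(password: str, salt: Optional[bytes] = None) -> Tuple[bytes, str]:
    """Return (salt, digest) where digest = SHA-256(salt + password)."""
    if salt is None:
        salt = os.urandom(16)  # random per-user salt, stored next to the hash
    digest = hashlib.sha256(salt + password.encode("utf-8")).hexdigest()
    return salt, digest

def verify(password: str, salt: bytes, stored_digest: str) -> bool:
    """Re-hash the entered password with the stored salt and compare."""
    _, digest = hash_with_salt(password, salt)
    return hmac.compare_digest(digest, stored_digest)

# Two hypothetical users who picked the same password
salt_a, hash_a = hash_with_salt("sunshine")
salt_b, hash_b = hash_with_salt("sunshine")
print(hash_a == hash_b)                    # False, since the random salts differ
print(verify("sunshine", salt_a, hash_a))  # True
```

Each row now needs both the salt and the digest, and a precomputed table of unsalted hashes no longer matches anything in the database.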

Recently there was a lot of furor over the security of the MD5 algorithm, caused by MD5 hash collisions. The problem is the following:

Since MD5 takes the characters of some text, however many there are, and produces a fixed 128-bit value from them, there must be many different texts that give the same 128 bits, and hence have the same MD5 value. In other words, the hash function is not one-to-one. So how do we know that the file we received is not one of the countless other files that have the same MD5 value? The simple answer is, we don’t. What we believe is that the chance of such another file being meaningful is minuscule. In other words, we believe that even if someone tampered with our file on its way to us, they would not be able to produce a file that has the same MD5 value and can harm us.
However, the site below shows two different PostScript files that have the same MD5 hash: http://www.cits.rub.de/MD5Collisions/
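The "not one-to-one" argument can be made concrete by shrinking the output. This Python sketch keeps only the first 2 bytes (16 bits) of the MD5 digest, an artificial truncation chosen for illustration, so a collision is guaranteed within 65,537 inputs by the pigeonhole principle and usually appears after a few hundred. For the full 128-bit MD5 this naive search is infeasible; the published MD5 collisions relied on cryptanalytic attacks instead.

```python
import hashlib

def truncated_md5(data: bytes, nbytes: int = 2) -> bytes:
    """First nbytes of the MD5 digest: a deliberately tiny 16-bit 'hash'."""
    return hashlib.md5(data).digest()[:nbytes]

def find_collision(nbytes: int = 2):
    """Try messages msg0, msg1, ... until two share the same truncated digest."""
    seen = {}
    i = 0
    while True:
        msg = b"msg%d" % i
        h = truncated_md5(msg, nbytes)
        if h in seen:
            return seen[h], msg
        seen[h] = msg
        i += 1

a, b = find_collision()
print(a, b, truncated_md5(a).hex(), truncated_md5(b).hex())
```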

So what are the other options? Tiger and SHA-2 are still considered safe hash functions. One could also compute several different hash functions over the same file as secondary checks, so that a forged file would have to collide under all of them at once.
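A sketch of the "several hash functions as secondary checks" idea in Python. The choice of SHA-1, SHA-256 and SHA-512 is an assumption for illustration (Tiger is not among the algorithms Python’s `hashlib` guarantees), and the file contents are a hypothetical byte string.

```python
import hashlib

# Hypothetical set of algorithms used as independent checks
ALGORITHMS = ("sha1", "sha256", "sha512")

def digests(data: bytes) -> dict:
    """Compute one hex digest per algorithm."""
    return {name: hashlib.new(name, data).hexdigest() for name in ALGORITHMS}

def verify_all(data: bytes, expected: dict) -> bool:
    """Accept the data only if every published digest matches."""
    actual = digests(data)
    return all(actual[name] == expected[name] for name in ALGORITHMS)

original = b"%!PS-Adobe-3.0 example document"
published = digests(original)
print(verify_all(original, published))                 # True
print(verify_all(original + b" tampered", published))  # False
```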