Some words on Password use, salting and stretching

by Michael Anders (an@fh-wedel.de, https://www.fh-wedel.de/~an/)
                                                                                                                                            Fh-Wedel,    Feb 8, 2012
                                                                                                                                       (Minor revision, June 17, 2016)

Password use: How is a password stored?

Passwords are secrets to authenticate yourself towards an instance which may be a server or a program on your computer. Lets assume the password is used to authenticate yourself as the legitimate user to a program e.g. "Academic Signature" or "GnuPG" which you might use to sign documents or to asymmetrically encipher files.

Of course any serious program doesn't just store the password somewhere and compares it with the one given on a login or key use attempt. A serious program stores the hash value of the password.The hash algorithm used (e.g.sha2) should be cryptographically secure i.e. it should be infeasible to calculate the preimage from the hash value or any other preimage that hashes to the same value. The serious program asks for the password on a login attempt, calculates the hash value and compares with the corresponding stored hash value. If they don't coincide, the program bounces your login attempt. Of course they will coincide if the proper password is supplied.

Programmers can't help but optimize algorithms for speed and of course standard state of the art hash algorithms like sha2 are fast. Sometimes this is not appropriate.

Lets assume your name is Oscar and you want to use the program and its secrets without being authorized to do so. Lets furthermore assume you get hold of the user's passwords hash value (or even the list of hashes in a multi user setting). Some programs lousily protect this list. On your fast computer you launch a so called dictionary attack on the hash which is sort of a refined brute force attack. Dictionary attack means you just try out all reasonable passwords and check if you can find one that hashes to the stolen hash value. This will take a long time, till the sun goes nova right? Well lets see...

Natural language has about 1,5 bits of entropy per character. Evenly distributed printable characters have 64 bits per character, but who can memorize passwords like
"§$$%D@#/§fµ|_%?ß\<>_vx7z" ?
I certainly can't and won't. So lets assume I still want to be a good user and chose the long but memorizable password: "Adam and eve eat a banana in paradise!". This is 38 characters long, assuming this is reasonable language we end up at an entropy of 1.5*38=57 bit. Thus there are 2 to the 57 different possible language passphrases up to that length, which is 1.441151881×10¹⁷. Wow almost a billion billions, that'll take mighty long to guess, right? Let's see...

Out of a habit, program developers in their typically insane minds selected hash algorithms for speed, even in login applications. Lets assume Oscar has a fast system and can check a hash in a microsecond(some thousand processor cycles). That comes down to a time of
1.441151881×10¹⁷ /(1Million*60*60*24*365) = 4570 years for an exhaustive search. So we are safe?

Well Oscar may be a determined attacker, may have a budget of about 200 000$ to buy 1000 quadcore PCs and we are down to a little more than a year for an exhaustive search or about half a year on average till the password is exposed. You think this is not realistic? If the NSA thinks you might be Osama Bin Laden, they'll get at you with a million processors power easily and crack this password in less than a day.

Even worse, if Oscar recovers a password list with about 200 entries, chances are, that one of the owners used a short stupid password like e.g. "penelope" that may be be exposed in less than a hundredth of a second. Brute Force password crackers are available online for free!
At least, under the unrealistic assumptions that everyone uses a 38 character password, the search time for the $200 000 budget Oscar comes down to some hours.

What can we do about this? The solution is salting and stretching.

Password use: What is salting, what is a salt?

Chances are, Oscar doesn't even have to launch the dictionary attack at all. There are lists of hash values of dictionary words available online in the form of "Rainbow Tables". Others have already converted many possible reasonable language strings to hash values and the problem is reduced to searching through a database.

You may recognize that the problem is that a lot of people use the exact same hash algorithm so that the dictionary attack has to be launched only once by someone who chooses to share the data.

To protect your password you may individualize the hash algorithm such that your password hashes differently from all other's passwords. Usually you won't create an entirely new hash algorithm. But you can easily "co-hash" with a salt. Create a fairly long random number which doesn't have to be kept secret and append it to your password prior to hashing. This so called salt must not be treated as a secret. It can be stored openly in the system you have to authenticate to. This prevents Oscar from exploiting password lists or precalculated hash databases and poor Oscar with just a single quadcore processor is stuck with about 1000 years for cracking "Adam and eve eat a banana in paradise!" If he got hold of a list of password hashes, he now has to attack each single one on its own. Now even stupid "penelope" may pose a small challenge since he has try for 199 good passwords in parallel.

Password use: What is stretching?

We already mentioned programmers typical urge to pick fast algorithms. For password hashing this is a stupid drive, since passwords are typically cracked by a dictionary attack. This attack would be sped up as well.
For the legitimate user it doesn't make a difference whether the program needs a microsecond to verify the password or a tenth of a second. He will not notice the difference. For the attacker it may be the difference between 1 day for an exhaustive search or the 100 thousand fold time of 300 years.
So we have to artificially and uncircumventably slow down the hashing calculation. This is called stretching.
Usually it is recommended to just apply the hash not once but lets say a million times. There is some problem with reapplying an operation(like e.g. hashing with sha2) that does not generate a cyclic group but performs some sort of pseudo-random walk. Let me not dive too deeply into this here and just state that result space of the millionfold hash will be somewhat reduced against the singly applied one.
A better way to do stretching is to employ an algorithm from asymmetric cryptography combined with hashing the result as one round. If this is still too fast, this round can be repeated a few times. Algorithms from the portfolio of RSA, elGamal or elliptic curves point multiplications are known to be slow and they have been heavily optimized and scrutinized in the past so that you can be pretty sure Oscar(or the NSA) don't know about a trick to speed them up significantly which is not known to the public.
The program "Academic Signature" e.g. uses elliptic curve calculations for stretching. It is recommendable to adjust stretching to last about a second for the hash generation. A second waiting time during login is tolerable for users and puts severe strain on Oscar. No inhabitant of the earth can type a keyword in a microsecond and complain about another two microseconds waiting time.
In the case of the password "Adam and eve eat a banana in paradise!" such millionfold stretching would even slow down the hypothetical NSA-Attack with a million processors to about a thousand years. In effect the millionfold stretching can exactly cancel the millionfold parallel attack of a high budget organisation to the power of plain hacker Oscar from next door. In this case you should expect the resourceful organisation to resort to other means like e.g. highjacking your OS or "rubberhose cryptoanalysis" instead ;-).