Some words on Password
use, salting and stretching
by Michael Anders
(an@fh-wedel.de,
https://www.fh-wedel.de/~an/)
Fh-Wedel, Feb 8, 2012
(Minor revision, June 17, 2016)
Passwords are secrets to authenticate yourself towards
an instance which may be a server or a program on your computer.
Lets assume the password is used to authenticate yourself as the
legitimate user to a program e.g. "Academic
Signature" or "GnuPG"
which you might use to sign documents or to asymmetrically
encipher files.
Of course any serious program doesn't just store the password somewhere and compares it with the one given on a login or key use attempt. A serious program stores the hash value of the password.The hash algorithm used (e.g.sha2) should be cryptographically secure i.e. it should be infeasible to calculate the preimage from the hash value or any other preimage that hashes to the same value. The serious program asks for the password on a login attempt, calculates the hash value and compares with the corresponding stored hash value. If they don't coincide, the program bounces your login attempt. Of course they will coincide if the proper password is supplied.
Programmers can't help but optimize algorithms for
speed and of course standard state of the art hash algorithms like
sha2 are fast. Sometimes this is not appropriate.
Lets assume your name is Oscar and you want to use the
program and its secrets without being authorized to do so. Lets
furthermore assume you get hold of the user's passwords hash value
(or even the list of hashes in a multi user setting). Some
programs lousily protect this list. On your fast computer you
launch a so called dictionary attack on the hash which is sort of
a refined brute force attack. Dictionary attack means you just try
out all reasonable passwords and check if you can find one that
hashes to the stolen hash value. This will take a long time, till
the sun goes nova right? Well lets see...
Natural language has about 1,5 bits of entropy per
character. Evenly distributed printable characters have 64 bits
per character, but who can memorize passwords like
"§$$%D@#/§fµ|_%?ß\<>_vx7z" ?
I certainly can't and won't. So lets assume I still want to be a
good user and chose the long but memorizable password: "Adam and
eve eat a banana in paradise!". This is 38 characters long,
assuming this is reasonable language we end up at an entropy of
1.5*38=57 bit. Thus there are 2 to the 57 different possible
language passphrases up to that length, which is 1.441151881×10¹⁷.
Wow almost a billion billions, that'll take mighty long to guess,
right? Let's see...
Out of a habit, program developers in their typically
insane minds selected hash algorithms for speed, even in login
applications. Lets assume Oscar has a fast system and can check a
hash in a microsecond(some thousand processor cycles). That comes
down to a time of
1.441151881×10¹⁷ /(1Million*60*60*24*365) = 4570 years for an
exhaustive search. So we are safe?
Well Oscar may be a determined attacker, may have a budget of about 200 000$ to buy 1000 quadcore PCs and we are down to a little more than a year for an exhaustive search or about half a year on average till the password is exposed. You think this is not realistic? If the NSA thinks you might be Osama Bin Laden, they'll get at you with a million processors power easily and crack this password in less than a day.
Even worse, if Oscar recovers a password list with
about 200 entries, chances are, that one of the owners used a
short stupid password like e.g. "penelope" that may be be exposed
in less than a hundredth of a second. Brute Force password
crackers are available online for free!
At least, under the unrealistic assumptions that everyone uses a
38 character password, the search time for the $200 000 budget
Oscar comes down to some hours.
What can we do about this? The solution is salting and stretching.
Chances are, Oscar doesn't even have to launch the dictionary attack at all. There are lists of hash values of dictionary words available online in the form of "Rainbow Tables". Others have already converted many possible reasonable language strings to hash values and the problem is reduced to searching through a database.
You may recognize that the problem is that a lot of
people use the exact same hash algorithm so that the dictionary
attack has to be launched only once by someone who chooses to
share the data.
To protect your password you may individualize
the
hash algorithm such that your password hashes differently
from all other's passwords. Usually you won't create an entirely
new hash algorithm. But you can easily "co-hash" with a salt.
Create a fairly long random number which doesn't have to be kept
secret and append it to your password prior to hashing. This so
called salt must not be treated as a secret. It can be stored
openly in the system you have to authenticate to. This prevents
Oscar from exploiting password lists or precalculated hash
databases and poor Oscar with just a single quadcore processor is
stuck with about 1000 years for cracking "Adam and eve eat a
banana in paradise!" If he got hold of a list of password hashes,
he now has to attack each single one on its own. Now even stupid
"penelope" may pose a small challenge since he has try for 199
good passwords in parallel.
We already mentioned programmers typical urge to pick
fast algorithms. For password hashing this is a stupid drive,
since passwords are typically cracked by a dictionary attack. This
attack would be sped up as well.
For the legitimate user it doesn't make a difference whether the
program needs a microsecond to verify the password or a tenth of a
second. He will not notice the difference. For the attacker it may
be the difference between 1 day for an exhaustive search or the
100 thousand fold time of 300 years.
So we have to artificially and
uncircumventably slow down the hashing calculation. This
is called stretching.
Usually it is recommended to just apply the hash not once but lets
say a million times. There is some problem with reapplying an
operation(like e.g. hashing with sha2) that does not generate a
cyclic group but performs some sort of pseudo-random walk. Let me
not dive too deeply into this here and just state that result
space of the millionfold hash will be somewhat reduced against the
singly applied one.
A better way to do stretching is to employ an algorithm from
asymmetric cryptography combined with hashing the result as one
round. If this is still too fast, this round can be repeated a few
times. Algorithms from the portfolio of RSA, elGamal or elliptic
curves point multiplications are known to be slow and they have
been heavily optimized and scrutinized in the past so that you can
be pretty sure Oscar(or the NSA) don't know about a trick to speed
them up significantly which is not known to the public.
The program "Academic Signature" e.g. uses elliptic curve
calculations for stretching. It is recommendable to adjust
stretching to last about a second for the hash generation. A
second waiting time during login is tolerable for users and puts
severe strain on Oscar. No inhabitant of the earth can type a
keyword in a microsecond and complain about another two
microseconds waiting time.
In the case of the password "Adam and eve eat a banana in
paradise!" such millionfold stretching would even slow down the
hypothetical NSA-Attack with a million processors to about a
thousand years. In effect the millionfold stretching can exactly
cancel the millionfold parallel attack of a high budget
organisation to the power of plain hacker Oscar from next door. In
this case you should expect the resourceful organisation to resort
to other means like e.g. highjacking your OS or "rubberhose
cryptoanalysis" instead ;-).