# [cryptography] Digest comparison algorithm

Marsh Ray marsh at extendedsubset.com
Fri Dec 2 02:21:20 EST 2011

```On 12/02/2011 12:25 AM, Solar Designer wrote:
> On Thu, Dec 01, 2011 at 11:16:14PM -0600, Marsh Ray wrote:
>>
>> 1. The largest cluster will represent the case where H[0] fails the
>> comparison in strcmp().
>>
>> 2. The second cluster will be on the order of a few machine cycles
>> longer,  representing times that H[0] compared successfully.
>
> Yes.
>
>> This cluster will be approximately 256 times smaller than the first.
>
> Why not 16 times, if we use hex digits and assume a char-by-char strcmp()?

You're right, I mentally shifted to using bytes instead of hex digits there.

>> With
>> 4096 trials the expectation is that this cluster will contain about 16
>> members.
>
> 256 with the above correction.
>
>> Now that he has a fuzzy idea of which passwords succeed in matching
>> H[0], he evaluates this set for all 4096 possible salt values. There
>> will be only one salt value that produces the same H[0] for all of these
>
> Did you mean there will _likely_ (but not necessarily) be only one such
> salt value?

Very likely I think, even for just a handful of point in cluster 2.

I don't have the exact probability formula for N-way collisions on the
top of my head right now (or ever really), but maybe we can intuit
around it in spite of all the paradoxes that apply to collision.

Out of a set of 4096 (salt values) random functions each mapping

{ 1...256 } -> { 0 ... 255 }
samples        H[0] values

how many would we expect to have all samples map to the same value,
i.e., have a codomain size of 1 ?

I'm thinking the probability of any one salt value hitting this pattern
by chance alone is something like

1 / 256^(256 - 2)

Which seems rather small, even times 4096 potential values, even if
there are some birthday phenomena involved.

Perhaps there are some bored mathematicians on the list who wouldn't
mind explaining how many samples we'd need (at what timing confidence)
in order to get a reasonable confidence in the result.

It would also appear that initial set of trial passwords could be
precomputed along with a graph to direct the search. E.g., Passwords 37
and 128 look like strong H[0] matches, eliminate 255/256 of the
candidate salts and proceed with a set of trial passwords that most
rapidly distinguish between those remaining.

>> So if his timing data is any good, he has learned the salt
>
> Yes, well done.
>
>> Conclusion: Salts placed at the beginning of the password string must
>> contain sufficient entropy to resist offline brute-force in order to
>> provide mitigation against timing attacks. It may be better to place
>> them at the end of the password hash string.
>
> I don't see how placement of the salt in the encoded salt+hash string
> matters.  With either placement, the salt characters in the string will
> always match because crypt(3) is called with that stored salt.  The fact
> that with salt placement at the beginning strcmp() actually does compare
> those characters before it gets to comparing H[0] doesn't affect
> anything, as far as I can see, assuming a char-by-char strcmp().

Yes, you're right. I remember I went back and forth about in my mind the