[cryptography] Number of hash function preimages

Marsh Ray marsh at extendedsubset.com
Fri Mar 9 11:17:27 EST 2012


On 03/09/2012 05:25 AM, Florian Weingarten wrote:
> Hello list,
>
> first, excuse me if my questions are obvious (or irrelevant).

I am interested in these questions too.

This is what I pick up from following the SHA-3 list. Someone else 
please jump in if I'm off the mark.

> I am interested in the following questions. Let h be a cryptographic
> hash function (let's say SHA1, SHA256, MD5, ...).
>
> 1) Is h known to be surjective (or known to not be surjective)? (i.e.,
> is the set h^{-1}(x) non-empty for every x in the codomain of h?)

Assuming the domain of x is all messages that are well-defined as input 
(typically all bit strings less than 2^64 bits in length).

It is not known, but it is expected to be statistically very likely 
since h is expected to be a good approximation of a random function 
mapping from >2^(2^64) possible messages down to 2^256 possible hash values.

> 2) Is it known if every (valid) digest has always more than one
> preimage? To state it otherwise: Does a collision exist for every
> message? (i.e., is the set h^{-1}(x) larger than 1 for every x in the
> image of h?).

Again, my impression is that it is believed statistically certain but 
not proven.

> 3) Are there any cryptographic protocols or other applications where the
> answers to 1) or 2) are actually relevant?

I suspect that any proof of (1) or (2) could translate into a weakness 
of the hash function. For (1) hash values should be unrelated (collision 
resistance) and the codomain is supposed to be too large to enumerate 
(brute force resistance). In (2) given only a hash value we're not 
supposed to be able to learn anything definite about the preimages it 
may or may not have (one wayness).

So perhaps the inability provide more than statistical guarantees is a 
necessary part of basic hash function security.

Also, a lot of interesting things happen with current hash functions if 
you restrict the domain of input messages, even in ways that are likely 
to occur in practice (such as to only very short messages, or messages 
with many blocks of fixed text at the end).

- Marsh



More information about the cryptography mailing list