# [cryptography] Number of hash function preimages

Marsh Ray marsh at extendedsubset.com
Fri Mar 9 11:17:27 EST 2012

```On 03/09/2012 05:25 AM, Florian Weingarten wrote:
> Hello list,
>
> first, excuse me if my questions are obvious (or irrelevant).

I am interested in these questions too.

This is what I pick up from following the SHA-3 list. Someone else
please jump in if I'm off the mark.

> I am interested in the following questions. Let h be a cryptographic
> hash function (let's say SHA1, SHA256, MD5, ...).
>
> 1) Is h known to be surjective (or known to not be surjective)? (i.e.,
> is the set h^{-1}(x) non-empty for every x in the codomain of h?)

Assuming the domain of x is all messages that are well-defined as input
(typically all bit strings less than 2^64 bits in length).

It is not known, but it is expected to be statistically very likely
since h is expected to be a good approximation of a random function
mapping from >2^(2^64) possible messages down to 2^256 possible hash values.

> 2) Is it known if every (valid) digest has always more than one
> preimage? To state it otherwise: Does a collision exist for every
> message? (i.e., is the set h^{-1}(x) larger than 1 for every x in the
> image of h?).

Again, my impression is that it is believed statistically certain but
not proven.

> 3) Are there any cryptographic protocols or other applications where the
> answers to 1) or 2) are actually relevant?

I suspect that any proof of (1) or (2) could translate into a weakness
of the hash function. For (1) hash values should be unrelated (collision
resistance) and the codomain is supposed to be too large to enumerate
(brute force resistance). In (2) given only a hash value we're not
supposed to be able to learn anything definite about the preimages it
may or may not have (one wayness).

So perhaps the inability provide more than statistical guarantees is a
necessary part of basic hash function security.

Also, a lot of interesting things happen with current hash functions if
you restrict the domain of input messages, even in ways that are likely
to occur in practice (such as to only very short messages, or messages
with many blocks of fixed text at the end).

- Marsh

```