[cryptography] Number of hash function preimages

Jon Callas jon at callas.org
Sat Mar 10 19:28:51 EST 2012

Hash: SHA1

On Mar 9, 2012, at 3:25 AM, Florian Weingarten wrote:

> Hello list,
> first, excuse me if my questions are obvious (or irrelevant).

No, they're interesting and subtle.

> I am interested in the following questions. Let h be a cryptographic
> hash function (let's say SHA1, SHA256, MD5, ...).

There's a lot to put in that ellipsis. 

> 1) Is h known to be surjective (or known to not be surjective)? (i.e.,
> is the set h^{-1}(x) non-empty for every x in the codomain of h?)

No. I would bet that the standard ones are all surjective, but I don't know that it's ever been demonstrated for any given hash function. The main property we want from a hash function is that is is one-way, and demonstrating that a one-way function is or isn't surjective flirts with that, at least. Some will be easy (modulo is one way and surjective, for example), others will be harder. When you add in other desirable hash function properties such as being reasonably collision-free, it becomes harder to show.

However, if you show that a hash function is a combination of surjective functions that all preserve surjectivity, I think it's an easy proof.

All the ones that use a block cipher and a chaining mode are likely easy to prove. 

> 2) Is it known if every (valid) digest has always more than one
> preimage? To state it otherwise: Does a collision exist for every
> message? (i.e., is the set h^{-1}(x) larger than 1 for every x in the
> image of h?).

Sure, by the pigeonhole principle. Consider a hash with 256-bit (32-byte) output. For messages larger than 32 bytes, by the pigeonhole principle, there must be collisions. For messages smaller than 32 bytes, they might collide on themselves but don't have to, but there will be some large message that collides with it.

> 3) Are there any cryptographic protocols or other applications where the
> answers to 1) or 2) are actually relevant?

Very likely not.

Let's construct a trivial non-surjective hash function. Start with H, and construct H' that for any message that produces a hash of 0, we emit 1 instead. It's therefore not surjective since it can't emit a zero. 

It isn't a *useful* non-surjectivity because we don't usefully know a preimage of a zero or a one. 

But now let's construct H'' that emits H(M2) when calculating H(M1). This is just like H', but with different constants. The difference here is that we have artificially created a collision between M1 and M2 instead of a preimage of 0 and a preimage of 1, which we don't know in advance. Is this a useful collision? That's a philosophical question. I'd say no, myself, but I'd understand why someone said yes, I'd merely disagree with them.

That's why I say very likely not, instead of just no.


Version: PGP Universal 3.2.0 (Build 1672)
Charset: us-ascii


More information about the cryptography mailing list