[cryptography] Number of hash function preimages

Jon Callas jon at callas.org
Sun Mar 11 04:59:02 EDT 2012

Hash: SHA1

On Mar 10, 2012, at 5:24 PM, Eitan Adler wrote:

> On Sat, Mar 10, 2012 at 7:28 PM, Jon Callas <jon at callas.org> wrote:
>>> 2) Is it known if every (valid) digest has always more than one
>>> preimage? To state it otherwise: Does a collision exist for every
>>> message? (i.e., is the set h^{-1}(x) larger than 1 for every x in the
>>> image of h?).
>> Sure, by the pigeonhole principle. Consider a hash with 256-bit (32-byte) output. For messages larger than 32 bytes, by the pigeonhole principle, there must be collisions. For messages smaller than 32 bytes, they might collide on themselves but don't have to, but there will be some large message that collides with it.
> I think you are misunderstanding the question (or at the very least I
> am). The pigeonhole principle only shows that there exist collisions
> not that collisions exist for every element in the codomain.
> Think about the function over the natural numbers:
> f(x) = {
> 1, if x = 0
> 2, if x > 0
> 3, if x < 0
> }
> While there exist collisions within N it isn't true that every element
> in the co-domain has a collision.

You're right, I misunderstood the question.

Your example gets to what I was saying about a lot being there in the ellipsis on hash functions. 

Let's take your function -- which is a hash function, it's just not generally useful -- and if we define its codomain to be [1..3], then yes. But if its codomain is [0..3] (i.e. it's a two-bit hash function), then no because it never returns a zero. 

Its my intuition that a hash function that's made up of a block cipher and a chaining mode is going to do that when operating on natural sizes. For example, I expect that Skein512-512 is both surjective and well-behaved on the co-domain. It's an ARX block cipher with simple chaining between the blocks. 

However, Skein512-1024 (Skein512 with 1024 bit output) is obviously not surjective (512 bits of state yield 1024 bits of output), but I'd expect the output codomain to be evenly covered. Also note that not only is it a fine output for an RSA-1024 signature, but arguably better than a smaller output.

For smaller output of an unnatural size (e.g. 511 bits), I'd also expect it to cover the codomain, and I think it would have to.

I think you'd have to look at other constructions on a case-by-case basis. If we look again at my trivial modification of a hash function that makes it not return zero but a one instead, it's not surjective, it doesn't evenly cover its codomain, and yet for any practical purpose it's a fine hash function. 

For some purposes, it's even more secure than the original. Consider using it as a KDF for a cipher for which zero is a weak key (like DES). By not returning a weak key, it's more secure than the base function. That's interesting in that a flaw in it being an ideal hash function makes it actually superior as a KDF. 


Version: PGP Universal 3.2.0 (Build 1672)
Charset: us-ascii


More information about the cryptography mailing list