[cryptography] philosophical question about strengths and attacks at impossible levels

Marsh Ray marsh at extendedsubset.com
Fri Oct 15 04:33:03 EDT 2010

On 10/14/2010 10:09 PM, Zooko O'Whielacronx wrote:
> This part is a technical mistake, see
> http://cr.yp.to/hash/collisioncost-20090517.pdf .

> I wouldn't be
> surprised if you are right that this is how NIST came up with the
> requirement for a 512-bit hash function though.

NIST may not really need a reason to standardize a 512 bit function. If 
somebody somewhere is going to use 512 bit hashes, regardless of whether 
or not this size is justified, it's probably better for everyone's sake 
that they're at least standardized.

Apparently SHA-2-384 is what they use for TOP SECRET material:
Of course, they don't say what they use for the even more secret stuff.

SHA-2-512 is derived from SHA-2-256 mostly by doubling the internal 
values from 32 to 64 bits, and SHA-2-384 is a truncation of that. So 
even if the actual need were only for a 384-bit hash, it would have been 
harder _not_ to end up with a SHA-2-512 as a side effect.

Obviously, SHA-3 can't go without a 512 bit version if the older SHA-2 
has one.

This is maybe a simpler explanation for why we have 512 bit hash 
functions. But admittedly, it's not as interesting as the idea of a 
captured space alien machine out at Area 51 generating exhaustive 
rainbow tables for SHA-2-256.

> I think if you want a hash function to be much less likely to fail at
> pre-image resistance (without being much more likely to fail at
> collision-resistance)

Do you know a way to improve one without improving the other?

> then you need to increase the amount of
> computation performed (while keeping the state size the same or
> larger).

There's an old saying among engineers: "Any engineer can make a bridge 
that stands. It takes a great engineer to make a bridge that stands just 
barely." I.e., using a minimum cost of materials.

So let's turn the question around: say you're an engineer designing a 
hash function suitable for use in applications such as RFID tags. Every 
picowatt-microsecond of power you consume is going to take away from the 
RF transponder power and limit the data rate. Needless to say, you have 
a limited budget for your state size and the number of computations you 
can perform.

So how do you design a hash function that provides the best security 
given limited resources? What's the most efficient conversion from N 
bits of state and M CPU cycles to security? Shouldn't there be something 
in the SHA-3 family for this engineer?

I don't think SHA-3 is being developed because of a real weakness in 
SHA-2. It's probably because we can do so much better in terms of 
overall efficiency that environments for which SHA-2 is way too 
expensive will be able to move off of MD5, MD4, or whatever hacked-up 
LFSR they're using now.

> For example, you could take SHA1 and double the number of
> rounds. This would take twice as much CPU and it would be *much* less
> likely to fail at pre-image resistance than SHA1.

It really depends.

The rounds of SHA-1 are not all the same like they are in SHA-2, so 
there are various ways you could go about doubling them. Given that 
there are significant attacks against them already, are you sure you'd 
want to spend your resources on just more of the same?

What if an attack focused only the first or the last rounds, or both of 
them, but independently? You'd have the same calculations in the same 

So in any case, instead of the current SHA-1, a 160-bit output length 
hash function that has attacks against it that are worrying but not 
quite practical, you would end up with a 160-bit output length hash 
function that has half the throughput, cannot take advantage of any 
off-the-shelf hardware acceleration, is not approved for use in any 
setting which approves such things, is incompatible with every existing 
standard, and still has attacks against it but they are probably more 

> (Whether it would be
> *more* likely to fail at collision resistance I'm not sure.)

The rounds form a basic block cipher, so there's no information loss in 
that part so doubling the rounds shouldn't increase collisions. But 
there may be other attacks that become available.

But if you decided to go about it by simply repeating every 64 bytes of 
message data as it was fed into the input side (perhaps you had to use a 
library function and couldn't modify it), then that would actually 
degrade the entropy present in the output a little.

> Likewise I think if you are willing to throw more state at it
> (double-width pipe, big fat sponge) and throw a proportionally greater
> amount of CPU at it, you are can make any reasonable hash function
> much less likely to fail at collision-resistance.

Not long ago I would have unequivocally agreed with you. But I got 
interested, and started reading about it and now believe the reality is 
more complicated.

For example, the paper describing the Skein function describes the 
process used for selecting an optimal sequence of rotation constants for 
each round. It was sort of a heuristic algorithm run separately for the 
different internal state sizes.

If you tried to widen a function without a detailed knowledge of the 
decisions that went into its design, you'd likely make some suboptimal 
choices. Look at the SHA-2-256 vs SHA-2-512 rotation constants, how did 
they decide those changes? Why did they increase the rounds to 80 and 
not 96? There's also the possibility that the original designer just got 
a bit lucky, so even if you followed the same principles when extending 
it, you'd be rolling the dice again.

SHA-3 had 51 entrants, when a winner is selected, if you widened it, how 
would your modified version fare in a new SHA-4 competition specifically 
for costlier designs?

> So I do think there
> is a (fuzzy) trade-off between computational efficiency and safety
> here.

Several have tunable parameters where the safety-cost trade-off can be 
decided late in the process.

I think the most relevant definition of efficiency is something like:

	safe, effective security
	computational cost (cycles, die area, power, etc.).

If we get a function that optimized according to that definition then we 
can use it as a building block to satisfy any other need.

What I would like to see next is approved ways of combining hash 
functions such that they provide redundant safety and allow us to apply 
more CPU power like you were saying. A lot like the standardized HMAC 
and block cipher modes.

A standardized algebraic notation could allow us to interchange data.
For example:
     MD5 + SHA-1                  = the combination of MD5 and SHA-1
     (SHA-2-512 + SHA-3-512)/256  = combo with reduced 256 bit output
     SHA-1*2                      = SHA-1 applied twice

The scheme might define a partial or total ordering based on security.

Perhaps the values assigned to hash primitives could be reconfigured 
over time to reflect the discovery of new attacks. A protocol such as 
TLS could negotiate the best combination of security and efficiency by 
taking into consideration the greatly varying capabilities of each 
endpoint (e.g., some endpoints have hardware acceleration for some 

>This implies that the fastest SHA-3 candidates are the least
> safe. ;-)

I'm sure we'll get one that's very fast and unquestionably safe.

- Marsh

More information about the cryptography mailing list