[cryptography] Current state of brute-forcing random keys?
solar at openwall.com
Fri Jun 10 00:44:39 EDT 2011
On Thu, Jun 09, 2011 at 04:22:16PM -0500, Marsh Ray wrote:
> The article is behind a paywall, but Google quotes the relevant statistic:
> "The standard-cell implementation on a 0.35 um CMOS process from Philips
> Semiconductors occupies an area of only 0.25 mm2"
Here's where you made the error: in your calculations, you assumed that
the above implementation was fully-pipelined, capable of a throughput of
one AES block encryption per clock cycle. Or rather, I concluded this
from your calculations and trusted that this was the case. However, now
that I actually went to the link above, it says:
"3400 gate equivalents"
"The maximum clock frequency of 80 MHz allows a data throughput rate of
9.9 Mbps."
First, 3400 gates, when compared against gate counts for other
implementations of AES, suggests a size-optimized rather than
performance-optimized implementation. Probably it excludes key setup
and is iterated.
Then, 9.9 Mbps is very low, even for 80 MHz. It's like 1000 cycles to
encrypt one AES block.
Some more relevant numbers are: 27k gates, 250 MHz, 3 Gbps in a 130 nm
process.
Still, 3 Gbps gives something like 23 million AES block encryptions per
second, which is an order of magnitude slower than one per cycle.
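To make those two figures concrete, here's the back-of-the-envelope
arithmetic as a Python sketch (128-bit AES blocks assumed):

```python
def cycles_per_block(throughput_bps, clock_hz, block_bits=128):
    """Average clock cycles spent per AES block at a given throughput."""
    blocks_per_second = throughput_bps / block_bits
    return clock_hz / blocks_per_second

# 3400-gate core: 9.9 Mbps at 80 MHz -> roughly 1000 cycles per block
print(round(cycles_per_block(9.9e6, 80e6)))   # 1034

# 27k-gate core: 3 Gbps at 250 MHz -> about 11 cycles per block
print(round(cycles_per_block(3e9, 250e6)))    # 11
```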
Specifically searching for "pipelined" implementations found this:
"A High-Speed, Fully-Pipelined VLSI Architecture for Real-Time AES"
49.401 Gbps on a Xilinx XC2V6000FF1152-6 (using 17479 slices, or over 3
million gates - although presumably lots are wasted, so an ASIC
implementation would be smaller). This corresponds to a clock rate of
386 MHz, and it actually does this many AES block encryptions per second.
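The same arithmetic confirms the one-block-per-cycle claim for the
pipelined design:

```python
# 49.401 Gbps over 128-bit blocks, at a 386 MHz clock
blocks_per_second = 49.401e9 / 128   # ~3.86e8 AES blocks per second
clock_hz = 386e6
print(blocks_per_second / clock_hz)  # ~1.0 block per clock cycle
```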
Using the die area required per logic gate number from the scrypt paper,
given as 5 um2 for 130 nm process, 3 million gates is 15 mm2.
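Spelled out, that area estimate is:

```python
gates = 3e6
um2_per_gate = 5                       # scrypt paper's figure for 130 nm
area_mm2 = gates * um2_per_gate / 1e6  # 1 mm2 = 1e6 um2
print(area_mm2)  # 15.0
```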
Perhaps you could in fact do 1 GHz in ASIC, and perhaps in fewer than
3 million gates, but definitely not in 3400 gates. So there's something
like a factor of 20 to 100 difference here, which turns $42 into $1k to
$4k - and that's in electricity costs alone.
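That cost correction is just the original estimate scaled by the
factor-of-20-to-100 discrepancy:

```python
original_estimate_usd = 42
low, high = original_estimate_usd * 20, original_estimate_usd * 100
print(low, high)  # 840 4200, i.e. roughly $1k to $4k
```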
This sounds much closer to reality to me - e.g., it agrees with
electricity costs for running near-optimal code on GPUs. I saw no
reason why GPUs running near-optimal code (getting very close to their
theoretical peak performance) would consume so much more power than
ASICs. So here's the explanation: they don't.
The 27k gates implementation could be 0.135 mm2, but it only does 23
million AES blocks per second, or 170 million per mm2-second. That's
25 times slower than your estimate.
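Deriving that 170-million-per-mm2-second figure from the numbers above:

```python
gates = 27e3
area_mm2 = gates * 5 / 1e6        # 0.135 mm2 at 5 um2 per gate
blocks_per_second = 3e9 / 128     # ~2.34e7 AES blocks per second
per_mm2_second = blocks_per_second / area_mm2
print(round(per_mm2_second / 1e6))  # 174 (million per mm2-second)
```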
130 nm is not state of the art. But on the other hand, we already had
CPUs running at clock rates beyond 1 GHz at this tech process, whereas
this AES implementation only does 250 MHz - so perhaps that's how it is
designed (maximizing throughput rather than clock rate). Based on this
incomplete data, my guesstimate is that it would not reach 1 GHz even
under a modern tech process. Still, that would bring us closer to your
original estimate... maybe just 10x slower (at 600 MHz).
Another factor is that exhaustive key searches (over a certain
easy-to-define range) may use a slightly incomplete implementation most
of the time (to reject most keys). For DES, it was like 13 out of 16 rounds.
Something similar will likely work for AES (I never looked into that).
But that's relatively minor.
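The control structure of such a search - a cheap partial check on every
candidate, full verification only on the rare survivors - can be sketched
as below. This is a toy model, not from the post: hashlib.sha256 stands in
for the cipher, and comparing only the first two digest bytes plays the
role of the reduced-round (13-of-16) check. In software the "partial" step
isn't actually cheaper; in hardware it would be.

```python
import hashlib

PLAINTEXT = b"known plaintext!"

def encrypt(key: bytes) -> bytes:
    # Toy stand-in for the cipher under attack (not AES).
    return hashlib.sha256(key + PLAINTEXT).digest()

def search(target: bytes, keyspace):
    """Filter-then-verify exhaustive key search."""
    for key in keyspace:
        digest = encrypt(key)
        if digest[:2] != target[:2]:  # partial check rejects ~all keys
            continue
        if digest == target:          # full check, rarely reached
            return key
    return None

secret = (12345).to_bytes(2, "big")
target = encrypt(secret)
found = search(target, (i.to_bytes(2, "big") for i in range(65536)))
print(found == secret)  # True
```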