[cryptography] Current state of brute-forcing random keys?
marsh at extendedsubset.com
Fri Jun 10 03:03:42 EDT 2011
On 06/09/2011 11:44 PM, Solar Designer wrote:
> On Thu, Jun 09, 2011 at 04:22:16PM -0500, Marsh Ray wrote:
>> The article is behind a paywall, but Google quotes the relevant part:
>> http://scholar.google.com/scholar?cluster=14788671232027796805 "The
>> standard-cell implementation on a 0.35 µm CMOS process from
>> Philips Semiconductors occupies an area of only 0.25 mm²"
> Here's where you made the error: in your calculations, you assumed
> that the above implementation was fully-pipelined, capable of a
> throughput of one AES block encryption per clock cycle. Or rather, I
> concluded this from your calculations and trusted that this was the
> case. However, now that I actually went to the link above, it says:
> "3400 gate equivalents" "The maximum clock frequency of 80 MHz allows
> a data throughput rate of 9.9 Mbps."
Yeah, I wasn't able to read the article because of the paywall. Even the
university site seems to have made the document unavailable.
> First, 3400 gates, when compared against gate counts for other
> implementations of AES, suggests a size-optimized rather than
> performance-optimized implementation. Probably it excludes key
> setup and is iterated.
I'll buy that.
> Then, 9.9 Mbps is very low, even for 80 MHz. It's like 1000 cycles
> to encrypt one AES block.
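That cycles-per-block figure can be sanity-checked directly from the quoted numbers (80 MHz clock, 9.9 Mbps throughput, 128-bit AES block):

```python
# Sanity check: cycles per AES block for the 3400-gate implementation.
clock_hz = 80e6          # quoted maximum clock frequency
throughput_bps = 9.9e6   # quoted data throughput
block_bits = 128         # AES block size

blocks_per_second = throughput_bps / block_bits   # ~77,000 blocks/s
cycles_per_block = clock_hz / blocks_per_second   # ~1034 cycles

print(round(blocks_per_second), round(cycles_per_block))
```

So "like 1000 cycles to encrypt one AES block" is right on the money.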
> Some more relevant numbers are: 27k gates, 250 MHz, 3 Gbps in a 130
> nm CMOS process:
> Still, 3 Gbps gives something like 23 million AES block encryptions
> per second, which is an order of magnitude slower than one per clock cycle.
> Specifically searching for "pipelined" implementations found this:
> "A High-Speed, Fully-Pipelined VLSI Architecture for Real-Time AES"
> 49.401 Gbps on a Xilinx XC2V6000FF1152-6 (using 17479 slices, or over
> 3 million gates - although presumably lots are wasted, so an ASIC
> implementation would be smaller). This corresponds to a clock rate
> of 386 MHz, and it actually does this many AES block encryptions per second.
> Using the die area required per logic gate number from the scrypt
> paper, given as 5 um2 for 130 nm process, 3 million gates is 15 mm2.
I'm suspicious of this conversion from Xilinx slices to gate count to
130 nm mm². I remember being surprised by the inaccuracy when I tried
to do that in the past.
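For what it's worth, the area arithmetic itself is trivial given the scrypt paper's figure; the shaky step is the slice-to-gate conversion, not this part:

```python
# Area from gate count, using the scrypt paper's per-gate figure for 130 nm.
# The 3-million-gate number (from 17479 Virtex-II slices) is the suspect input.
um2_per_gate_130nm = 5.0   # die area per logic gate, per the scrypt paper
gates = 3e6                # claimed gate-equivalent of the FPGA design

area_mm2 = gates * um2_per_gate_130nm / 1e6   # 1 mm^2 = 1e6 um^2
print(area_mm2)
```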
> Perhaps you could in fact do 1 GHz in ASIC, and perhaps in fewer
> than 3 million gates, but definitely not in 3400 gates. So there's
> something like a factor of 20 to 100 difference here. Which turns
> $42 into $1k to $4k, and that in electricity costs alone.
I'm happy to get within an order or two of magnitude. That's under 7
bits of key length.
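Translating Solar's factor-of-20-to-100 correction into key bits, since each factor of 2 in attack cost is one bit of effective key length:

```python
import math

# Each factor-of-2 in attack cost corresponds to one bit of key length.
factor_low, factor_high = 20, 100   # the estimated range of my error
bits_low = math.log2(factor_low)    # ~4.3 bits
bits_high = math.log2(factor_high)  # ~6.6 bits

print(round(bits_low, 1), round(bits_high, 1))
```

Even at the high end, that's under 7 bits.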
> This sounds much closer to the reality to me - e.g., it agrees with
> electricity costs for running near-optimal code on GPUs. I saw no
> reason why GPUs running near-optimal code (getting very close to
> their theoretical peak performance) would consume so much more power
> than ASICs. So here's the explanation. They don't.
I think your estimate is more realistic too.
My estimate was intentionally conservative, erring on the side of an
attacker who is a world-class chip designer with a state-of-the-art
foundry at his disposal, which is a plausible scenario.
> The 27k gates implementation could be 0.135 mm2, but it only does 23
> million AES blocks per second, or 170 million per mm2-second. That's
> 25 times slower than your estimate.
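Those quoted numbers check out against the same 5 µm²-per-gate assumption:

```python
# Check the 27k-gate figures: die area, and AES blocks per mm^2-second.
gates = 27e3
um2_per_gate = 5.0       # scrypt paper's figure for a 130 nm process
blocks_per_s = 23e6      # ~3 Gbps / 128 bits per block

area_mm2 = gates * um2_per_gate / 1e6     # 0.135 mm^2
rate_per_mm2 = blocks_per_s / area_mm2    # ~170 million per mm^2-second

print(area_mm2, round(rate_per_mm2 / 1e6))
```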
> 130 nm is not state of the art. But on the other hand, we already
> had CPUs running at clock rates beyond 1 GHz at this tech process,
It's certainly possible to make a chip that cranks out 2^32 trials
per second, and I don't think you need to pay an insane gate penalty to
do it. These algorithms are designed for efficient hardware
implementation, after all, and they're hardly as demanding as, say, a
If you're expecting to run a datacenter filled with them for years, it's
primarily a question of how much power the hardware consumes per trial.
How many units one can pack onto each die is a secondary consideration.
I really think this type of silicon, produced in any significant volume,
is going to cost the attacker less than the energy needed to power and
cool it continuously for any significant length of time.
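To make that concrete, here's a toy amortization model. Every number in it is hypothetical (chip price, power draw, electricity rate, deployment length), but it shows how the energy bill can dominate the silicon cost over a multi-year run:

```python
# Toy cost model: volume silicon vs. electricity over a long campaign.
# ALL numbers below are hypothetical, chosen only for illustration.
chips = 10_000
cost_per_chip_usd = 10.0   # hypothetical volume price per ASIC
watts_per_chip = 10.0      # hypothetical draw, including cooling overhead
years = 3
usd_per_kwh = 0.10         # hypothetical electricity rate

silicon_usd = chips * cost_per_chip_usd
energy_kwh = chips * watts_per_chip * 24 * 365 * years / 1000
energy_usd = energy_kwh * usd_per_kwh

print(silicon_usd, round(energy_usd))   # energy ends up the larger line item
```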
> whereas this AES implementation only does 250 MHz - so perhaps that's
> how it is designed (maximizing throughput rather than clock rate).
> Based on this incomplete data, my guesstimate is that it would not
> reach 1 GHz under modern tech process. Still, that would bring us
> closer to your original estimate... maybe just 10x slower (for 600
> Another factor is that exhaustive key searches (over a certain easy
> to define range) may use a slightly incomplete implementation most of
> the time (to reject most keys). For DES, it was like 13 out of 16
> rounds. Something similar will likely work for AES (I never looked
> into that). But that's relatively minor.
It's minor, but it's also an optimization available to the attacker
that's probably not available to the defender in normal operation. Add
up these minor advantages for the attacker here and there and it amounts
to whole bits removed from the effective key strength.
On 06/10/2011 12:11 AM, Solar Designer wrote:
> On Fri, Jun 10, 2011 at 08:44:39AM +0400, Solar Designer wrote:
> Here's a smaller implementation:
> "90nm implementation of AES-ECB with 256-bit key" "13 000 gates",
> "2742 Mb/s" "Key-expander included"
> That's 21 million AES blocks per second in 13k gates.
> The clock rate is not mentioned, though. Perhaps it's already
> higher than 250 MHz, maybe twice higher.
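Checking the quoted throughput, and what it would imply for the clock rate under a couple of speculative pipelining assumptions (AES-256 has 14 rounds):

```python
# 90 nm AES-256 core: quoted 2742 Mb/s in 13k gates.
throughput_bps = 2742e6
block_bits = 128
blocks_per_s = throughput_bps / block_bits   # ~21.4 million blocks/s

# Implied clock rate under two assumptions (both speculative):
fully_pipelined_hz = blocks_per_s        # one block/clock: ~21 MHz, implausibly low
iterated_hz = blocks_per_s * 14          # 14-round iterated, one round/cycle: ~300 MHz

print(round(blocks_per_s / 1e6, 1), round(iterated_hz / 1e6))
```

So an iterated design at roughly 300 MHz would fit the quoted numbers, consistent with Solar's guess of "higher than 250 MHz".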
> "The AES IP has a strong track record of silicon implementation with
> volume production in 130nm and 65nm."
> Hmm, volume production in 130nm and 65nm, but somehow the speed
> numbers are given for 90nm - weird.
Perhaps the ones they benchmarked are not the same as the ones that
their licensees produced in volume?