[cryptography] Intel RNG

Marsh Ray marsh at extendedsubset.com
Mon Jun 18 19:12:56 EDT 2012

On 06/18/2012 12:20 PM, Jon Callas wrote:
> A company makes a cryptographic widget that is inherently hard to
> test or validate. They hire a respected outside firm to do a review.
> What's wrong with that? I recommend that everyone do that.
> Un-reviewed crypto is a bane.

Let's accept that the review was competent, thorough, and independent.

Here's what I'm left wondering:

How do I know that this circuit that was reviewed is actually the thing
producing the random numbers on my chip? Why should I assume it doesn't
have any backdoors, bugdoors, or "engineering revisions" that make it
different from what was reviewed?

Is RDRAND driven by reprogrammable microcode? If not, how are they going
to address bugs in it? If so, what algorithms are used by my CPU to
authenticate the microcode updates that can be loaded? What kind of
processes are used to manage the signing keys for it?

Let's take a look at the actual report:

> Page 12: At an 800 MHz clock rate, the RNG can deliver
> post-processed random data at a sustained rate of 800 MBytes/sec. In
> particular, it should not be possible for a malicious process to
> starve another process.

Wait a minute... that second statement doesn't follow from the first.

We're talking about chips with 25 GB/s of *external* memory bus
bandwidth. Why can't a 4-core 2 GHz processor request on the order of 64
bits per core*clock, for 4*64*2e9 = 512e9 b/s = 60 GiB/s?

So 800 MiB/s @ 800 MHz
= 7.63 clocks per 64 bit RDRAND result (if they mean 1 core)
= 30.5 clocks per 64 bit RDRAND (if they mean 4 cores)
= 61.0 clocks per 64 bit RDRAND (if they mean 8 hyperthread cores).
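
For anyone who wants to check the arithmetic, here is a rough sketch.
It assumes the report's "800 MBytes/sec" means MiB (with decimal
megabytes the single-core figure comes out to exactly 8 clocks), and
it assumes the RNG's output is shared evenly between cores. The
variable and function names are mine, not Intel's.

```python
# Demand: every core asking for 64 bits on every clock (worst case).
cores = 4
core_hz = 2e9
demand_bits_per_s = cores * 64 * core_hz           # 512e9 b/s
demand_gib_per_s = demand_bits_per_s / 8 / 2**30   # roughly 60 GiB/s

# Supply: 800 MBytes/sec at 800 MHz, read here as MiB (assumption).
rng_hz = 800e6
supply_bytes_per_s = 800 * 2**20
results_per_s = supply_bytes_per_s / 8             # 64-bit results per second
clocks_per_result = rng_hz / results_per_s


def clocks_per_rdrand(n_cores):
    """Clocks per 64-bit result seen by each of n_cores sharing the RNG evenly."""
    return clocks_per_result * n_cores


print(f"demand: {demand_gib_per_s:.1f} GiB/s")
for n in (1, 4, 8):
    print(f"{n} core(s): {clocks_per_rdrand(n):.2f} clocks per 64-bit RDRAND")
```

Either way, the demand side is well over an order of magnitude larger
than the stated supply.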

More info:
> http://software.intel.com/en-us/articles/intel-digital-random-number-generator-drng-software-implementation-guide/
> Data taken from an early engineering sample board with a 3rd generation Intel Core family processor,
> code-named Ivy Bridge, quad core, 4 GB memory, hyper-threading enabled. Software: LINUX* Fedora 14,
> GCC version 4.6.0 (experimental) with RDRAND support, test uses p-threads kernel API.

Why does an Intel "Software Implementation Guide" have more information 
about the actual device under test than the formal report?

> Measured Throughput:
>     Up to 70 million RDRAND invocations per second

     4 cores @ 2 GHz -> 114 clocks per RDRAND
     8 cores @ 2 GHz -> 229 clocks per RDRAND

>     500+ million bytes of random data per second

     rate >= 62.5e6 RDRAND/s
     t(RDRAND) <= 16 ns
               <= 32 clocks, 1 core @ 2 GHz
               <= 256 core*clocks, 8 cores @ 2 GHz
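
The same figures as a small sketch, assuming a 2 GHz core clock (the
guide does not state the clock rate of the sample, so that number is
my guess) and 8 bytes per RDRAND result. The helper name is made up
for illustration.

```python
core_hz = 2e9  # assumed core clock; not given in Intel's guide


def core_clocks_per_call(calls_per_s, n_cores):
    """Total core*clocks available across n_cores per RDRAND invocation."""
    return n_cores * core_hz / calls_per_s


# "Up to 70 million RDRAND invocations per second"
print(f"4 cores: {core_clocks_per_call(70e6, 4):.0f} clocks per RDRAND")
print(f"8 cores: {core_clocks_per_call(70e6, 8):.0f} clocks per RDRAND")

# "500+ million bytes of random data per second", 8 bytes per result
calls_per_s = 500e6 / 8
ns_per_call = 1e9 / calls_per_s
print(f"rate >= {calls_per_s:.4g} RDRAND/s, t(RDRAND) <= {ns_per_call:.0f} ns")
print(f"1 core:  <= {core_clocks_per_call(calls_per_s, 1):.0f} clocks")
print(f"8 cores: <= {core_clocks_per_call(calls_per_s, 8):.0f} core*clocks")
```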

> RDRAND Response Time and Reseeding Frequency
>     ~150 clocks per invocation
>     Note: Varies with CPU clock frequency since constraint is shared data path from DRNG to cores.
>     Little contention until 8 threads  – or 4 threads on 2 core chip
>     Simple linear increase as additional threads are added

So when the report states that "it should not be possible for a
malicious process to starve another process" without justification, we
have to ask: why should it not be possible?

Maybe this is our answer: Because this hardware instruction is slow!

150 clocks (Intel's figure) implies 18.75 clocks per byte.

It would appear that the instruction is actually a blocking operation 
that does not return until the request has been satisfied. Or does it?

Can we then expect it will never result in an out-of-entropy condition?

What happens when 16 cores are put on a chip? Will some future chip 
begin occasionally returning zeroes from RDRAND when an attacker fires 
off 31 simultaneous threads requesting entropy? Or will RDRAND take 300 
clocks to execute?

Note that Skein 512 in pure software costs only about 6.25 clocks per 
byte. Three times faster! If RDRAND were entered in the SHA-3 contest, 
it would rank in the bottom third of the remaining contestants.
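
The comparison, spelled out as a sketch. It takes Intel's ~150 clocks
per invocation at face value and assumes each invocation returns one
64-bit result; the Skein figure is the one quoted above.

```python
rdrand_clocks_per_call = 150   # Intel's "~150 clocks per invocation"
bytes_per_call = 8             # one 64-bit result per invocation (assumption)
rdrand_cpb = rdrand_clocks_per_call / bytes_per_call

skein512_cpb = 6.25            # software figure quoted above
ratio = rdrand_cpb / skein512_cpb

print(f"RDRAND:    {rdrand_cpb} clocks/byte")
print(f"Skein 512: {skein512_cpb} clocks/byte")
print(f"ratio:     {ratio}x")
```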

So perhaps we should not throw away our software-stirred entropy pools
just yet. If RDRAND is present, it should be used to contribute 128
bits or so at a time, as just one of several sources of entropy. It could
certainly help to kickstart the software RNG in those critical first
seconds after cold boot (you know, when the SSH keys are being generated).
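
To make "one of several sources" concrete, here is a toy sketch of a
hash-stirred pool. Everything in it is illustrative: rdrand128() is a
hypothetical stand-in (Python's standard library has no RDRAND
binding, so it borrows os.urandom as a placeholder), EntropyPool is a
made-up class, and the construction has not been reviewed by anyone.
It is meant to show the shape of the idea, not a design to deploy.

```python
import hashlib
import os
import time


def rdrand128():
    """HYPOTHETICAL stand-in for 128 bits obtained via RDRAND.

    os.urandom is used only as a placeholder so the sketch runs anywhere.
    """
    return os.urandom(16)


class EntropyPool:
    """Toy pool: every input is hashed into a running 512-bit state."""

    def __init__(self):
        self._state = bytes(64)

    def stir(self, label, data):
        # Length-prefix the label and data so inputs cannot run together.
        h = hashlib.sha512()
        h.update(self._state)
        h.update(len(label).to_bytes(2, "big") + label)
        h.update(len(data).to_bytes(4, "big") + data)
        self._state = h.digest()

    def read(self, n):
        """Derive n output bytes, then ratchet the state forward."""
        out = b""
        counter = 0
        while len(out) < n:
            block = b"out" + counter.to_bytes(4, "big") + self._state
            out += hashlib.sha512(block).digest()
            counter += 1
        self._state = hashlib.sha512(b"next" + self._state).digest()
        return out[:n]


pool = EntropyPool()
pool.stir(b"rdrand", rdrand128())    # hardware RNG: just one contributor
pool.stir(b"os", os.urandom(32))     # whatever the OS has gathered so far
pool.stir(b"clock", time.perf_counter_ns().to_bytes(8, "big"))
key_material = pool.read(32)
```

The point of the structure is that no single source, RDRAND included,
fully determines the output as long as at least one other input is
unpredictable to the attacker.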

- Marsh
