[cryptography] Intel RNG

coderman coderman at gmail.com
Tue Jun 19 05:30:22 EDT 2012

On Tue, Jun 19, 2012 at 12:48 AM, Marsh Ray <marsh at extendedsubset.com> wrote:
> Right, 500 MB/s of random numbers ought to be enough for anybody.

these rates are often not useful. even busy secure web or VPN servers
use orders of magnitude less.

initialization of full disk crypto across an SSD RAID could consume
it, but that's the only "practical" use case i've encountered so far.

that said, a typical host entropy gathering daemon is often
insufficient, and even off-die serial bus and other old skewl sources
were providing entropy in kbit/sec rates, not MByte/sec.

> My main point in running the perf numbers was to figure out the
> justification for this RNG not being vulnerable to entropy depletion attacks
> in shared hosting environments.

some (many?) HWRNG designs use bit accumulators that feed a register
that is read by a userspace instruction.

consider the following configuration:
- this register collects single bits from two high speed
oscillators, sub-sampled at a quarter of the clock rate.
- only whitened / non-run bits are collected, as set in a
configuration write at initialization.

in short: providing slow, non-deterministic output rates to this
entropy instruction.
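to make the non-deterministic rate concrete: a whitening pass of the
kind described above is often a von neumann debiaser (that's my
assumption here; the actual filter in any given part is
implementation-specific). a toy sketch of sub-sampling plus debiasing:

```python
import random

def raw_oscillator_bits(n, bias=0.7, seed=1234):
    """Simulated biased oscillator samples (stand-in for real hardware)."""
    rng = random.Random(seed)
    return [1 if rng.random() < bias else 0 for _ in range(n)]

def subsample(bits, every=4):
    """Keep one bit per 'every' raw samples (quarter-clock sub-sampling)."""
    return bits[::every]

def von_neumann(bits):
    """Classic debiaser: emit the first bit of a 01 or 10 pair, drop 00/11.
    Output length is non-deterministic -- it depends on the bit pattern."""
    out = []
    for a, b in zip(bits[::2], bits[1::2]):
        if a != b:
            out.append(a)
    return out

raw = raw_oscillator_bits(4096)
white = von_neumann(subsample(raw))
# len(white) varies with the input: this is the slow, variable
# output rate the entropy instruction would see
```

note how each stage throws bits away: 4096 raw samples become at most
512 whitened bits, and usually far fewer when the source is biased.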

if the instruction is configured not to block, a single thread in a
single process could starve the available pool for everyone else.
if the instruction is configured to block, it could introduce
significant processing delays (unexpectedly so?) for other consumers
in other threads of execution, or other processes.
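the non-blocking case is usually paired with a bounded retry loop in
the consumer (intel's DRNG guide recommends retrying RDRAND when the
carry flag signals no data was available). a toy model, with a
hypothetical EntropyPool standing in for the hardware:

```python
# toy model of a non-blocking entropy instruction: returns None when
# the pool is starved (analogous to RDRAND clearing the carry flag).
# EntropyPool and its sizes are hypothetical, for illustration only.

class EntropyPool:
    def __init__(self, bits=128):
        self.bits = bits

    def try_read64(self):
        if self.bits < 64:
            return None          # pool starved: caller sees a failure
        self.bits -= 64
        return 0xDEADBEEF        # placeholder for 64 random bits

    def refill(self, bits):
        self.bits += bits

def read64_with_retry(pool, max_retries=10):
    """Bounded retry, as recommended for RDRAND: never spin forever."""
    for _ in range(max_retries):
        val = pool.try_read64()
        if val is not None:
            return val
        pool.refill(64)          # stand-in for waiting on the hardware
    raise RuntimeError("entropy source failed")

pool = EntropyPool(bits=64)
assert pool.try_read64() is not None   # first read drains the pool
assert pool.try_read64() is None       # second read sees starvation
assert read64_with_retry(pool) is not None
```

the retry cap matters: an unbounded loop turns a transient starvation
into the same stall you were trying to avoid by not blocking.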

Intel did an end run around this problem with the DRBG, which is
similar to urandom in the sense that it does not promise a 1:1
correspondence between entropy collected by the system and entropy
bits returned, as you get with linux /dev/random behavior, which
blocks once the kernel entropy pool is exhausted. (that's another
rant/tangent. let's not go there :)
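the urandom-style behavior is easy to observe from userspace:
os.urandom() draws from the kernel CSPRNG (/dev/urandom semantics on
linux) and keeps delivering at full speed regardless of the kernel's
entropy estimate:

```python
import os

# os.urandom() reads from the kernel CSPRNG: it is reseeded from
# hardware / interrupt noise but does not enforce a 1:1 entropy-in /
# bits-out accounting, so reads never block.
chunk = os.urandom(32)
assert len(chunk) == 32

# repeated large reads keep succeeding -- no pool "exhaustion"
total = sum(len(os.urandom(4096)) for _ in range(1000))
assert total == 4096 * 1000
```

contrast with blocking /dev/random on older kernels, where that second
loop could stall for seconds once the estimate hit zero.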

> Still, 150 clocks is a crazy long time for an instruction that doesn't
> involve a cache miss or a TLB flush or the like.
> So something is causing AES-NI to take 300 clocks/block to run this DRBG.
> Again, more than 3x slower than the benchmarks I see for the hardware
> primitive. My interpretation is that either RdRand is blocking due to
> "entropy depletion", there's some internal data pipe bottleneck, or maybe
> some of both.

it is also seeding from the physical noise sources, running sanity
checks of some type, and then handing over to the DRBG. so there is
clearly more involved than just a call to AES-NI. 3x as expensive
doesn't sound unreasonable once the seeding and validation overhead is
included.

> If in reality there's no way RDRAND can ever fail to return 64 bits of
> random data, then Intel could document that fact and we could save the world
> from yet another untested exceptional code path that only had a moderate
> chance of working the first time it's really needed anyway.

that's a great idea, and my reading of it is that they have said as
much. perhaps they should state it more clearly for the usual case.
my understanding is that it will never fail unless the hardware
itself starts up with a broken physical source returning clearly
invalid bits.

E.g. "In the rare event that the DRNG fails during runtime, it would
cease to issue random numbers rather than issue poor quality random
numbers."

as for stating that it should never "run dry", or fail to return
bits as long as the instruction is working, no matter how frequently
it is invoked, see:

With respect to the RNG taxonomy discussed above, the DRNG follows the
cascade construction RNG model, using a processor resident entropy
source to repeatedly seed a hardware-implemented CSPRNG. Unlike
software approaches, it includes a high-quality entropy source
implementation that can be sampled quickly to repeatedly seed the
CSPRNG with high-quality entropy. Furthermore, it represents a
self-contained hardware module that is isolated from software attacks
on its internal state. The result is a solution that achieves RNG
objectives with considerable robustness: statistical quality
(independence, uniform distribution), highly unpredictable random
number sequences, high performance, and protection against attack.
- Protecting online services against RNG attacks [read: high entropy
  consumption servers]
- Throughput ceiling is insensitive to the number of contending
  parallel threads

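the "cascade construction" the doc describes is simple to sketch: a
fast entropy source repeatedly reseeds a deterministic generator, so
output throughput is decoupled from raw entropy throughput. a toy
hash-based analogue (not intel's actual AES-CTR DRBG; os.urandom
stands in for the on-die entropy source):

```python
import hashlib
import os

class CascadeRNG:
    """Toy cascade-construction RNG: an entropy source repeatedly
    reseeds a deterministic generator. Simplified hash-based analogue,
    not Intel's actual AES-CTR DRBG."""

    def __init__(self, reseed_interval=16):
        self.reseed_interval = reseed_interval
        self.counter = 0
        self._reseed()

    def _reseed(self):
        # os.urandom stands in for sampling the hardware entropy source
        self.state = hashlib.sha256(os.urandom(32)).digest()
        self.outputs_since_seed = 0

    def rand_block(self):
        """Return 32 deterministic bytes; reseed every N outputs."""
        if self.outputs_since_seed >= self.reseed_interval:
            self._reseed()
        self.counter += 1
        self.outputs_since_seed += 1
        return hashlib.sha256(
            self.state + self.counter.to_bytes(8, "big")
        ).digest()

rng = CascadeRNG()
out = [rng.rand_block() for _ in range(64)]
assert len(set(out)) == 64   # distinct 32-byte outputs
```

the point of the construction is visible in the structure: the slow
entropy source only has to keep up with the reseed interval, while
consumers read at the speed of the deterministic generator.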
take with a grain of salt; this is all from their documentation:
