[cryptography] Intel RNG

Marsh Ray marsh at extendedsubset.com
Tue Jun 19 03:48:16 EDT 2012

On 06/19/2012 01:36 AM, Jon Callas wrote:
> On Jun 18, 2012, at 4:12 PM, Marsh Ray wrote:
>> 150 clocks (Intel's figure) implies 18.75 clocks per byte.
> That's not bad at all.

Right, 500 MB/s of random numbers ought to be enough for anybody.

My main point in running the perf numbers was to figure out the 
justification for this RNG not being vulnerable to entropy depletion 
attacks in shared hosting environments.

Still, 150 clocks is a crazy long time for an instruction that doesn't 
involve a cache miss or a TLB flush or the like.

> It's in the neighborhood of what I remember my
> DRBG running at with AES-NI. Faster, but not by a lot. However, I
> will be getting the full 16 bytes out of the AES operation and RDRAND is
> doing 64 bits at a time, right?

~150 clocks gets you 64 bits, up to some internal bandwidth limit hit at 
around 8 threads.
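
As a rough sketch of the arithmetic behind these figures: 150 clocks
for 8 bytes gives the clocks-per-byte number quoted above, and dividing
a clock rate by that gives a per-thread byte rate. The 3.0 GHz clock
below is purely an assumption for illustration; it is not a figure from
Intel or from this thread.

```python
# Back-of-the-envelope RDRAND throughput, using the figures in this thread.
CLOCKS_PER_RDRAND = 150   # Intel's figure, as quoted above
BYTES_PER_RDRAND = 8      # one RDRAND returns 64 bits
ASSUMED_CLOCK_HZ = 3.0e9  # ASSUMPTION: illustrative clock rate only


def clocks_per_byte(clocks: float, nbytes: int) -> float:
    """Cost in clock cycles of producing one byte."""
    return clocks / nbytes


def bytes_per_second(clock_hz: float, cpb: float) -> float:
    """Single-thread output rate for a given clock rate and cost per byte."""
    return clock_hz / cpb


cpb = clocks_per_byte(CLOCKS_PER_RDRAND, BYTES_PER_RDRAND)
rate = bytes_per_second(ASSUMED_CLOCK_HZ, cpb)

print(f"{cpb} clocks/byte")
print(f"{rate / 1e6:.0f} MB/s per thread at {ASSUMED_CLOCK_HZ / 1e9} GHz (assumed)")
```

This says nothing about where the shared internal limit sits; it only
shows what one thread could pull if nothing else got in the way.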

>> Note that Skein 512 in pure software costs only about 6.25 clocks
>> per byte. Three times faster! If RDRAND were entered in the SHA-3
>> contest, it would rank in the bottom third of the remaining
>> contestants. http://bench.cr.yp.to/results-sha3.html
> As much as it warms my heart to hear you say that, it's not a fair
> comparison.

It's an apples/oranges comparison in some ways - but in some it's not.

For example, skein1.1.pdf says "Skein can be used as a PRNG with the 
same security properties as the SP 800-90 PRNGs" and that "it can 
produce random data at the same speed that it hashes data."

On the other hand, Skein is pure software, so the hardware has little 

> A DRBG has to do a lot of other stuff, too. The DRBG is
> an interesting beast and a subject of a whole different
> conversation.

So the CR report says "The DRBG is based on AES in counter mode, per the 
NIST SP 800-90A", flipping to this nice overview of

NIST CTR mode DRBGs, it says (p 46) "One encryption per blocksize bit 

So something is causing AES-NI to take 300 clocks/block to run this 
DRBG. Again, more than 3x slower than the benchmarks I see for the 
hardware primitive. My interpretation is that either RdRand is blocking 
due to "entropy depletion", there's some internal data pipe bottleneck, 
or maybe some of both.
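
For anyone checking the numbers, here is a small sketch of how the
300 clocks/block figure and the Skein comparison fall out of the
values already quoted in this thread. The function and variable names
are just illustrative, and the "ceiling" at the end is only what
"more than 3x slower" implies, not a measured AES-NI benchmark.

```python
# Relating the per-call, per-block, and per-byte figures from this thread.
RDRAND_CLOCKS = 150    # clocks per RDRAND (Intel's figure)
RDRAND_BITS = 64       # bits returned per RDRAND
AES_BLOCK_BITS = 128   # AES block size
SKEIN512_CPB = 6.25    # Skein-512 software cost quoted above


def clocks_per_block(clocks_per_call: float, bits_per_call: int,
                     block_bits: int) -> float:
    """Clocks spent to obtain one cipher block's worth of output."""
    return clocks_per_call * block_bits / bits_per_call


rdrand_block_clocks = clocks_per_block(RDRAND_CLOCKS, RDRAND_BITS, AES_BLOCK_BITS)
rdrand_cpb = rdrand_block_clocks / (AES_BLOCK_BITS // 8)
skein_speedup = rdrand_cpb / SKEIN512_CPB

# "More than 3x slower" than the primitive implies the primitive itself
# runs in under a third of the observed clocks per block.
implied_primitive_ceiling = rdrand_block_clocks / 3

print(f"RDRAND: {rdrand_block_clocks} clocks per 128-bit block")
print(f"RDRAND: {rdrand_cpb} clocks/byte, Skein-512 is {skein_speedup}x faster")
print(f"Implied AES-NI primitive cost: under {implied_primitive_ceiling} clocks/block")
```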

If in reality there's no way RDRAND can ever fail to return 64 bits of 
random data, then Intel could document that fact and we could save the 
world from yet another untested exceptional code path that only had a 
moderate chance of working the first time it's really needed anyway.
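
To make the "exceptional code path" concrete, here is a minimal sketch
of the retry loop callers end up writing. RDRAND reports success in the
carry flag, so the instruction is modeled here as a hypothetical
callable returning (ok, value); nothing below touches real hardware.
The retry limit of 10 is an assumption for illustration, and
make_flaky_step is a made-up test helper.

```python
from typing import Callable, Optional, Tuple

# Hypothetical stand-in for one RDRAND execution: (carry_flag, 64-bit value).
RdrandStep = Callable[[], Tuple[bool, int]]


def rdrand64_retry(step: RdrandStep, max_retries: int = 10) -> Optional[int]:
    """Call `step` until it reports success or the retry budget runs out.

    Returns the 64-bit value on success, or None if every attempt failed.
    The None branch is the rarely exercised path the caller must handle.
    """
    for _ in range(max_retries):
        ok, value = step()
        if ok:
            return value & 0xFFFFFFFFFFFFFFFF
    return None


def make_flaky_step(failures: int, value: int) -> RdrandStep:
    """Test helper: fails `failures` times, then succeeds with `value`."""
    remaining = [failures]

    def step() -> Tuple[bool, int]:
        if remaining[0] > 0:
            remaining[0] -= 1
            return False, 0
        return True, value

    return step


if __name__ == "__main__":
    print(rdrand64_retry(make_flaky_step(2, 0x1234)))   # succeeds on 3rd try
    print(rdrand64_retry(make_flaky_step(10, 0x1234)))  # budget exhausted
```

The failure branch can only be tested with an injected fake like this
one, which is rather the point: on real hardware it may never fire
until the day it matters.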

- Marsh
