[cryptography] Intel RNG

Kyle Creyts kyle.creyts at gmail.com
Mon Jun 18 19:29:55 EDT 2012


On Mon, Jun 18, 2012 at 7:12 PM, Marsh Ray <marsh at extendedsubset.com> wrote:
> On 06/18/2012 12:20 PM, Jon Callas wrote:
>>
>>
>> A company makes a cryptographic widget that is inherently hard to
>> test or validate. They hire a respected outside firm to do a review.
>> What's wrong with that? I recommend that everyone do that.
>> Un-reviewed crypto is a bane.
>
>
> Let's accept that the review was competent, thorough, and independent.
>
> Here's what I'm left wondering:
>
> How do I know that this circuit that was reviewed is actually the thing
> producing the random numbers on my chip? Why should I assume it doesn't
> have any backdoors, bugdoors, or "engineering revisions" that make it
> different from what was reviewed?
>
> Is RDRAND driven by reprogrammable microcode? If not, how are they going
> to address bugs in it? If so, what algorithms are used by my CPU to
> authenticate the microcode updates that can be loaded? What kind of
> processes are used to manage the signing keys for it?
>
> Let's take a look at the actual report:
> http://www.cryptography.com/public/pdf/Intel_TRNG_Report_20120312.pdf
>
>> Page 12: At an 800 MHz clock rate, the RNG can deliver
>> post-processed random data at a sustained rate of 800 MBytes/sec. In
>> particular, it should not be possible for a malicious process to
>> starve another process.
>
>
> Wait a minute... that second statement doesn't follow from the first.
>
> We're talking about chips with a 25 GB/s *external* memory bus bandwidth,
> why can't a 4-core 2 GHz processor request on the order of 64 bits per
> core*clock for 4*64*2e9 = 512e9 b/s = 60 GiB/s ?
>
> So 800 MiB/s @ 800 MHz
> = 7.63 clocks per 64 bit RDRAND result (if they mean 1 core)
> = 30.5 clocks per 64 bit RDRAND (if they mean 4 cores)
> = 61.0 clocks per 64 bit RDRAND (if they mean 8 hyperthread cores).
>
> More info:
>>
>>
>> http://software.intel.com/en-us/articles/intel-digital-random-number-generator-drng-software-implementation-guide/
>> Data taken from an early engineering sample board with a 3rd generation
>> Intel Core family processor,
>> code-named Ivy Bridge, quad core, 4 GB memory, hyper-threading enabled.
>> Software: LINUX* Fedora 14,
>> GCC version 4.6.0 (experimental) with RDRAND support, test uses p-threads
>> kernel API.
>
>
> Why does an Intel "Software Implementation Guide" have more information
> about the actual device under test than the formal report?
>
>> Measured Throughput:
>>    Up to 70 million RDRAND invocations per second
>
>
> Implies:
>    4 cores @ 2 GHz -> 114 clocks per RDRAND
>    8 cores @ 2 GHz -> 229 clocks per RDRAND
>
>>    500+ million bytes of random data per second
>
>
> Implies:
>    rate >= 62.5e6 RDRAND/s
>    t(RDRAND) <= 16 ns
>              <= 32 clocks, 1 core @ 2 GHz
>              <= 256 core*clocks, 8 cores @ 2 GHz
>
>> RDRAND Response Time and Reseeding Frequency
>>    ~150 clocks per invocation
>>    Note: Varies with CPU clock frequency since constraint is shared data
>> path from DRNG to cores.
>>    Little contention until 8 threads  – or 4 threads on 2 core chip
>>    Simple linear increase as additional threads are added
>
>
> So when the statement is given "it should not be possible for a malicious
> process to starve another process" without justification, it leads us to ask
> "why should it not be possible to starve another process?"
>
> Maybe this is our answer: Because this hardware instruction is slow!
>
> 150 clocks (Intel's figure) implies 18.75 clocks per byte.

Then perhaps this is the proper graph to examine:
http://bench.cr.yp.to/graph-sha3/8-thumb.png
Maybe this is the case RDRAND is intended for: producing random 8-byte seeds?
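
For reference, the clock figures in this thread can be reproduced with a few
lines of Python. This is only a sketch of the arithmetic. The function names
are made up for illustration, and it assumes 8 bytes per RDRAND result, that
the report's "800 MBytes/sec" means MiB, and the 2 GHz clock used in the
estimates above.

```python
# Back-of-the-envelope RDRAND cost figures, using numbers quoted in this thread.
BYTES_PER_RDRAND = 8  # one 64-bit result per invocation (assumption)


def clocks_per_invocation(invocations_per_sec, clock_hz, cores=1):
    """Core-clocks available per RDRAND at a given aggregate invocation rate."""
    return cores * clock_hz / invocations_per_sec


def clocks_per_result(bytes_per_sec, clock_hz, cores=1):
    """Same figure, starting from an aggregate byte rate."""
    return clocks_per_invocation(bytes_per_sec / BYTES_PER_RDRAND, clock_hz, cores)


def clocks_per_byte(clocks):
    """Cost per byte for an instruction that takes `clocks` per 64-bit result."""
    return clocks / BYTES_PER_RDRAND


if __name__ == "__main__":
    # Report: 800 MiB/s at 800 MHz, read as 1, 4, or 8 cores.
    for cores in (1, 4, 8):
        print(cores, "cores:", round(clocks_per_result(800 * 2**20, 800e6, cores), 2))
    # Implementation guide: up to 70 million invocations per second at 2 GHz.
    for cores in (4, 8):
        print(cores, "cores:", round(clocks_per_invocation(70e6, 2e9, cores), 1))
    # Intel's ~150 clocks per invocation, expressed per byte.
    print("clocks/byte:", clocks_per_byte(150))
```

Dividing the last figure by the 6.25 clocks per byte quoted for Skein 512
later in this message gives the "three times faster" ratio.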

>
> It would appear that the instruction is actually a blocking operation that
> does not return until the request has been satisfied. Or does it?
>
> Can we then expect it will never result in an out-of-entropy condition?
>
> What happens when 16 cores are put on a chip? Will some future chip begin
> occasionally returning zeroes from RDRAND when an attacker fires off 31
> simultaneous threads requesting entropy? Or will RDRAND take 300 clocks to
> execute?
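
On the blocking question: as far as Intel's documentation goes, RDRAND
reports success through the carry flag (CF=1 means the destination holds a
random value, CF=0 means none was available), so the caller is expected to
check and retry rather than trust whatever landed in the register. A minimal
sketch of that calling pattern follows. Python cannot issue RDRAND directly,
so `step` is a hypothetical wrapper that returns None when CF=0, and the
retry bound of 10 is just an illustrative choice.

```python
from typing import Callable, Optional


def rdrand_with_retry(step: Callable[[], Optional[int]], retries: int = 10) -> int:
    """Call step() until it yields a value; fail loudly instead of returning zeros.

    `step` is a hypothetical wrapper around one RDRAND attempt that returns
    the 64-bit result, or None when the hardware had no data (CF=0).
    """
    for _ in range(retries):
        value = step()
        if value is not None:
            return value
    raise RuntimeError("hardware RNG produced no data after %d attempts" % retries)


if __name__ == "__main__":
    # Simulated hardware that is starved twice and then delivers a value.
    attempts = iter([None, None, 0x0123456789ABCDEF])
    print(hex(rdrand_with_retry(lambda: next(attempts))))
```

The point of raising on exhaustion is that an underflow becomes a visible
error in the caller, not a silent run of zeroes.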
>
> Note that Skein 512 in pure software costs only about 6.25 clocks per byte.
> Three times faster! If RDRAND were entered in the SHA-3 contest, it would
> rank in the bottom third of the remaining contestants.
> http://bench.cr.yp.to/results-sha3.html
>
> So perhaps we should not throw away our software-stirred entropy pools just
> yet and if RDRAND is present it should be used to contribute 128 bits or so
> at a time as just one of several sources of entropy. It could certainly help
> to kickstart the software RNG in those critical first seconds after cold
> boot (you know, when the SSH keys are being generated).
>
> - Marsh
>
> _______________________________________________
> cryptography mailing list
> cryptography at randombit.net
> http://lists.randombit.net/mailman/listinfo/cryptography
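
The closing suggestion, feeding RDRAND into a software pool as one source
among several, could look roughly like the sketch below. It is a toy to show
the shape of the idea, not a vetted generator design: `EntropyPool` and
`hw_rng_128` are hypothetical names, and `hw_rng_128` is a stand-in that
reads from `os.urandom` because Python has no direct access to RDRAND.

```python
import hashlib
import os
import time


class EntropyPool:
    """Toy hash-based pool: every input is folded into a running SHA-512 state."""

    def __init__(self) -> None:
        self._state = bytes(64)

    def stir(self, label: bytes, data: bytes) -> None:
        # Length-prefix each field so different (label, data) splits cannot collide.
        h = hashlib.sha512(self._state)
        for field in (label, data):
            h.update(len(field).to_bytes(4, "big"))
            h.update(field)
        self._state = h.digest()

    def extract(self, n: int = 32) -> bytes:
        if not 0 < n <= 64:
            raise ValueError("n must be between 1 and 64 bytes")
        # Derive output and next state under different tags, so the output
        # does not reveal the state that later outputs depend on.
        out = hashlib.sha512(b"out" + self._state).digest()[:n]
        self._state = hashlib.sha512(b"next" + self._state).digest()
        return out


def hw_rng_128() -> bytes:
    """Stand-in for 128 bits from RDRAND (hypothetical; uses os.urandom here)."""
    return os.urandom(16)


if __name__ == "__main__":
    pool = EntropyPool()
    pool.stir(b"hw", hw_rng_128())        # hardware contribution, 128 bits
    pool.stir(b"os", os.urandom(32))      # operating system source
    pool.stir(b"clock", time.perf_counter_ns().to_bytes(8, "big"))
    print(pool.extract(32).hex())
```

With this arrangement a weak or compromised hardware source cannot make the
output worse than the other inputs alone, as long as the attacker cannot see
the pool state when choosing the hardware output.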



-- 
Kyle Creyts

Information Assurance Professional
BSidesDetroit Organizer


